
Text-to-LoRA: Instant Transformer Adaption
Key Points
- Text-to-LoRA (T2L) introduces a hypernetwork capable of generating task-specific LoRA adapters for large language models (LLMs) on the fly, based solely on natural language descriptions.
- T2L is trained by either reconstructing pre-trained LoRA instances or via supervised fine-tuning, enabling it to compress hundreds of adapters and produce new ones in a single, inexpensive forward pass.
- The model demonstrates performance matching task-specific LoRAs on test sets, generalizes zero-shot to entirely unseen tasks, and significantly reduces the compute required for LLM specialization.
Text-to-LoRA (T2L) is a novel hypernetwork designed to adapt large language models (LLMs) on the fly, solely based on natural language task descriptions, thereby addressing the limitations of expensive and lengthy traditional fine-tuning processes. The core hypothesis underpinning T2L is that different LoRA adapters share an underlying adaptation mechanism, allowing for their simultaneous optimization and compression into a single model, which can then generate new task-specific LoRA adapters zero-shot at inference time.
The methodology of T2L centers on its role as a hypernetwork whose outputs are the parameters of LoRA adapters, which play the role of the "base network." For each target transformer module $m$ and layer index $l$, T2L predicts the low-rank matrices $A$ and $B$ (comprising the update $\Delta W = BA$ of LoRA) from a vector representation of a natural language task description. Formally, the output for task $i$ is given by $\Delta W_{i,m,l} = h_\theta(\phi_{i,m,l})$, where $h_\theta$ is the hypernetwork with parameters $\theta$. The input descriptor is constructed by concatenating the embedding of the task description $z_i$, the learnable embedding for the module type, and the learnable embedding for the layer index: $\phi_{i,m,l} = \mathrm{concat}\big[f(z_i),\, E[m],\, E[l]\big]$. Here, $f(z_i)$ is a vector representation of the text description (e.g., the CLS-token activation of a bidirectional transformer or the last-token activation of an LLM), and $E[m]$ and $E[l]$ are learnable embedding dictionaries indexed by module type and layer index, respectively. This architecture allows T2L to generate all LoRA matrices in a single forward pass by batching the input descriptors.
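The descriptor construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions, module names, and random stand-in for the text encoder are all assumptions.

```python
import numpy as np

# Minimal sketch of T2L's input-descriptor construction.
# f(z_i): task-description embedding; E_mod / E_layer: learnable dictionaries
# indexed by module type and layer index. All dimensions here are illustrative.

d_emb = 16                      # embedding dimension (assumed)
rng = np.random.default_rng(0)

E_mod = {"q_proj": rng.normal(size=d_emb), "v_proj": rng.normal(size=d_emb)}
E_layer = {l: rng.normal(size=d_emb) for l in range(32)}

def build_descriptor(f_z, module, layer):
    """phi_{i,m,l} = concat[f(z_i), E[m], E[l]]."""
    return np.concatenate([f_z, E_mod[module], E_layer[layer]])

f_z = rng.normal(size=d_emb)    # stand-in for the text-encoder output f(z_i)

# Batching one descriptor per (module, layer) pair lets a single forward
# pass of the hypernetwork yield every LoRA matrix at once.
batch = np.stack([build_descriptor(f_z, m, l)
                  for m in E_mod for l in E_layer])
print(batch.shape)              # (64, 48): 2 modules x 32 layers, 3 * d_emb
```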
T2L introduces three architectural variants, imposing different output space constraints and inductive biases:
- T2L (L - Large): The largest variant. Its final linear layer directly outputs both low-rank matrices $A$ and $B$ simultaneously. The number of weight connections in the output head is $d \times 2 \times r \times d_{\text{model}}$, where $d$ is the output size of the last MLP block in the hypernetwork, $r$ is the LoRA rank, and $d_{\text{model}}$ is the input/output dimension of the adapted linear layer.
- T2L (M - Medium): A medium-sized model with an output layer shared between the $A$ and $B$ matrices, generating one of them per call. The output head size is $d \times r \times d_{\text{model}}$.
- T2L (S - Small): The most parameter-efficient model, with the strongest inductive biases. It outputs only one rank of a low-rank matrix ($A$ or $B$) at a time, making its head size much smaller: $d \times d_{\text{model}}$. Which rank to emit is indicated by an additional learnable rank embedding of dimension $d_{\text{emb}}$, the dimension of the other input embeddings.
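The head-size formulas above can be made concrete with Mistral-7B-ish numbers. This is illustrative arithmetic only: the last-MLP width `d` is an assumed value, not the paper's actual hypernetwork width.

```python
# Illustrative output-head sizes for the three T2L variants.
d = 512            # output size of the final MLP block (assumed value)
r = 8              # LoRA rank, as used in the experiments
d_model = 4096     # dimension of the adapted linear layers in Mistral-7B

head_L = d * 2 * r * d_model   # emits A and B together
head_M = d * r * d_model       # shared head, emits A or B per call
head_S = d * d_model           # emits a single rank vector per call

print(head_L, head_M, head_S)  # 33554432 16777216 2097152
print(head_L // head_S)        # 16: the L head is 2*r times the S head
```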
T2L can be trained via two distinct regimes:
- LoRA Reconstruction: This method trains T2L to reconstruct pre-trained task-specific LoRAs. The objective is to minimize the difference between the generated LoRA weights and a target library of existing LoRAs $\{\Delta W_i\}$. The loss function is $\mathcal{L}_{\text{recon}}(\theta) = \mathbb{E}_{i,m,l}\big[\,\lVert \Delta W_{i,m,l} - h_\theta(\phi_{i,m,l}) \rVert_1\big]$, i.e., the distance between each target matrix and its generated counterpart. While effective for compressing known LoRAs, this approach may struggle with zero-shot generalization to unseen tasks, since numerically distinct but functionally similar LoRAs for related tasks can reside in different minima.
- Supervised Fine-Tuning (SFT): This regime directly optimizes T2L on fine-tuning datasets, sidestepping the need for intermediate target LoRA adapters. The training objective is $\min_\theta \sum_i \mathcal{L}_{\text{SFT}}\big(D_i,\, W,\, h_\theta(\phi_i)\big)$, where $\mathcal{L}_{\text{SFT}}$ is the supervised fine-tuning loss, $D_i$ is the dataset for task $i$, and $W$ are the base LLM weights adapted by the generated LoRA. SFT-trained T2L implicitly learns to cluster tasks, which enhances its zero-shot LoRA generation capabilities and generalization to unseen tasks.
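The reconstruction regime can be sketched as a toy training step: a linear stand-in for the hypernetwork maps a descriptor to a flattened $\Delta W$, and the loss is the elementwise distance to a target LoRA from the library. Everything here (the linear model, dimensions, learning rate) is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

# Toy sketch of the LoRA-reconstruction objective with one manual SGD step.
rng = np.random.default_rng(1)
d_phi, r, d_model = 48, 2, 8               # tiny illustrative sizes

theta = rng.normal(scale=0.01, size=(d_phi, 2 * r * d_model))

def h(phi):
    """Linear stand-in hypernetwork: descriptor -> flattened [A; B]."""
    return phi @ theta

phi = rng.normal(size=d_phi)               # descriptor for one (task, m, l)
target = rng.normal(size=2 * r * d_model)  # pre-trained LoRA to compress

loss = np.abs(h(phi) - target).mean()      # L1 reconstruction loss
# Gradient of the mean-L1 loss w.r.t. theta, then one small SGD step.
grad_out = np.sign(h(phi) - target) / target.size
theta -= 0.01 * np.outer(phi, grad_out)
new_loss = np.abs(h(phi) - target).mean()
print(loss, new_loss)                      # loss shrinks after the step
```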
Experiments were primarily conducted using Mistral-7B-Instruct as the base LLM, with gte-large-en-v1.5 for extracting task embeddings. All LoRA adapters were set to rank 8, targeting the query and value projection modules. The SNI dataset (a subset of 479 tasks) was used for training, and evaluation was performed on 10 widely used benchmarks: Arc-challenge, Arc-easy, BoolQ, GSM8K, Hellaswag, OpenBookQA, PIQA, Winogrande, HumanEval, and MBPP.
Results show that T2L trained via reconstruction can fully recover the performance of task-specific "oracle" LoRAs on seen benchmark tasks (Table 1), and can even outperform them in some cases, suggesting that the lossy compression acts as a regularizer. However, performance drops as reconstruction error grows with the number of training tasks. For zero-shot LoRA generation, SFT-trained T2L consistently improves over a multi-task LoRA baseline across benchmarks (Table 2), demonstrating its ability to generate useful LoRAs for previously unseen tasks from natural language descriptions alone. T2L's performance generally benefits from increasing the number of training tasks while scaling the computational budget proportionally (Table 3), with the larger L and M variants scaling better than the S variant. T2L is also robust to the choice of task embedding model (e.g., gte-large-en-v1.5 vs. Mistral-7B-Instruct, Table 4) and to varying task descriptions, performing best when descriptions are semantically aligned with the target task, even if unseen during training (Table 5). This robustness and generalization capability highlight T2L's potential for democratizing LLM adaptation with minimal compute requirements.