Soft Prompting
A technique where learnable continuous vectors — rather than natural language words — are prepended to a model's input, steering its behaviour without modifying the model's weights.
Soft prompting is a parameter-efficient technique for adapting AI models where, instead of crafting a natural language prompt, you prepend a sequence of learned continuous vectors (soft tokens) to the model's input. These vectors are optimised through training to steer the model's behaviour towards a specific task, without modifying any of the model's original weights.
Hard prompts versus soft prompts
- Hard prompts: The natural language instructions you write when using ChatGPT or Claude. "Summarise the following document in three bullet points." These are discrete tokens from the model's vocabulary.
- Soft prompts: Learned continuous vectors that do not correspond to any real words. They exist purely in the model's embedding space and are optimised to produce the desired behaviour. You cannot read them or express them in natural language.
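The distinction can be made concrete with a toy sketch in plain NumPy. The vocabulary size, embedding dimension, and token IDs below are all illustrative, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 100, 4                              # toy vocabulary and embedding size
embedding_table = rng.normal(size=(vocab_size, d))  # the model's frozen embedding matrix

# Hard prompt: discrete token IDs -- each embedding row is a real vocabulary entry
hard_ids = np.array([17, 42, 7])
hard_embeddings = embedding_table[hard_ids]

# Soft prompt: free continuous vectors -- they need not match ANY vocabulary entry,
# so there is no way to "read them back" as words
soft_prompt = rng.normal(size=(3, d))

in_vocab = lambda v: any(np.allclose(v, row) for row in embedding_table)
print([in_vocab(v) for v in hard_embeddings])  # [True, True, True]
print([in_vocab(v) for v in soft_prompt])      # [False, False, False]
```

The check at the end is the whole point: every hard-prompt vector maps back to a word, while the soft-prompt vectors live between the rows of the embedding table.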
How soft prompting works
- Initialise a set of continuous vectors (typically 10-100 tokens long), either randomly or from existing word embeddings.
- Freeze the entire pre-trained model — no weights are modified.
- Train only the soft prompt vectors on your specific task data, adjusting them to maximise performance.
- At inference time, prepend the learned soft prompt to each input before passing it to the frozen model.
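The four steps above can be sketched end-to-end in NumPy. A fixed linear head stands in for the frozen pre-trained model, and everything here (dimensions, learning rate, the toy "model", the single training example) is illustrative rather than a production recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_soft, n_input = 8, 4, 5        # embedding dim, soft-prompt length, input length

W = rng.normal(size=(d, 1))         # stand-in for the frozen pre-trained model
W_before = W.copy()

def frozen_model(embeddings):
    """Toy 'model': average the token embeddings, apply a fixed linear head."""
    return float(embeddings.mean(axis=0) @ W)

x = rng.normal(size=(n_input, d))   # embeddings of one training input
target = 1.0                        # desired output for the task

# Step 1: initialise the soft prompt (here randomly, scaled small)
soft = rng.normal(size=(n_soft, d)) * 0.1

# Steps 2-3: W stays frozen; gradient descent updates ONLY the soft prompt
lr, n_tokens = 0.05, n_soft + n_input
for _ in range(500):
    pred = frozen_model(np.vstack([soft, x]))    # prepend, then forward pass
    err = pred - target                          # squared-error loss: err ** 2
    grad = 2.0 * err * np.tile(W.T, (n_soft, 1)) / n_tokens  # d(loss)/d(soft)
    soft -= lr * grad

# Step 4: at inference, prepend the learned soft prompt to each input
final = frozen_model(np.vstack([soft, x]))
print(round(final, 3))              # close to the target; W itself is unchanged
```

In a real system the frozen model is a transformer and the gradient comes from backpropagation through it, but the structure is the same: the loss updates only the prepended vectors, never the model's weights.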
The result is that the soft prompt "conditions" the model to behave in a specific way, as if an expert had written the perfect natural language prompt. In fact it can do better, because the optimisation is not restricted to sequences of real words: it searches the full continuous embedding space.
Advantages
- Extreme parameter efficiency: Only the soft prompt vectors are trained — typically a few thousand to a few hundred thousand parameters (number of soft tokens × embedding dimension) out of billions. This is even more efficient than LoRA.
- Multiple tasks, one model: Different soft prompts can be swapped in and out to adapt a single model to different tasks. Each soft prompt is tiny (kilobytes).
- No weight modification: The base model is completely untouched, eliminating catastrophic forgetting and simplifying deployment.
- Surprisingly effective: Despite their simplicity, soft prompts can match the performance of full fine-tuning on many tasks, especially when the base model is large enough.
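The parameter-efficiency claim is simple arithmetic. The numbers below (20 soft tokens, a 4096-dimensional hidden size, a 7B-parameter base model) are illustrative choices, not measurements:

```python
n_soft_tokens = 20
d_model = 4096                      # hidden size of a hypothetical 7B-class model
base_params = 7_000_000_000

soft_params = n_soft_tokens * d_model   # trainable parameters in the soft prompt
fraction = soft_params / base_params

print(soft_params)                  # 81920
print(f"{fraction:.6%}")            # 0.001170% of the base model's parameters
```

At four bytes per float, that soft prompt is roughly 320 KB on disk, which is why a library of per-task prompts is cheap to store and swap against a single shared base model.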
Limitations
- Interpretability: Soft prompts are not human-readable. You cannot inspect them to understand what the model has learned.
- Model-specific: A soft prompt is learned in one model's embedding space, so it cannot be transferred to a different model.
- Performance gap: For smaller models, soft prompting typically underperforms full fine-tuning. The technique works best with very large models.
- Training data needed: You still need labelled data for the target task, though typically less than for full fine-tuning.
Relation to other techniques
Soft prompting sits on a spectrum of parameter-efficient methods. Prompt tuning (Lester et al., Google, 2021) is the best-known implementation. It is simpler than LoRA, which modifies the model's internal layers, but potentially less flexible. In practice, LoRA has become more popular because it offers a better performance-efficiency trade-off, yet soft prompting remains valuable for its extreme efficiency and simplicity.
Why This Matters
Soft prompting demonstrates that AI model behaviour can be significantly altered without changing the model itself — a powerful concept for organisations that want to customise AI capabilities without the cost or risk of modifying model weights.
This topic is covered in our lesson: Scaling AI Across the Organisation