Soft Prompting
A technique where learnable continuous vectors — rather than natural language words — are prepended to a model's input, steering its behaviour without modifying the model's weights.
Soft prompting is a parameter-efficient technique for adapting AI models where, instead of crafting a natural language prompt, you prepend a sequence of learned continuous vectors (soft tokens) to the model's input. These vectors are optimised through training to steer the model's behaviour towards a specific task, without modifying any of the model's original weights.
Hard prompts versus soft prompts
- Hard prompts: The natural language instructions you write when using ChatGPT or Claude. "Summarise the following document in three bullet points." These are discrete tokens from the model's vocabulary.
- Soft prompts: Learned continuous vectors that do not correspond to any real words. They exist purely in the model's embedding space and are optimised to produce the desired behaviour. You cannot read them or express them in natural language.
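The distinction can be made concrete with a toy sketch in plain NumPy. The vocabulary size, embedding dimension, and token IDs below are all illustrative, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 100, 4                              # toy vocabulary and embedding size
embedding_table = rng.normal(size=(vocab_size, d))  # the model's frozen embedding matrix

# Hard prompt: discrete token IDs -- each embedding row is a real vocabulary entry
hard_ids = np.array([17, 42, 7])
hard_embeddings = embedding_table[hard_ids]

# Soft prompt: free continuous vectors -- they need not match ANY vocabulary entry,
# so there is no way to "read them back" as words
soft_prompt = rng.normal(size=(3, d))

in_vocab = lambda v: any(np.allclose(v, row) for row in embedding_table)
print([in_vocab(v) for v in hard_embeddings])  # [True, True, True]
print([in_vocab(v) for v in soft_prompt])      # [False, False, False]
```

The check at the end is the whole point: every hard-prompt vector maps back to a word, while the soft-prompt vectors live between the rows of the embedding table.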
How soft prompting works
- Initialise a set of continuous vectors (typically 10-100 tokens long), either randomly or from existing word embeddings.
- Freeze the entire pre-trained model — no weights are modified.
- Train only the soft prompt vectors on your specific task data, adjusting them to maximise performance.
- At inference time, prepend the learned soft prompt to each input before passing it to the frozen model.
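The four steps above can be sketched end-to-end in NumPy. A fixed linear head stands in for the frozen pre-trained model, and everything here (dimensions, learning rate, the toy "model", the single training example) is illustrative rather than a production recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_soft, n_input = 8, 4, 5        # embedding dim, soft-prompt length, input length

W = rng.normal(size=(d, 1))         # stand-in for the frozen pre-trained model
W_before = W.copy()

def frozen_model(embeddings):
    """Toy 'model': average the token embeddings, apply a fixed linear head."""
    return float(embeddings.mean(axis=0) @ W)

x = rng.normal(size=(n_input, d))   # embeddings of one training input
target = 1.0                        # desired output for the task

# Step 1: initialise the soft prompt (here randomly, scaled small)
soft = rng.normal(size=(n_soft, d)) * 0.1

# Steps 2-3: W stays frozen; gradient descent updates ONLY the soft prompt
lr, n_tokens = 0.05, n_soft + n_input
for _ in range(500):
    pred = frozen_model(np.vstack([soft, x]))    # prepend, then forward pass
    err = pred - target                          # squared-error loss: err ** 2
    grad = 2.0 * err * np.tile(W.T, (n_soft, 1)) / n_tokens  # d(loss)/d(soft)
    soft -= lr * grad

# Step 4: at inference, prepend the learned soft prompt to each input
final = frozen_model(np.vstack([soft, x]))
print(round(final, 3))              # close to the target; W itself is unchanged
```

In a real system the frozen model is a transformer and the gradient comes from backpropagation through it, but the structure is the same: the loss updates only the prepended vectors, never the model's weights.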
The result is that the soft prompt "conditions" the model to behave in a specific way, as if an expert had written the perfect natural language prompt. In fact it can do better, because the optimisation is not restricted to sequences of real words: it searches the full continuous embedding space.
Advantages
- Extreme parameter efficiency: Only the soft prompt vectors are trained — typically a few thousand to a few hundred thousand parameters (number of soft tokens × embedding dimension) out of billions. This is even more efficient than LoRA.
- Multiple tasks, one model: Different soft prompts can be swapped in and out to adapt a single model to different tasks. Each soft prompt is tiny (kilobytes).
- No weight modification: The base model is completely untouched, eliminating catastrophic forgetting and simplifying deployment.
- Surprisingly effective: Despite their simplicity, soft prompts can match the performance of full fine-tuning on many tasks, especially when the base model is large enough.
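The parameter-efficiency claim is simple arithmetic. The numbers below (20 soft tokens, a 4096-dimensional hidden size, a 7B-parameter base model) are illustrative choices, not measurements:

```python
n_soft_tokens = 20
d_model = 4096                      # hidden size of a hypothetical 7B-class model
base_params = 7_000_000_000

soft_params = n_soft_tokens * d_model   # trainable parameters in the soft prompt
fraction = soft_params / base_params

print(soft_params)                  # 81920
print(f"{fraction:.6%}")            # 0.001170% of the base model's parameters
```

At four bytes per float, that soft prompt is roughly 320 KB on disk, which is why a library of per-task prompts is cheap to store and swap against a single shared base model.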
Limitations
- Interpretability: Soft prompts are not human-readable. You cannot inspect them to understand what the model has learned.
- Model-specific: A soft prompt is learned in one model's embedding space, so it cannot be transferred to a different model.
- Performance gap: For smaller models, soft prompting typically underperforms full fine-tuning. The technique works best with very large models.
- Training data needed: You still need labelled data for the target task, though typically less than for full fine-tuning.
Relation to other techniques
Soft prompting sits on a spectrum of parameter-efficient methods. Prompt tuning (Lester et al., Google, 2021) is the best-known implementation. It is simpler than LoRA, which modifies the model's internal layers, but potentially less flexible. In practice, LoRA has become more popular because it offers a better performance-efficiency trade-off, yet soft prompting remains valuable for its extreme efficiency and simplicity.
Why This Matters
Soft prompting demonstrates that AI model behaviour can be significantly altered without changing the model itself — a powerful concept for organisations that want to customise AI capabilities without the cost or risk of modifying model weights.
This topic is covered in our lesson: Scaling AI Across the Organisation