How to Prompt Thinking Models
Learn how to effectively prompt thinking models like DeepSeek R1 and OpenAI o1/o3 for optimal results.
What are thinking models?
Thinking models are LLMs optimized for reasoning and problem-solving. They have built-in Chain-of-Thought capabilities, making them more effective at complex tasks. Key models include:
- DeepSeek R1
- OpenAI o1/o3
- Gemini 2.0 Flash Thinking
- DeepSeek R1 distills of LLaMA 3.1 (e.g., R1-Distill-Llama-8B)
These models handle reasoning internally, so they need simpler prompts and less explicit guidance than standard LLMs.
Summary of Do’s and Don’ts
- Do use minimal prompting so the model can think independently
- Do encourage more reasoning for better performance on complex tasks
- Do use delimiters to clearly separate distinct parts of the input
- Do use ensembling for highly complex tasks that demand high accuracy
- Don't use few-shot or Chain-of-Thought prompting
- Don't use thinking models for structured outputs unless absolutely necessary
- Don't overload the model with unnecessary details
1. Use Minimal Prompting
Thinking models work best when given concise, direct, and structured prompts. Too much information can actually reduce accuracy. The best approach is to state the problem clearly and let the model figure out the steps.
Good Example:
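An illustrative prompt in this spirit (the problem itself is invented for this example):

```
A train travels 120 km in 1.5 hours, then another 200 km in 2.5 hours. What is its average speed over the entire journey?
```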
Poor Example:
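The same task, overloaded with instructions (also illustrative):

```
You are a world-class mathematician. Read the problem twice before starting. First, list every variable. Then recall all formulas relating distance, speed, and time. Then solve step by step, explaining and double-checking each step: A train travels 120 km in 1.5 hours, then another 200 km in 2.5 hours. What is its average speed over the entire journey?
```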
Fewer instructions allow the model to engage its reasoning process naturally.
2. Encourage More Reasoning for Complex Tasks
More complex problems benefit from additional reasoning time. Thinking models use reasoning tokens, which allow them to internally process a problem before outputting an answer.
By prompting the model to take its time, you can improve the quality of the response. However, this also increases token usage, which raises cost.
Good Example:
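A sketch of a prompt that invites extra reasoning (the task is made up for illustration):

```
Take your time and reason carefully before answering. Is 2^61 - 1 prime? Verify your conclusion before giving a final answer.
```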
Poor Example:
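And one that cuts the reasoning short:

```
Quick, one-line answer only: is 2^61 - 1 prime?
```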
Encouraging longer reasoning helps on multi-step problems and can improve accuracy significantly.
3. Avoid Few-Shot and Chain-of-Thought Prompting
Traditional few-shot (where you give examples) and Chain-of-Thought prompting strategies reduce performance for thinking models.
According to research, thinking models performed worse when given few-shot examples. This contrasts with older models, where few-shot learning improved results. Thinking models are already designed to break down problems internally, so explicit step-by-step guidance can interfere with their reasoning.
Good Example:
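For instance, a plain zero-shot prompt (illustrative):

```
Classify the sentiment of this review as positive, negative, or mixed: "The battery life is great, but the screen scratches far too easily."
```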
Poor Example:
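Versus the same task wrapped in few-shot, step-by-step scaffolding (illustrative):

```
Here is how to reason step by step.
Example 1: "I loved it" -> find the emotion word -> "loved" -> positive.
Example 2: "Terrible service" -> find the emotion word -> "terrible" -> negative.
Now, reasoning exactly like the examples, classify: "The battery life is great, but the screen scratches far too easily."
```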
For thinking models, zero-shot prompts worked better than few-shot prompts.
4. Use Thinking Models for Complex Multi-Step Tasks
Thinking models perform best on tasks that require five or more steps.
When solving problems with 3-5 steps, thinking models offered a slight improvement over standard models. For simpler tasks (fewer than 3 steps), performance may actually degrade compared to traditional LLMs, because they “overthink.”
If a task is highly structured or simple, a regular LLM like GPT-4 may be a better choice.
Good Example:
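A multi-step task of the kind that suits a thinking model (illustrative):

```
Plan a migration from a single PostgreSQL instance to a sharded setup: choose a shard key, design the cutover sequence, handle in-flight writes during the switch, define rollback criteria, and estimate total downtime.
```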
Poor Example:
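A simple lookup that a standard LLM handles faster and more cheaply (illustrative):

```
What is the capital of France?
```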
To estimate how many steps a problem requires, you can run it through the web version of a reasoning model and count the reasoning steps it displays.
5. Use Delimiters to Structure Prompts
For regular LLMs, developers typically use delimiters such as triple quotation marks, XML tags, or section titles to mark distinct sections of the input, making it easier for the model to interpret each part correctly. The same technique helps thinking models.
Thinking models, however, struggle with structured outputs, though they can be guided toward consistency. If you need a structured response (e.g., JSON, a table, or another fixed format), spell the format out explicitly and use delimiters to separate the content from the formatting instructions.
Good Example:
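A delimited prompt with an explicit output format (illustrative; the XML-style tags are one possible convention):

```
Summarize the customer feedback below. Respond only with JSON matching the schema in <output_format>.

<feedback>
The checkout flow is confusing, and I was charged twice for one order.
</feedback>

<output_format>
{"sentiment": "positive | negative | mixed", "summary": "one sentence"}
</output_format>
```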
Poor Example:
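The same request with no separation between content and instructions (illustrative):

```
Summarize this feedback the checkout flow is confusing and I was charged twice also give me json with sentiment and summary somewhere in your answer
```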
If structured output is critical, you’re better off using a standard LLM instead of a thinking model.
6. Use Ensembling for Highly Complex Tasks
For high-stakes or complex problems, ensembling improves performance.
Ensembling involves running multiple prompts (either the same prompt multiple times or variations of the prompt) and aggregating the results. This approach increases accuracy but raises costs because multiple queries are required.
Example of Ensembling:
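A minimal sketch in Python, assuming the `openai` client library; the model name, prompt, and five-run majority vote are illustrative choices, not fixed rules:

```python
from collections import Counter

from openai import OpenAI  # assumes the openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "A farmer has 17 sheep. All but 9 run away. "
    "How many sheep are left? Answer with a number only."
)

def query(prompt: str) -> str:
    """Send one prompt to a reasoning model and return its answer text."""
    response = client.chat.completions.create(
        model="o3-mini",  # illustrative; substitute your reasoning model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# Run the same prompt several times and keep the majority answer.
# More runs tend to raise accuracy, but every run costs tokens.
answers = [query(PROMPT) for _ in range(5)]
majority_answer, votes = Counter(answers).most_common(1)[0]
print(f"All answers: {answers}")
print(f"Majority answer ({votes}/5 votes): {majority_answer}")
```

The same voting logic works if you vary the prompt's phrasing across runs instead of repeating it verbatim.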
While ensembling boosts performance, it’s expensive and should only be used when high accuracy is critical.
Conclusion
Prompting thinking models requires a different mindset and approach compared to traditional LLMs. By following these guidelines, you can optimize your interactions with thinking models and get the best possible responses.