Temperature
Temperature is a parameter that controls how "sharp" an LLM's probability distribution is when sampling the next token. Low values bias toward the most probable tokens for consistent, predictable output; high values allow less probable tokens to be sampled, producing more creative and varied responses. Most APIs accept values from 0 to 2.
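A minimal sketch of the mechanism in pure Python (the `sample_with_temperature` helper is illustrative, not any vendor's API): temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the top token and values above 1 flatten it.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from raw logits after temperature scaling.

    Dividing logits by the temperature sharpens (T < 1) or flattens
    (T > 1) the softmax distribution before sampling.
    """
    if temperature <= 0:
        # T = 0 degenerates to greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

At T = 0 this always returns the most probable token; at high T the weights approach uniform, which is exactly the "less probable tokens get sampled" behavior described above.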
Why It Matters
The same prompt at temperature 0.2 and at 1.0 can produce markedly different tone, length, and creativity. For AI-generated blog drafts, too low means mechanical, predictable prose; too high means more factual errors and hallucinations. Setting temperature deliberately, alongside prompt engineering, is a prerequisite for stable AI content quality.
Behavior by Range
| Temperature | Characteristic | Suitable tasks |
|---|---|---|
| 0.0 – 0.2 | Deterministic, reproducible | Classification, extraction, code, factual QA |
| 0.3 – 0.5 | Consistent with slight variation | Summarization, translation, structured answers |
| 0.6 – 0.8 | Natural creativity | Blog drafts, emails, marketing copy |
| 0.9 – 1.2 | Diverse, creative | Ideation, brainstorming |
| 1.3+ | Noisy, more hallucinations | Rarely used in production |
Temperature vs Top-p
Another common sampling parameter is top-p (nucleus sampling), which restricts sampling to the smallest set of highest-probability tokens whose cumulative probability reaches p.
- Temperature reshapes the entire probability distribution.
- Top-p limits the size of the candidate pool.
- Don't tune both at once: OpenAI and Anthropic both recommend adjusting only one of the two. Changing both makes behavior hard to reason about.
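The contrast can be shown on a toy distribution (the `top_p_filter` helper below is illustrative, not a library function): where temperature rescales every probability, top-p simply truncates the candidate pool and renormalizes what remains.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability token indices whose
    cumulative probability reaches p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break  # nucleus reached: stop adding candidates
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05] and p = 0.79, only the first two tokens survive; the tail is cut off entirely rather than merely down-weighted, which is why tuning temperature and top-p together compounds in unpredictable ways.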
Recommended Values by Task
- Fact-based posts (tutorials, guides): 0.2 – 0.4. Accuracy first, creativity minimal.
- Blog drafts (essays, analysis): 0.6 – 0.7. Natural sentences with consistent voice.
- Ideation (title variants, copy options): 0.9 – 1.0. Diversity is the point.
- Summary and translation: 0.0 – 0.3. Reproducibility matters.
- FAQs and definitions: 0.0 – 0.2. Same question should get the same answer.
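These recommendations can be captured as a small preset table in application code. The task names and point values below are illustrative defaults drawn from the ranges above, not an official mapping from any provider.

```python
# Illustrative per-task temperature presets (names are our own, not an API standard).
TASK_TEMPERATURE = {
    "factual_post": 0.3,   # tutorials, guides: accuracy first
    "blog_draft": 0.65,    # essays, analysis: natural but consistent voice
    "ideation": 0.95,      # title variants, copy options: diversity is the point
    "summary": 0.2,        # summarization / translation: reproducibility
    "faq": 0.1,            # same question should get the same answer
}

def temperature_for(task, default=0.7):
    """Look up a preset temperature, falling back to a mid-range default."""
    return TASK_TEMPERATURE.get(task, default)
```

Centralizing the choice like this keeps temperature an explicit, reviewable setting instead of an incidental per-call literal.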
Caveats
Hallucination correlation: higher temperature makes the model more likely to sample low-probability tokens, which raises the factual error rate. For hallucination-sensitive tasks, lower the temperature.
Reproducibility: Temperature 0 is not guaranteed to be perfectly deterministic. Pin the seed parameter as well if you need identical outputs.
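A toy sketch of why pinning the seed matters (`toy_generate` stands in for a real API call; some providers expose a `seed` request parameter with best-effort determinism): seeding the sampler makes even temperature-above-zero generation repeatable.

```python
import random

def toy_generate(seed, n=5):
    """Toy stand-in for an LLM call: pinning the seed makes the
    stochastic sampling repeatable even when temperature > 0."""
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "on", "mat"]
    # Weighted choice mimics sampling from a non-uniform token distribution.
    return [rng.choices(vocab, weights=[5, 3, 3, 2, 2], k=1)[0] for _ in range(n)]
```

Two calls with the same seed return the same sequence; in production the same idea applies, though provider-side factors (model updates, batching) can still introduce variation.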
Default values: Defaults differ by API (OpenAI 1.0, Anthropic 1.0, Google ~1.0). Calling an API without setting temperature yields more creative output than you might expect.