Temperature
Temperature is a parameter that controls how "sharp" an LLM's probability distribution is when sampling the next token. Low values bias toward the most probable tokens for consistent, predictable output; high values allow less probable tokens to be sampled, producing more creative and varied responses. Most APIs accept values from 0 to 2.
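A minimal sketch of the mechanism in pure Python (the `sample_with_temperature` helper is illustrative, not any vendor's API): temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the top token and values above 1 flatten it.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from raw logits after temperature scaling.

    Dividing logits by the temperature sharpens (T < 1) or flattens
    (T > 1) the softmax distribution before sampling.
    """
    if temperature <= 0:
        # T = 0 degenerates to greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

At T = 0 this always returns the most probable token; at high T the weights approach uniform, which is exactly the "less probable tokens get sampled" behavior described above.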
Why It Matters
The same prompt at temperature 0.2 and at 1.0 can produce markedly different tone, length, and creativity. For AI-generated blog drafts, too low means mechanical, predictable prose; too high means more factual errors and hallucinations. Setting temperature deliberately, alongside prompt engineering, is a prerequisite for stable AI content quality.
Behavior by Range
| Temperature | Characteristic | Suitable tasks |
|---|---|---|
| 0.0 – 0.2 | Deterministic, reproducible | Classification, extraction, code, factual QA |
| 0.3 – 0.5 | Consistent with slight variation | Summarization, translation, structured answers |
| 0.6 – 0.8 | Natural creativity | Blog drafts, emails, marketing copy |
| 0.9 – 1.2 | Diverse, creative | Ideation, brainstorming |
| 1.3+ | Noisy, more hallucinations | Rarely used in production |
Temperature vs Top-p
Another common sampling parameter is top-p (nucleus sampling), which restricts sampling to the smallest set of highest-probability tokens whose cumulative probability reaches p.
- Temperature reshapes the entire probability distribution.
- Top-p limits the size of the candidate pool.
- Don't tune both at once: OpenAI and Anthropic both recommend adjusting only one of the two. Changing both makes behavior hard to reason about.
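The contrast can be shown on a toy distribution (the `top_p_filter` helper below is illustrative, not a library function): where temperature rescales every probability, top-p simply truncates the candidate pool and renormalizes what remains.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability token indices whose
    cumulative probability reaches p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break  # nucleus reached: stop adding candidates
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05] and p = 0.79, only the first two tokens survive; the tail is cut off entirely rather than merely down-weighted, which is why tuning temperature and top-p together compounds in unpredictable ways.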
Recommended Values by Task
- Fact-based posts (tutorials, guides): 0.2 – 0.4. Accuracy first, creativity minimal.
- Blog drafts (essays, analysis): 0.6 – 0.7. Natural sentences with consistent voice.
- Ideation (title variants, copy options): 0.9 – 1.0. Diversity is the point.
- Summary and translation: 0.0 – 0.3. Reproducibility matters.
- FAQs and definitions: 0.0 – 0.2. Same question should get the same answer.
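These recommendations can be captured as a small preset table in application code. The task names and point values below are illustrative defaults drawn from the ranges above, not an official mapping from any provider.

```python
# Illustrative per-task temperature presets (names are our own, not an API standard).
TASK_TEMPERATURE = {
    "factual_post": 0.3,   # tutorials, guides: accuracy first
    "blog_draft": 0.65,    # essays, analysis: natural but consistent voice
    "ideation": 0.95,      # title variants, copy options: diversity is the point
    "summary": 0.2,        # summarization / translation: reproducibility
    "faq": 0.1,            # same question should get the same answer
}

def temperature_for(task, default=0.7):
    """Look up a preset temperature, falling back to a mid-range default."""
    return TASK_TEMPERATURE.get(task, default)
```

Centralizing the choice like this keeps temperature an explicit, reviewable setting instead of an incidental per-call literal.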
Caveats
Hallucination correlation: higher temperature makes the model more likely to sample low-probability tokens, which raises the factual error rate. For hallucination-sensitive tasks, lower the temperature.
Reproducibility: Temperature 0 is not guaranteed to be perfectly deterministic. Pin the seed parameter as well if you need identical outputs.
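A toy sketch of why pinning the seed matters (`toy_generate` stands in for a real API call; some providers expose a `seed` request parameter with best-effort determinism): seeding the sampler makes even temperature-above-zero generation repeatable.

```python
import random

def toy_generate(seed, n=5):
    """Toy stand-in for an LLM call: pinning the seed makes the
    stochastic sampling repeatable even when temperature > 0."""
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "on", "mat"]
    # Weighted choice mimics sampling from a non-uniform token distribution.
    return [rng.choices(vocab, weights=[5, 3, 3, 2, 2], k=1)[0] for _ in range(n)]
```

Two calls with the same seed return the same sequence; in production the same idea applies, though provider-side factors (model updates, batching) can still introduce variation.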
Default values: Defaults differ by API (OpenAI 1.0, Anthropic 1.0, Google ~1.0). Calling an API without setting temperature yields more creative output than you might expect.