5 Mistakes Prompt Engineers Make in 2026 (And How to Fix Them)
Published 21 April 2026 · 9 min read
Quick answer: the five mistakes are (1) prompt bloat — stuffing in instructions until the model ignores half of them; (2) leaky eval sets — your test prompts cue the answer; (3) ignoring model differences — copying a GPT-4 prompt to Claude without rework; (4) no versioning — last week's "fix" broke production; (5) piling on few-shot examples when you should fine-tune. All five are fixable in hours.
1. Prompt bloat
Every new edge case adds a sentence. After twelve edits the prompt is 4,000 tokens, the model follows only the first 1,000, and quality regresses. Fix: write the prompt as a structured spec (role, constraints, examples, output format) and prune every addition that doesn't measurably improve an eval metric.
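A structured spec can be as simple as a dataclass with a render step and a token-budget check. This is a minimal sketch (the `PromptSpec` class and the 4-chars-per-token estimate are illustrative assumptions, not a real library):

```python
from dataclasses import dataclass, field

@dataclass
class PromptSpec:
    """Structured prompt spec: each section can be pruned independently."""
    role: str
    constraints: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    output_format: str = ""

    def render(self) -> str:
        parts = [f"Role: {self.role}"]
        if self.constraints:
            parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        parts.extend(self.examples)
        if self.output_format:
            parts.append(f"Output format: {self.output_format}")
        return "\n\n".join(parts)

    def rough_tokens(self) -> int:
        # Crude estimate: roughly 4 characters per token in English prose.
        return len(self.render()) // 4

spec = PromptSpec(
    role="You are a support-ticket classifier.",
    constraints=["Answer with exactly one label.", "Never invent new labels."],
    output_format='JSON: {"label": <string>}',
)
# Budget check before merging another "just one more edge case" sentence:
assert spec.rough_tokens() < 200
```

The point is that every addition lands in a named section, so when an eval run shows no improvement, you know exactly which line to delete.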
2. Leaky eval sets
Your eval prompt includes phrasing that cues the answer. The model looks great on tests and fails in production. Fix: treat evals like ML test sets — independent, representative, periodically refreshed. Rotate 20% of the eval set monthly.
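The monthly rotation can be a one-function job: swap a fixed fraction of the eval set for fresh cases from a held-out pool. A sketch, assuming you keep a reserve pool of labelled cases the model has never been tuned against (`rotate_eval_set` is a hypothetical helper, not a library call):

```python
import random

def rotate_eval_set(current, reserve_pool, fraction=0.2, seed=None):
    """Replace `fraction` of the eval set with fresh cases from a held-out pool."""
    rng = random.Random(seed)
    n_swap = max(1, int(len(current) * fraction))
    if n_swap > len(reserve_pool):
        raise ValueError("reserve pool too small for requested rotation")
    keep = rng.sample(current, len(current) - n_swap)   # drop n_swap old cases
    fresh = rng.sample(reserve_pool, n_swap)            # pull replacements
    return keep + fresh

evals = [f"case-{i}" for i in range(10)]
pool = [f"fresh-{i}" for i in range(50)]
new_evals = rotate_eval_set(evals, pool, fraction=0.2, seed=42)
assert len(new_evals) == 10
assert sum(c.startswith("fresh-") for c in new_evals) == 2
```

Seeding the RNG keeps each month's rotation reproducible, which matters when you later need to explain a score shift.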
3. Ignoring model differences
Claude rewards structured tags and <thinking> scratchpads. GPT-5 follows numbered imperative steps. Gemini prefers clean markdown. Don't copy cross-model without re-evaluating. Track separate prompt versions per model.
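Tracking separate versions per model can be as blunt as a registry keyed by (model, version) with no cross-model fallback. A minimal sketch (the registry layout and prompt strings are illustrative assumptions):

```python
# One entry per (model, version) pair; styles deliberately differ per model.
PROMPTS = {
    ("claude", "v3"): "<instructions>\nClassify the ticket.\n</instructions>",
    ("gpt", "v2"): "1. Read the ticket.\n2. Choose one label.\n3. Output only the label.",
    ("gemini", "v1"): "## Task\nClassify the ticket.\n\n## Output\nOne label.",
}

def get_prompt(model: str, version: str) -> str:
    try:
        return PROMPTS[(model, version)]
    except KeyError:
        # Fail loudly rather than silently reusing another model's prompt.
        raise KeyError(f"no prompt registered for {model}/{version}")

assert get_prompt("gpt", "v2").startswith("1.")
```

The deliberate absence of a fallback is the feature: porting a prompt to a new model forces a new registry entry, which forces a new eval run.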
4. No versioning
You edited the prompt in a config file. Nobody reviewed. Production regressed. Fix: every prompt in git with PR review; every deploy tagged with the prompt version; rollback is a config flip.
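"Rollback is a config flip" can be made literal: pin the prompt version in a deploy config that lives in git, and make rollback a one-field change. A sketch under assumed conventions (the config layout and `rollback` helper are hypothetical):

```python
# This dict would live as deploy.json in git, changed only via reviewed PRs.
deploy = {
    "service": "ticket-classifier",
    "model": "gpt",
    "prompt_version": "v7",   # every deploy is tagged with the prompt version
}

def rollback(config: dict, previous_version: str) -> dict:
    """Rollback is a config flip: change one field, redeploy."""
    return {**config, "prompt_version": previous_version}

rolled = rollback(deploy, "v6")
assert rolled["prompt_version"] == "v6"
assert deploy["prompt_version"] == "v7"  # original config is untouched
```

Because the version string is the only moving part, git blame on this one file gives you the full audit trail of which prompt was live when.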
5. Few-shot when you should fine-tune
If you have >500 labelled examples and your few-shot prompt is over 8k tokens, you are paying extra inference cost and still shipping inconsistency. Fine-tune. At the right data volume, fine-tuning beats few-shot on cost, latency, and quality simultaneously.
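The cost side of that trade-off is simple arithmetic: an 8k-token few-shot prompt is paid for on every request, while a fine-tuned model carries a short prompt at a higher per-token rate. A back-of-envelope sketch (all prices and volumes below are illustrative assumptions, not real rates):

```python
def monthly_prompt_cost(prompt_tokens, requests_per_month, price_per_mtok):
    """Input-token cost of carrying a fixed prompt on every request."""
    return prompt_tokens * requests_per_month * price_per_mtok / 1_000_000

# Assumed numbers: 100k requests/month, $3 per million input tokens for the
# base model, $6 per million for the fine-tuned one.
few_shot   = monthly_prompt_cost(8_000, 100_000, price_per_mtok=3.00)
fine_tuned = monthly_prompt_cost(500,   100_000, price_per_mtok=6.00)

assert few_shot == 2400.0    # $2,400/month just to re-send the examples
assert fine_tuned == 300.0   # $300/month, even at double the token price
```

Even with the tuned model priced at twice the rate, the shorter prompt wins by 8x at this volume; plug in your own numbers before deciding.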
Related reading
10 patterns · Tool buyer's guide · GeraLearn