
A Short History of Prompt Engineering (2020–2026)

Published 21 April 2026 · 9 min read

Quick answer: Prompt engineering as a named practice emerged with GPT-3's few-shot paper (2020). Chain-of-thought (2022), RAG (2023), and tool-use/agents (2024) extended it. By 2026, reasoning models and agentic loops have shifted the centre of gravity from “clever prompts” to “system design around capable models.”

2020 — GPT-3 and the birth of few-shot

The OpenAI GPT-3 paper (Brown et al., “Language Models are Few-Shot Learners”, May 2020) showed that a large enough LM could learn tasks from a handful of examples in the prompt — no fine-tuning required. That paper is the moment “prompting” became a discipline.
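The core idea is that the "training" happens entirely in the context window. A minimal sketch of a GPT-3-style few-shot prompt (the sentiment task and labels here are illustrative, not from the paper):

```python
def few_shot_prompt(examples, query):
    """Build a few-shot classification prompt from (input, label) pairs."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The final block has no label; the model completes it.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The plot dragged and the acting was wooden.", "negative"),
    ("A joyful, beautifully shot film.", "positive"),
]
prompt = few_shot_prompt(examples, "I couldn't stop smiling the whole time.")
print(prompt)
```

The demonstrations establish both the task and the output format, so the completion after the final `Sentiment:` is constrained by pattern alone.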

2021 — Instruction tuning and T0 / FLAN

Instruction-tuned models (T0, FLAN) made simple natural-language prompts work without in-context examples. This split prompting into "zero-shot" (rely on the model's instruction-following) and "few-shot" (demonstrate the task with examples).

2022 — Chain-of-thought and the DAN era

Wei et al. showed that prompting with worked, step-by-step reasoning examples ("chain of thought") dramatically improved performance on multi-step problems; Kojima et al. found that even the bare zero-shot cue "Let's think step by step" helped. The same year ChatGPT launched (November 2022) and popular culture discovered "jailbreak" prompts (DAN, "pretend you have no filters"). Prompt engineering briefly became a meme career.
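A sketch of the few-shot chain-of-thought pattern, using the tennis-ball arithmetic exemplar popularised by the Wei et al. paper:

```python
# One worked exemplar: the answer includes its reasoning, so the model
# imitates the reasoning style, not just the answer format.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

def cot_prompt(question):
    """Prepend a worked reasoning exemplar to a new question."""
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

print(cot_prompt("A farm has 3 pens of 4 sheep each. How many sheep?"))
```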

2023 — RAG takes over production

Retrieval-augmented generation (RAG, introduced by Lewis et al. in 2020) became the standard production pattern for grounding LLMs on your own data. LangChain, LlamaIndex, and vector databases (Pinecone, Weaviate, pgvector) exploded. Prompt engineering fused with retrieval engineering.
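The pattern itself is simple: retrieve relevant documents, then inject them into the prompt. A toy sketch, with naive word-overlap scoring standing in for a real embedding index:

```python
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query (a stand-in for a vector DB)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_prompt(query, docs):
    """Inject the top-k retrieved documents as grounding context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = [
    "Paris is the capital of France.",
    "The Nile is the longest river in Africa.",
    "Berlin is the capital of Germany.",
]
print(rag_prompt("What is the capital of France?", docs))
```

Real systems replace the overlap score with embedding similarity, but the prompt shape is the same: instructions, context, question.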

2024 — Tool use, agents, and constitutional methods

OpenAI Assistants, Anthropic tool use, and the broader "agent" wave shifted prompting from single completions to multi-turn loops. Anthropic's Constitutional AI work (first published in late 2022) had already shown that prompting models to self-critique against written principles produced more aligned outputs.
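The multi-turn shape can be sketched as a loop: the model either answers or requests a tool call, the tool result is appended, and the conversation continues. This is a minimal illustration, not any vendor's API; `model` is a stand-in for a real LLM call, and the JSON tool-call encoding is an assumption of the sketch:

```python
import json

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def run_agent(model, user_msg, max_turns=5):
    """Minimal tool-use loop: a JSON reply is treated as a tool call,
    anything else as the final answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = model(messages)  # stand-in for a real LLM API call
        messages.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)
        except ValueError:
            return reply  # plain text => final answer
        result = TOOLS[call["tool"]](*call["args"])
        messages.append({"role": "tool", "content": str(result)})
    return None  # gave up after max_turns
```

Production APIs encode tool calls in structured fields rather than raw JSON text, but the loop (call, execute, append, repeat) is the same.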

2025 — Reasoning models

OpenAI o1 (late 2024), o3 (2025), DeepSeek-R1, and Claude 4 with extended thinking introduced models that do extensive internal reasoning before answering. "Think step by step" became an explicit budget you dial, not a phrase you write. This devalued some prompt tricks and elevated spec-writing.

2026 — Agentic loops and long-context design

Prompt engineering in 2026 is increasingly about system design: agent loops with memory and tool-use, long contexts (200k–1M tokens) where placement and pointer design matter, and orchestration across multiple models (a cheap model dispatches to a reasoning model for hard steps).
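The multi-model orchestration pattern can be sketched in a few lines. All three callables here are hypothetical stand-ins: `classify` plays the role of a small, cheap model that labels a query "easy" or "hard", and the other two stand in for the actual model endpoints:

```python
def route(query, classify, cheap_model, reasoning_model):
    """Dispatch: a cheap classifier decides whether the query justifies
    the slower, more expensive reasoning model."""
    return reasoning_model(query) if classify(query) == "hard" else cheap_model(query)

# Toy stand-ins for the three model calls:
classify = lambda q: "hard" if "prove" in q.lower() else "easy"
cheap = lambda q: f"[cheap] {q}"
deep = lambda q: f"[reasoning] {q}"

print(route("What's the capital of France?", classify, cheap, deep))
print(route("Prove the sum of two odd numbers is even.", classify, cheap, deep))
```

The design point is that the routing decision itself is cheap, so the expensive model is only paid for where it changes the outcome.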

What's next

Look for: (1) prompts written by LLMs that route and rewrite themselves; (2) programmatic prompt optimisation (DSPy, TextGrad); (3) tighter integration with specification formats (JSON Schema, Pydantic); and (4) the continuing slow death of “clever prompt” tricks in favour of “well-specified problems.”
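Point (3) above, specification formats, can be illustrated with a hand-written JSON Schema: instead of coaxing format with prose, the prompt carries a machine-checkable spec. The schema and prompt wording here are illustrative, not from any particular API:

```python
import json

# A hypothetical response schema constraining the model's output shape.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["answer", "confidence"],
}

def system_prompt(schema):
    """Embed the schema so the output can be validated mechanically."""
    return ("Reply with a single JSON object matching this JSON Schema, "
            "and nothing else:\n" + json.dumps(schema, indent=2))

print(system_prompt(RESPONSE_SCHEMA))
```

Libraries like Pydantic generate such schemas from typed classes, and several provider APIs can enforce them at decode time, which is exactly the shift from clever prompts to well-specified problems.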

Related reading

What is prompt engineering? · Ten patterns that work in 2026
