Context Engineering: When AI Pair-Programming Actually Feels Human
AI sidekicks shine only when you feed them real context—code, errors, docs. They speed legacy refactors and CLI tweaks but can still hallucinate bogus API flags or balloon boilerplate on green-field work. Treat them as amplifiers, keep tests tight, and reset fast when suggestions drift.

Large‑language‑model coding assistants aren’t magical because they “write the boring bits.” Confession: my first swing at an AI helper spat out code I wouldn’t let anywhere near production—picture a 400‑line diff peppered with stray commas and pretend AWS regions. The real unlock is when an assistant behaves like a sharp pair‑programming partner who already cloned the repo and skimmed the spec—so every suggestion lands inside the project’s reality instead of floating in autocomplete limbo. Making that happen is an act of context engineering: curating what the model sees and when it sees it.
I learned this lesson on a legacy side‑project from 2017 running Node 8, Keystone JS 4, Backbone, and jQuery—code I hadn’t touched since gulp ruled the earth. Instead of spelunking Stack Overflow, I connected a context‑aware coding assistant that could ingest my entire file tree, terminal output, and test logs in real time. Suddenly it could answer, “Where does the legacy auth check live?” without hallucinating paths that never existed. More importantly, it amplified the changes I already intended to make. I needed to wrangle some thorny user‑focus handling between a slideshow and a Bootstrap 4 accordion. I couldn’t remember the exact event syntax, so I walked the assistant through the errors. After a few iterations it surfaced the right jQuery shown.bs.collapse hook, generated the glue code, and pasted a concise diff for review—saving me fifteen tabs of documentation hunting. In other words, it didn’t replace me—it extended my reach so I could stay focused on the bigger design moves while it handled the syntax spelunking. I still read every diff, but the flag‑hunting drudgery was gone; it felt like pair‑programming with an AI sidekick that never tires.
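For the curious, the glue code followed a pattern roughly like the sketch below. The selectors are placeholders and I'm assuming the slideshow is a Bootstrap carousel; my actual markup (and the assistant's actual diff) differed.

```javascript
// Minimal sketch: hand keyboard focus to an accordion panel once it has finished opening.
// Selectors are placeholders; the slideshow is assumed to be a Bootstrap carousel.
$('#detailsAccordion').on('shown.bs.collapse', function (event) {
  var $panel = $(event.target); // the .collapse element that just opened

  // Stop the slideshow from cycling (and stealing focus) while the panel is open.
  $('#gallery-slideshow').carousel('pause');

  // Move focus to the first focusable control inside the newly opened panel.
  $panel.find('a, button, input, select, textarea, [tabindex]').first().trigger('focus');
});
```

The point isn't the exact selectors; it's that the assistant knew to hang the focus logic off Bootstrap's own lifecycle event rather than a fragile setTimeout.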
Agents and the MCP layer. Under the hood, that “sidekick” is really an agent—a lightweight runtime that plans a step, calls the LLM for code or commands, then acts on the result. What lifts it beyond autocomplete is the Model Context Protocol (MCP) layer.
Think of the MCP server as live subtitles for your repo: every time you save a file, break a test, or copy an error message, those facts scroll in front of the model so it can reason with the freshest context instead of stale guesses.
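Stripped to its essentials, that loop is tiny. Here's a conceptual sketch, not a real MCP client; gatherContext, callModel, and runTool are hypothetical stand-ins for whatever your agent framework actually exposes, and in practice an MCP server supplies the context half of the picture.

```javascript
// Conceptual sketch of the agent loop: plan, call the model, act, feed the result back.
// gatherContext, callModel, and runTool are hypothetical helpers, not a real MCP API.
async function agentStep(goal, history) {
  const context = await gatherContext();                     // open files, failing tests, recent errors
  const step = await callModel({ goal, context, history });  // model proposes a command or diff

  const result = await runTool(step.action);                 // apply the diff, run the command, etc.
  history.push({ step, result });                            // the outcome becomes context for the next step
  return result;
}
```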
The pattern held on a second experiment that mashed up an LLM with a knowledge‑base retrieval layer. Instead of letting the model wing it, I fed it three carefully chosen ingredients—current schema, a query template, and fresh embeddings—so the output stayed grounded in real data rather than drifting into hallucination. That’s context engineering in a nutshell: give the model just enough truth to stay on the rails.
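In code, those three ingredients came together something like the sketch below. The helper names (loadSchema, searchEmbeddings, callModel) are hypothetical, not from any particular library.

```javascript
// Sketch: ground the model in the current schema, a query template, and retrieved excerpts.
// loadSchema, searchEmbeddings, and callModel are hypothetical stand-ins.
async function answerWithContext(question) {
  const schema = await loadSchema();                      // live schema, not a stale copy
  const excerpts = await searchEmbeddings(question, 5);   // top-5 chunks from the knowledge base

  // The "query template" ingredient: a fixed prompt shape the real data is slotted into.
  const prompt = [
    'Answer using ONLY the schema and excerpts below. Say "unknown" if they are not enough.',
    'Schema:\n' + JSON.stringify(schema, null, 2),
    'Excerpts:\n' + excerpts.join('\n---\n'),
    'Question: ' + question,
  ].join('\n\n');

  return callModel(prompt);
}
```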
After running this dance across multiple projects, I keep seeing the same pattern: with an AI sidekick I move through a feature branch far quicker and with a lot less mental friction. The trade‑off is vigilance—unchecked suggestions can still sneak in fragile edge cases or security quirks, so a quick lint, test run, and eyeball review remain non‑negotiable.
Beyond codebases, the same context‑first approach shines at the CLI. I had it tighten my SSH config and migrate my shell setup from Oh My Zsh to a lazy‑loaded Starship prompt—each change proposed in small, auditable diffs.
```bash
# Example: asking a CLI assistant to unwind callback hell
ai-dev fix --project ./ --issue "Remove callback hell in email service"
```
The agent slurps the stack trace, proposes async/await refactors, and links to docs—because the plugin funnels logs and file paths into its context window.
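The before-and-after shape of that refactor looks roughly like this toy email service. I'm assuming the helpers exist in both callback and promise flavors; yours will differ.

```javascript
// Before: the callback-hell shape of a toy email service.
function sendWelcome(userId, done) {
  getUser(userId, function (err, user) {
    if (err) return done(err);
    renderTemplate('welcome', user, function (err, html) {
      if (err) return done(err);
      sendEmail(user.email, html, done);
    });
  });
}

// After: the same flow with async/await, assuming promise-returning versions of the
// helpers (or callback versions wrapped with util.promisify).
async function sendWelcomeAsync(userId) {
  const user = await getUser(userId);
  const html = await renderTemplate('welcome', user);
  return sendEmail(user.email, html);
}
```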
That said, it can also dig in its heels. I once asked the assistant to generate a simple putObject call to S3. It invented two fantasy parameters—no matter how many times I pasted the official docs, it insisted those flags were legit and the call kept crashing. Eventually I scrapped the whole buffer, re‑prompted from scratch, and the next attempt nailed it in one pass. Resetting the context was faster than untangling its hallucinated arguments—a reminder that sometimes the quickest fix is a clean slate.
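For reference, the boring, documented shape of that call with the v2 aws-sdk looks like the sketch below; the bucket, key, and payload are placeholders rather than my real values.

```javascript
// Minimal S3 putObject with the v2 aws-sdk for Node; bucket, key, and body are placeholders.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-1' }); // pick your own region

async function uploadReport(buffer) {
  return s3.putObject({
    Bucket: 'my-example-bucket',   // placeholder bucket name
    Key: 'reports/latest.json',    // placeholder object key
    Body: buffer,
    ContentType: 'application/json',
  }).promise();                    // v2 SDK: .promise() turns the request into a Promise
}
```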
Start a green‑field repo, though, and the dynamic flips. The assistant can scaffold an entire service, but cohesion drifts; tests lag and boilerplate balloons. I’ve caught it scaffolding eight nearly‑identical service classes before I could blink. Fundamentals—architecture, security, accessibility—still land on me.
So no, these tools aren’t autopilot. They’re what pair programming was meant to be: two brains on one problem—one tireless, the other accountable. Wire in the right context and you amplify your craft; ignore it and you’re just arguing with autocomplete.
Challenge Yourself
Level 1 – Warm‑Up: Point your assistant at a single legacy file. Ask for a refactor and review the diff line‑by‑line before committing.
Level 2 – Intermediate: Drop a stack trace into the prompt and let the assistant propose a fix. Run your test suite—and a linter—before accepting anything.
Level 3 – Boss Fight: Catch the assistant hallucinating. When it does, wipe the chat, re‑seed it with only the essentials, and time how long a clean slate takes versus patching the bad code. Post your results in the comments—I’ll share mine next week.