Night Shift — Pareto Blog

Charlotte Relyea and Martin Harrysson opened McKinsey's April piece on AI in software development with a scene on the third floor of a bank in London. Three engineers arrive at 8 a.m. Nearly a hundred AI agents have just finished their night shift — refining a cross-border payment system, testing failure paths, shipping updates. The daily ritual is no longer a two-week sprint review. It's a morning walk-through of the work the agents did while the humans were asleep. The headline number: 10× speed at half the cost.

If you've had the Oh-Shit moment, you read that scene and nod. If you haven't, you read it as science fiction. Both reactions miss the point. The interesting thing in that article isn't the speed. It's the shape of the operating model that produces the speed.

McKinsey frames the progression in four levels. Level 1 is manual. Level 2 is a developer with an inline AI pair programmer — call it 1.2×. Level 3 is a developer describing a feature in plain English and getting back working code, tests, and documentation — call it 2×. Level 4 is a small team supervising a coordinated fleet of agents that delivers entire applications end-to-end — 20×. Most enterprises are somewhere around Level 2, feeling productive, missing the curve.

The productivity step-change

Four levels of developer support. One of them isn't like the others.

Level 1 (1×): Manual. Developer writes every line alone.

Level 2 (1.2×): Pair programmer. Inline AI completion. Typing faster.

Level 3 (2×): Automated steps. Agents generate code, tests, docs per feature.

Level 4 (20×): Agent factory. Small team supervises a coordinated fleet.

Most enterprises live at Level 2. The jump to Level 4 isn't incremental; it's a different operating model. That's the point.

Source: McKinsey, “The AI Revolution in Software Development,” April 2026.

The honest read on the data: the top quintile of companies that have moved beyond Level 2 are booking 16–30% gains in productivity, time-to-market, and customer experience — and 31–45% gains in software quality. Speed and quality rising together. That's the key signal. In the old model they traded against each other; in this one they reinforce each other, because the gates and the generation are both automated.

The report is also clear about what doesn't move the needle. Giving developers AI tools and walking away is Level 2 theatre. The companies unlocking real value have rearchitected how they build software. They've embedded AI across the lifecycle — ideation, requirements, design, coding, testing, deployment, operations — and changed what their people actually do. Engineers move from typing code to decomposing work, writing specs, defining constraints, and reviewing agent output. Product managers move from task management to setting objectives and making trade-offs. Roles shift up the value stack.

Now the shape. Day shift is for judgment, design, and direction. Humans translate business intent into agent-ready work: refined stories, scoped tasks, acceptance criteria, architectural boundaries, quality bars. Night shift is for execution. A coordinated fleet — coding agents, test agents, QA agents, security agents, performance agents, documentation agents — runs multi-step workflows. An orchestrator manages handoffs: failed tests route back to a fix agent, policy violations halt the workflow, performance regressions invoke the benchmarker. By morning, a queue of ready-for-review pull requests is waiting — each with code, tests, logs, and a plain-language rationale.
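To make the handoff logic concrete, here is a minimal sketch of the routing decision an orchestrator makes when an agent step finishes. It is an illustration under stated assumptions, not McKinsey's design or any particular product's API: the names `Outcome`, `StepResult`, `route`, and the step labels are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Possible results of one agent step (hypothetical labels)."""
    PASSED = "passed"
    TEST_FAILED = "test_failed"
    POLICY_VIOLATION = "policy_violation"
    PERF_REGRESSION = "perf_regression"


@dataclass
class StepResult:
    task_id: str
    outcome: Outcome


def route(result: StepResult) -> str:
    """Return the name of the next step for a finished agent step."""
    # Policy violations are checked first: they stop the workflow outright.
    if result.outcome is Outcome.POLICY_VIOLATION:
        return "halt"
    # Failed tests go back to a fix agent for another attempt.
    if result.outcome is Outcome.TEST_FAILED:
        return "fix_agent"
    # Performance regressions are handed to the benchmarker.
    if result.outcome is Outcome.PERF_REGRESSION:
        return "benchmarker"
    # Everything else is ready for a human in the morning.
    return "review_queue"


if __name__ == "__main__":
    for outcome in Outcome:
        print(outcome.value, "->", route(StepResult("task-1", outcome)))
```

The point of the sketch is that the routing rules are explicit and small. The judgment lives in the specs and the gates, and the orchestrator only applies them.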

This is the line from the article worth tattooing on the team room wall: you can't chat your way to production-grade software. You need to master how to provide good instructions to AI agents. Good AI output comes from good context, not clever wording. Spec-driven development and context engineering are the new core skills. If you've watched a team try to get serious output from an agent by chatting with it, you know exactly how fast that breaks down.

What McKinsey calls the engineering platform, we call the platform. Same idea. Same mandate. Isolated, reproducible environments. Secrets and identity managed centrally. Everything as code. Observability built in. Policy-enforced gates in the pipeline. Managed services absorbing the ops load so the team isn't babysitting EC2. Without that foundation, an agent factory isn't a factory — it's a workshop with good tools and no floor.

What McKinsey calls knowledge graphs, we call ground truth: an Obsidian vault of architecture decisions, API patterns, database conventions, testing standards, security checklists — all versioned in Git, all referenced by the agents on every build. The skills sitting on top of that vault are where 21south's standards are encoded so code comes out correct by default, not by accident. Every skill is a contract between intent and output. Without the vault, agents drift. Without the skills, every team reinvents the prompt.

The report is also honest about the hard parts. Token consumption can run away fast when agents spawn subagents. FinOps belongs in the stack from day one. Performance expectations need to be reset — the point isn't that teams shrink; it's that the same team ships an order of magnitude more, or the same volume with more polish. And human review stays hard and high-value. The editors-in-chief still catch architectural drift, still tighten guardrails, still decide when to mark more of the codebase as safe-to-automate.
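One way to picture the token problem is a budget that every subagent inherits from its parent. The sketch below is an assumption about how such a guard could look, not something the report prescribes: `TokenBudget`, `BudgetExceeded`, and the limits used are hypothetical.

```python
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a spend would exceed a budget anywhere up the chain."""


class TokenBudget:
    """A token allowance that subagents draw down together with their parent."""

    def __init__(self, limit: int, parent: Optional["TokenBudget"] = None):
        self.limit = limit
        self.used = 0
        self.parent = parent

    def remaining(self) -> int:
        return self.limit - self.used

    def child(self, limit: int) -> "TokenBudget":
        # A subagent can never be granted more than its parent has left.
        return TokenBudget(min(limit, self.remaining()), parent=self)

    def spend(self, tokens: int) -> None:
        # First pass: verify every budget in the chain can absorb the spend.
        node: Optional[TokenBudget] = self
        while node is not None:
            if node.used + tokens > node.limit:
                raise BudgetExceeded(f"{tokens} tokens exceeds a limit of {node.limit}")
            node = node.parent
        # Second pass: record the spend at every level, so nothing is
        # charged when the check above fails.
        node = self
        while node is not None:
            node.used += tokens
            node = node.parent


if __name__ == "__main__":
    nightly = TokenBudget(1_000)
    coder = nightly.child(600)
    coder.spend(500)
    print(nightly.remaining(), coder.remaining())
```

Because each subagent's spend is charged all the way up, a runaway fan-out hits the nightly ceiling instead of the invoice.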

Here's the scenario to sit with. What happens when a 20× lift lands in your business? Customer journeys evolve weekly instead of yearly. Modernisation stops being a capital project and becomes business-as-usual. Operating leverage rises sharply because the marginal cost of change collapses. Companies that figure out agent factories accelerate away from the rest, and because the advantage compounds, the gap widens rather than narrows. This is the structural case. Not a productivity hack. A new cost curve.

Our read at Pareto: this is the direction, and the foundations matter more than the hype. Platform first. Ground truth first. Spec discipline first. Human judgment at the edges, AI at scale in the middle, gates in between. We built the platform for this operating model because it was obvious a year ago that this was coming. What's new is that McKinsey now agrees. That's useful — not because it validates the thesis, but because it forces the conversation into boardrooms that weren't having it yet.

Most organisations will respond by running a pilot. That's fine. Just don't mistake the pilot for the transformation. The pilot tells you whether the agents can ship. The transformation is whether the operating model around them can compound. One is a demo. The other is the advantage.
