
The Craft of AI
Harness engineering when reasoning is exponential
6 principles for building a harness that grows with your models.
By Luke Lin · 6 min read
Your team went all-in on AI a few months ago, and everyone’s been maxing out their Claude subscriptions since.
Watch two of your sales reps on the same Monday.
The first rep set Claude Cowork up to read his inbox, research the account, and draft a prep doc before each call. This morning it didn’t run, because his laptop was closed. Yesterday it ran, drifted halfway through, skipped the research, and filled the gaps with things that aren’t true. He spends the twenty minutes before his call fixing it.
The second rep’s agent runs on a schedule at 7am no matter where his laptop is. The workflow around it enforces each step, does the research, checks the data, and has learned over weeks which accounts and signals he actually cares about. He reads the brief while sipping coffee instead of troubleshooting, and enters the day fully prepared.
Same model, same task. One rep is editing skill files and remembering to press go; the other has a system that quietly makes him better at his job every day.
This is harness engineering: building the scaffolding around a model that turns raw reasoning into work that reliably gets done, and that meets people where they already work.
Harnesses aren’t new; we’ve been wrapping scaffolding around language models since the GPT-3 days. But they’ve changed completely in just a few years, especially since the reasoning jumps that began with Opus 4.6. Andrej Karpathy has argued for treating scaffolding as disposable, built to be stripped away as models improve.
So how should we approach harness engineering when we build it not for Opus 4.8, but for whatever lands two generations out — call it Fable 6.0 or GPT-7 — when reasoning has 10x’d again?
The trick for harness engineering is building with principles that amplify the human and AI loop while giving you flexibility and accountability as reasoning improves.
The agent harness, a brief history
Let’s start from the top. Agent = model + harness.
The model reasons; the harness is everything around it that decides what it sees, what it can touch, what it remembers between sessions, and how its work gets checked before it ships.
In short, a harness manages everything around what your AI model does. Tomasz Tunguz has a great breakdown of a harness’s components, including context, tools, and orchestration.
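As a minimal sketch of "agent = model + harness," the Python below gives each of the four jobs described above its own field: what the model sees, what it can touch, what it remembers, and how its work gets checked. The `Harness` class, its field names, and `fake_model` are hypothetical, invented here for illustration; they do not come from any SDK named in this article, and a real harness would be far richer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# Hypothetical stand-in for a real model call: prompt in, text out.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Illustrative only: one field per job the harness does."""
    context: List[str]                 # what the model sees
    allowed_tools: Set[str]            # what it can touch
    check: Callable[[str], bool]       # how work is checked before it ships
    memory: List[str] = field(default_factory=list)  # what persists between sessions

    def build_prompt(self, task: str) -> str:
        parts = ["# Context", *self.context, "# Memory", *self.memory, "# Task", task]
        return "\n".join(parts)

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools

    def run(self, model: Model, task: str) -> Optional[str]:
        draft = model(self.build_prompt(task))
        if not self.check(draft):
            return None  # rejected by the check, so nothing ships
        self.memory.append(f"{task} -> {draft}")
        return draft


def fake_model(prompt: str) -> str:
    # Toy model: it only "knows" what the harness put in the prompt.
    if "Acme" in prompt:
        return "Brief: Acme renewed in March."
    return "Brief: unknown."


no_unknowns = lambda draft: "unknown" not in draft

with_context = Harness(
    context=["Acme renewed in March."],
    allowed_tools={"crm.read"},
    check=no_unknowns,
)
without_context = Harness(context=[], allowed_tools=set(), check=no_unknowns)

good = with_context.run(fake_model, "Prep Acme call")
bad = without_context.run(fake_model, "Prep Globex call")
```

The same toy model produces a usable brief in one harness and a rejected draft in the other, which is the "same model, same task" contrast from the two sales reps in miniature.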
In the early ChatGPT era the harness was there to hard-code a workflow: LangChain wired prompt into prompt into prompt with LLMChain, SequentialChain, and AgentExecutor. CrewAI gave each agent a role, a goal, and a backstory and ran the crew on a script.
These were reasonable bets for models that couldn’t be trusted to plan, but also required lots of engineering and wiring just to get an agent to do reasonable work.
Then the models got smarter.
MCP and CLIs gave models another way to call tools, structured outputs went native, context windows went to 1M tokens, and the chain-after-chain approach collapsed into a single model call.
LangChain conceded developers had hit a “controllability wall,” decided the right abstraction “was little to no abstraction at all,” and in October 2025 moved its own chain era into a package named langchain-classic. CrewAI bolted on a deterministic Flows layer and pivoted toward governance.
What replaced chains wasn’t a better chain. It was a different category of thing: agent loops, state machines, workflow graphs, durable execution, evaluator loops, and a new term — harness engineering.
What harness engineering looks like today
Now, it’s summer 2026. We got a taste of Fable 5 but most folks are using Opus 4.8 or GPT-5.5. Teams no longer build a harness from scratch.
Anthropic packages the Claude Code loop as the Claude Agent SDK and, since April, runs the whole thing for you as Managed Agents. OpenAI’s Agents SDK enables a model-native harness with sandboxes and memory, and Symphony turns an issue tracker into the control plane for a fleet of Codex agents.
Open-source agents like OpenClaw and Hermes have proliferated, primarily as personal assistants or small-scale business orchestrators.
Just last week, I saw someone demo their OpenClaw harness that scans all their GitHub PRs and open tickets, then summarizes their team’s progress for daily standup. This saves them the painful daily ritual of “is this ticket done or in review? And where exactly is the PR?”
It blew my mind.
And just a few days ago, Databricks open-sourced Omnigent, an early “meta-harness” that sits above Claude Code, Codex, and the agents you write yourself and treats each as interchangeable.
Imagine something like Google Docs, a collaborative multi-player environment, where you can bring whatever model you want, and coordinate and share your agent work across the team.
This is the future of agentic work.
What does a 10x smarter model look like?
If we extrapolate the curve we’re already on, a few capabilities stand out in the next generation of foundation models, once Fable 6 or GPT-7 comes out.
- Persistent long-horizon memory. Effectively unlimited context through a combination of massive context windows, retrieval, memory compression, and learned state management. Models can reason across years of company history, code changes, decisions, and conversations.
- Stronger world models and causal reasoning. Models get better at identifying implicit constraints, second-order effects, organizational dependencies, and hidden assumptions. Less pattern matching, more coherent internal modeling of systems.
- Long-duration autonomous execution. Models can maintain objectives across multi-hour or multi-day tasks, dynamically replan, recover from failures, and manage large execution graphs without losing intent.
- Hierarchical planning and decomposition. Models are more capable at breaking large goals into sub-goals, assigning work to specialized agents, tracking dependencies, and synthesizing results back into a coherent outcome.
- Native multimodal reasoning. Text, code, documents, spreadsheets, diagrams, screenshots, meetings, audio, and video all become first-class reasoning inputs rather than separate model capabilities.
How to harness when reasoning 10x’s
If that is the model of tomorrow, does our harness of today become obsolete, along with all our work and efforts to set up that platform?
Not necessarily.
If we build with core principles that anticipate the future, we can deliver what we need today and tomorrow:

- Context and knowledge as the backbone. The harness connects and amplifies your expertise and your company’s knowledge: docs, decisions, code history, eval traces, and more. A smarter model still wakes up knowing nothing about your business. It just mines a good context store faster, which makes that store more valuable, not less.
- Human judgment, captured. A harness shouldn’t only do the work. It should catch the taste, the approvals, and the corrections your people supply and feed them back into the system. As Satya Nadella argues, “the real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound.”
- Flexible orchestration that evolves. Keep determinism where the business demands it: checkpoints, retries, approvals, and SLAs. But design your harness so those checkpoints can change over time. The agent topologies you have today will be obsolete in six months, so flexibility is key.
- Continuous evaluation and verification. As agents get more autonomous, you need continuous measurement: task success, regressions, golden sets, cost, and hallucination checks. Have a different model family grade the work, because models flatter their own output.
- Governance and security are first-class citizens. A more capable agent taking on work autonomously makes identity, audit, permissions, and policy non-negotiable. As the work gets more important, so does the audit trail. That is exactly what Omnigent’s policy layer reaches for, and it is what lets you give an agent more rope without losing the ability to watch and stop it.
- Surface integrations that meet users where they work. The harness that wins isn’t backend plumbing. It lives where the work happens: Slack, Linear, GitHub, email, the CRM, the IDE. A brief that lands in Slack at 7am beats a tool someone has to remember to open.
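Several of these principles can be sketched together in a few lines of Python. The example below is a hedged illustration, not a reference design: steps are plain data so checkpoints and retry budgets can change without rewriting the workflow, every decision lands in an audit list, and a golden set is scored by a grader passed in separately from the agent. All names here (`Step`, `run_workflow`, `golden_set_score`, and the toy steps) are hypothetical and belong to no library mentioned above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], dict]  # takes the state, returns the updated state
    max_retries: int = 0            # deterministic retry budget
    needs_approval: bool = False    # human checkpoint, easy to flip later


def run_workflow(steps, state, approve, audit):
    """Run steps in order and record every outcome in the audit list."""
    for step in steps:
        if step.needs_approval and not approve(step.name, state):
            audit.append((step.name, "blocked"))
            return state, False
        for _attempt in range(step.max_retries + 1):
            try:
                state = step.action(state)
                audit.append((step.name, "ok"))
                break
            except Exception as exc:
                audit.append((step.name, f"error: {exc}"))
        else:
            return state, False  # retry budget exhausted
    return state, True


def golden_set_score(agent, grader, golden):
    """Fraction of golden cases the grader accepts.

    The grader is a separate callable so it can be backed by a
    different model family than the agent.
    """
    passed = sum(1 for prompt, expected in golden if grader(agent(prompt), expected))
    return passed / len(golden)


# Toy steps: research fails once, then succeeds on retry.
calls = {"n": 0}


def flaky_research(state):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("timeout")
    return {**state, "research": "done"}


def draft_brief(state):
    return {**state, "brief": "ready"}


steps = [
    Step("research", flaky_research, max_retries=1),
    Step("draft", draft_brief, needs_approval=True),
]
audit = []
final_state, ok = run_workflow(
    steps, {"account": "Acme"}, approve=lambda name, s: True, audit=audit
)

score = golden_set_score(
    agent=str.upper,
    grader=lambda out, expected: out == expected,
    golden=[("a", "A"), ("b", "X")],
)
```

Because the approval flag and retry budget live on each `Step`, loosening a checkpoint as models improve is a one-line data change rather than a rewrite, and the audit list keeps a record of what ran, failed, or was blocked.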
A harness worth building
It’s paralyzing to consider how to invest in your harness and agent platform right now. Do you build something that might become obsolete in six months? Or do you risk not reaping the gains from a harness that amplifies your team’s abilities?
The answer is to do both: build a harness for today that grows with the models of tomorrow. Understand and anticipate how reasoning jumps will change the ways agents and context get managed, then build your harness with the flexibility that accounts for this change while instrumenting the governance necessary for agents to take on more work.
Harness engineering isn’t going anywhere; it’s still early. Build the harness that amplifies the loop between your people and the model, meet your team where they already work, and design today for the model that lands next year.
That is how you build something that gets more valuable every time the model does.
Frequently asked questions
- What is harness engineering?
- Harness engineering is the practice of building the scaffolding around an AI model — the system that decides what the model sees, what tools it can touch, what it remembers between sessions, and how its work gets checked before it ships. The model reasons; the harness turns that raw reasoning into work that reliably gets done and meets people where they already work.
- What is the difference between an AI model and an agent harness?
- An agent is a model plus a harness. The model does the reasoning. The harness is everything around it: context and knowledge, tools, orchestration, memory, evaluation, governance, and the surface integrations that put the agent where work happens. A capable model with no harness still wakes up knowing nothing about your business and has no reliable way to act on it.
- What are the principles of harness engineering?
- Six principles keep a harness valuable as models improve: (1) context and knowledge as the backbone, (2) human judgment captured and fed back into the system, (3) flexible orchestration that can evolve, (4) continuous evaluation and verification, (5) governance and security as first-class citizens, and (6) surface integrations that meet users where they work — Slack, Linear, GitHub, email, the CRM, and the IDE.
- How do you build an agent harness that won't become obsolete as models improve?
- Build with principles that anticipate stronger reasoning instead of hard-coding today's workflow — Andrej Karpathy argues scaffolding should be treated as disposable. The durable parts are a rich context and knowledge store, captured human judgment, flexible orchestration you can rewire, continuous evaluation, first-class governance, and surface integrations. Those compound in value as the model gets smarter, while rigid prompt-into-prompt chains do not.
- Is harness engineering still worth investing in if models keep getting smarter?
- Yes. A smarter model mines a good context store faster, which makes that store more valuable, not less. The answer to the build-now-or-wait dilemma is to do both: build a harness for today that grows with the models of tomorrow, with the flexibility to absorb reasoning jumps and the governance to let agents safely take on more work.
