Executive Summary
At the AI Engineer Code Summit in New York City, Anthropic shared key insights into the Claude Agents SDK that reshape how effective AI agents are built in practice. By exposing the same agent harness that powers Claude Code, the SDK highlights a shift away from prompt-centric approaches toward more structured, reliable agent architectures.
These learnings reflect a growing challenge many teams are encountering in practice: increasing model capability and code generation speed without losing control, auditability, or reliability. This post distills the core technical takeaways and explains why the infrastructure around the model—the agent harness—is just as critical as the model itself.
The full workshop recording from the summit is available on YouTube.
In this blog post, we dive into our main learnings.
To understand the Claude Agents SDK, we first need to understand the concept of an Agent Harness (also called a "scaffold"). Here, we highly recommend Philipp Schmid's blog post on this topic, which we drew inspiration from.
It helps to draw an analogy to how a computer works.
In practice, the harness acts as the coordinating layer for perception, memory, and reasoning, enabling orchestrated workflows rather than relying on prompt engineering alone.
Applying this to Claude Code:
The important insight here is that the harness is just as important as the model itself. On the CORE benchmark (which tests agents' ability to reproduce scientific results), the same Opus 4.5 model scored 78% with Claude Code's harness but only 42% with Smolagents. That's a massive difference in performance! These results highlight how capability evaluations on multi-step tasks are deeply influenced by harness design, not just the underlying model. This is the main reason people love Claude Code so much: its harness gets the most performance out of the model.
The Claude Agents SDK packages up the powerful harness from Claude Code and makes it available for developers to build their own applications on top of it. The SDK is available in both Python and TypeScript.
An important point here is that it's not just for coding agents! You can build any type of application on top of it, such as a finance agent, a customer service agent, or a data engineering agent.
Anthropic built it because they have an opinionated take on how to build effective AI agents (which we'll discuss below), and they were already building their own agents on top of the SDK. Hence, they want developers to benefit from the same harness that powers Claude Code.
Anthropic says "bash is all you need" for agents. It's a big part of what makes Claude Code so good. Bash (the Unix shell you interact with in a MacBook's terminal) is oftentimes the most effective tool an agent can use to do any work. Instead of creating separate Search, Lint, and Execute tools, often with long descriptions, Claude can use low-level Unix primitives like grep, tail, and npm run lint.
Bash is composable: multiple tool calls can be chained together using the pipe operator ('|'). The agent can also store the results of tool calls to a file using >, making them searchable later. Moreover, it can use many existing CLIs, such as ffmpeg when working with audio or video, or gh when working with GitHub.
This approach is simpler and more powerful than creating dozens of specialized tools, whose descriptions and results consume many tokens of the limited context window. It reduces tool-call overhead while letting the agent work efficiently across a wide range of data formats.
Take the example of querying an email API. Rather than calling a tool of an email MCP server with limited flexibility, the agent can use Bash to perform SQL queries, search the results, and write them to a file. Note that all those tool calls can be performed in a single command line by chaining them with the pipe operator.
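To make this chaining concrete, here is a minimal sketch of the idea (the log file and its contents are hypothetical). Python's subprocess module is used only so the example is self-contained; the interesting part is the single shell command, which filters with grep, pipes through sort and uniq, and redirects the result to a file with > so the agent can search it later.

```python
import os
import subprocess
import tempfile

# Hypothetical log file, just to give the pipeline something to chew on.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
with open(log_path, "w") as f:
    f.write("INFO start\nERROR disk full\nINFO retry\nERROR disk full\n")

# One chained command: filter, sort, deduplicate-with-counts, then store
# the result to a file so it stays searchable outside the context window.
out_path = os.path.join(workdir, "errors.txt")
subprocess.run(
    f"grep ERROR {log_path} | sort | uniq -c > {out_path}",
    shell=True,
    check=True,
)

print(open(out_path).read().strip())  # -> 2 ERROR disk full
```

The equivalent with dedicated Search/Filter/Count tools would cost one model round-trip per step; here the whole pipeline is a single tool call.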
Of course, Bash should not be used at all times. Tools (via MCP or not) and code generation (writing and storing reusable scripts) can be useful too. Each approach has its pros and cons, which Anthropic summarizes below:
Tools:
Bash:
Code Gen:
They recommend using them for the following use cases:
Anthropic realized (around the same time as Cursor) that loading all tool descriptions of an MCP server into the system prompt is not the best idea: you waste a large part of the LLM's context window before the agent starts doing any work. For example, the GitHub MCP server includes 38 tools, whose descriptions alone take up around 15k tokens.
Hence, a better idea is to store information the agent can use on the filesystem, and only let it retrieve that information when needed, based on the incoming request, "just-in-time". This way, the filesystem enables "dynamic context discovery": tokens only get loaded into the context window when they are actually needed.
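A minimal sketch of this just-in-time pattern, with hypothetical tool-description files and a deliberately naive keyword matcher (a real agent would list and read the files itself via Bash):

```python
import os
import tempfile

# Hypothetical on-disk "tool library": one file per tool description.
# Nothing is loaded into the prompt up front.
tooldir = tempfile.mkdtemp()
descriptions = {
    "create_issue.md": "Create a GitHub issue. Args: title, body.",
    "merge_pr.md": "Merge a pull request. Args: pr_number.",
}
for name, text in descriptions.items():
    with open(os.path.join(tooldir, name), "w") as f:
        f.write(text)

def discover(request: str) -> str:
    """Load only the descriptions whose filename matches the request.

    Naive substring matching, purely illustrative: the point is that
    descriptions enter the context window just-in-time, not up front.
    """
    loaded = []
    for name in sorted(os.listdir(tooldir)):
        stem = name.removesuffix(".md")
        if any(word in request for word in stem.split("_")):
            loaded.append(open(os.path.join(tooldir, name)).read())
    return "\n".join(loaded)

# Only the relevant description is retrieved for this request.
print(discover("please merge pr 42"))  # -> Merge a pull request. Args: pr_number.
```

With 38 GitHub tools on disk instead of in the system prompt, the roughly 15k tokens of descriptions shrink to only those the current request actually needs.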
Some examples of this:
Anthropic recommends three steps for an effective agent loop:
If the task can be verified, it's a great candidate for building an agent for it.
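The shape of such a loop can be sketched as follows. This is a toy sketch, not SDK code: the step names (gather context, take action, verify) are assumed from the discussion above, and the task (growing a list until it sums to a target) is hypothetical, chosen only because its verifier is trivial to check.

```python
def agent_loop(target: int, max_steps: int = 10) -> list[int]:
    """Toy agent loop: append numbers until the list sums to `target`."""
    state: list[int] = []
    for _ in range(max_steps):
        # 1. Gather context: observe the current state of the work.
        gap = target - sum(state)
        # 2. Verify: stop as soon as the result checks out.
        if gap == 0:
            return state
        # 3. Take action: a deliberately naive step toward the target.
        state.append(min(gap, 3))
    raise RuntimeError("verification never passed within the step budget")

print(agent_loop(7))  # -> [3, 3, 1]
```

The verifier is the load-bearing part: because success is checkable after every step, the loop can self-correct instead of running open-ended, which is exactly why verifiable tasks make good agent candidates.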
The community is already building impressive applications. Some examples below:
The Claude Agents SDK represents a significant step forward in making enterprise-grade agent capabilities accessible to developers. By understanding when to use tools versus Bash versus code generation, and by leveraging the filesystem for dynamic context discovery, you can build agents that are both powerful and efficient.
The key insight is that the harness - the infrastructure around the model - is just as important as the model itself. With the Claude Agents SDK, you now have access to the same battle-tested infrastructure that powers Claude Code.