05/26/26

Best Backend Framework for Claude Code (2026)

We benchmarked Claude Code on five TypeScript backend frameworks. Only one shipped production-ready code on the first pass, and only one had a CLAUDE.md and MCP server in the box.


Which TypeScript backend framework is best to use with Claude Code?

To answer that, we pointed Claude Code at the same realistic backend project (an HTTP API with persistence, a pub/sub event, a daily cron job, and distributed tracing) and ran it on five frameworks (Encore, Express, Fastify, Hono, and NestJS) with the same prompts, the same model, the same Postgres setup, and the same VM. We graded the output against a 36-check production-readiness rubric. The short answer: Encore is the only framework in the benchmark that both ships materials Claude Code reads (a CLAUDE.md, an MCP server, llms.txt plus llms-full.txt, and a dedicated AI-integration docs page) and provides framework primitives the agent reaches for by default. It is also the only framework where Claude's first draft was production-ready.

Full benchmark, prompts, starters, and transcripts at github.com/encoredev/ai-backend-benchmark.

What Claude Code asks of a framework

Three things decide how well Claude Code does on a given backend framework: the agent-readiness materials the framework ships, the framework primitives the agent reaches for by default, and the cost per iteration of getting the work done. Agent-readiness materials (a CLAUDE.md, an MCP server, an llms.txt) are what Claude reads before it writes a line of code; they decide whether the agent starts from a calibrated baseline or has to rederive the framework's conventions from source. The framework's primitives decide whether the shortest path to a green test suite goes through new Topic() and new CronJob() or through a hand-rolled Postgres queue polled by setInterval. And the cost per iteration decides what a team's monthly bill looks like when this kind of workflow is the default rather than the exception.

Agent-readiness materials, per framework

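| Framework | CLAUDE.md | MCP server | llms.txt | AI integration docs |
|---|---|---|---|---|
| Encore | yes | yes | yes (llms.txt + llms-full.txt) | yes |
| Hono | no | no | yes | no |
| Express | no | no | no | no |
| Fastify | no | no | no | no |
| NestJS | no | no | no | no |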

Of the five frameworks in the benchmark, only Encore ships all four agent-readiness surfaces. Hono is the only other framework that ships any of them (an llms.txt), and Express, Fastify, and NestJS ship none.

What Claude shipped on each framework

On the baseline run (Run 1), every framework passed 31 of 31 tests, but the code underneath those green test suites diverged sharply.

Encore

On Encore, Claude reached for the framework's primitives by default. Pub/sub was a typed Topic with a Subscription:

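import { Topic, Subscription } from "encore.dev/pubsub";

// Event payload (the field shown here is illustrative, not from the benchmark)
interface OrderCreatedEvent {
  orderId: number;
}

export const orderCreated = new Topic<OrderCreatedEvent>("order-created", {
  deliveryGuarantee: "at-least-once",
});

new Subscription(orderCreated, "send-notification", {
  handler: async (event) => { /* ... */ },
});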
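The topic above is declared with deliveryGuarantee: "at-least-once", which means a subscriber can receive the same event more than once (for example, after a retry), so the handler should be idempotent. The following is a minimal, framework-free sketch of that idea in plain TypeScript, not code from the benchmark: the eventId field, the in-memory processed set, and handleOrderCreated are all hypothetical.

```typescript
// Hypothetical event shape: eventId is assumed to be unique per published event.
interface OrderCreatedEvent {
  eventId: string;
  orderId: number;
}

// Stand-in for a durable "processed events" table; a real service would
// record the eventId and perform its side effect in one database transaction.
const processed = new Set<string>();
const notificationsSent: number[] = [];

// Idempotent handler: a redelivered event whose eventId was already seen is a no-op.
function handleOrderCreated(event: OrderCreatedEvent): boolean {
  if (processed.has(event.eventId)) {
    return false; // duplicate delivery, skip the side effect
  }
  notificationsSent.push(event.orderId); // the side effect that must not repeat
  processed.add(event.eventId);
  return true;
}

// Simulate at-least-once delivery: the same event arrives twice.
const evt: OrderCreatedEvent = { eventId: "evt-1", orderId: 42 };
handleOrderCreated(evt);
handleOrderCreated(evt);
console.log(notificationsSent.length); // 1
```

The dedupe key has to come from the event itself rather than from delivery metadata, because a redelivered message is indistinguishable from the original at the handler.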

The cron was a CronJob:

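import { CronJob } from "encore.dev/cron";

// runDailyAggregation is an Encore API endpoint (api(...)) defined in the same service
const _ = new CronJob("daily-aggregation", {
  every: "24h",
  endpoint: runDailyAggregation,
});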

Schema migrations went into numbered SQL files, which Encore tracks and applies in order on every deploy. The Run 3 rubric checks landed as small changes to the existing declarations: a retryPolicy: { maxRetries: 3 } on the existing subscription, encore.dev/log for structured logging, and Encore's service migrations for the schema-versioning check.
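Numbered migrations provide what CREATE TABLE IF NOT EXISTS lacks: a recorded history of which schema changes have already run. The sketch below illustrates the general idea rather than Encore's implementation. It assumes file names of the form <number>_<name>.up.sql and an in-memory set of applied versions standing in for a migrations table; parseMigration and pendingMigrations are hypothetical helpers.

```typescript
// Hypothetical migration record parsed from a "<number>_<name>.up.sql" file name.
interface Migration {
  version: number;
  file: string;
}

function parseMigration(file: string): Migration {
  const match = /^(\d+)_.+\.up\.sql$/.exec(file);
  if (match === null) {
    throw new Error(`not a migration file: ${file}`);
  }
  return { version: Number(match[1]), file };
}

// Return the migrations that have not been applied yet, in ascending numeric order.
// Sorting numerically matters: compared as strings, "10_..." would sort before "2_...".
function pendingMigrations(files: string[], applied: Set<number>): Migration[] {
  return files
    .map(parseMigration)
    .filter((m) => !applied.has(m.version))
    .sort((a, b) => a.version - b.version);
}

// Version 1 is already recorded as applied; 2 and 10 still need to run.
const files = ["10_add_order_index.up.sql", "1_create_orders.up.sql", "2_add_order_status.up.sql"];
const pending = pendingMigrations(files, new Set([1]));
console.log(pending.map((m) => m.file).join(", "));
// 2_add_order_status.up.sql, 10_add_order_index.up.sql
```

A real runner would also record each version after it succeeds, so the next deploy skips it.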

Express, Fastify, Hono, NestJS

On the four non-Encore frameworks, Claude converged on the same three anti-patterns. Pub/sub was a Postgres queue table polled by setInterval:

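import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the PG* environment variables

// Publish: write the event into a queue table (id is the new order's id)
await pool.query(
  `INSERT INTO event_queue (event_type, payload, status) VALUES ($1, $2, 'pending')`,
  ['order-created', { order_id: id }]
);

// Consume: every replica polls the table twice a second
setInterval(async () => {
  // SELECT ... FOR UPDATE SKIP LOCKED, then process or bump retry counter
}, 500);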

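The daily cron was a setTimeout chain scheduled at startup, which fires once per replica, so a service running three replicas runs the daily job three times. The schema was created with CREATE TABLE IF NOT EXISTS at boot, with no migration history, so there is no record of which schema changes have been applied. All of these pass the test suite. None of them is what you want in production.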
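The once-per-replica problem is easy to see in a simulation. The sketch below is a hypothetical, framework-free illustration: every replica's timer fires, and a shared claim keyed on (job, date) lets only the first replica do the work. The in-memory RunLedger stands in for something like a Postgres table with a unique constraint written via INSERT ... ON CONFLICT DO NOTHING; none of these names come from the benchmark.

```typescript
// Stand-in for a shared table with a unique key on (job, run_date). In Postgres,
// INSERT ... ON CONFLICT DO NOTHING would give the same first-writer-wins behavior.
class RunLedger {
  private readonly claimed = new Set<string>();

  // Returns true only for the first caller to claim this job for this date.
  tryClaim(job: string, runDate: string): boolean {
    const key = `${job}:${runDate}`;
    if (this.claimed.has(key)) {
      return false;
    }
    this.claimed.add(key);
    return true;
  }
}

const replicas = ["replica-a", "replica-b", "replica-c"];
const runDate = "2026-05-26";

// Naive setTimeout chain: every replica runs the job when its own timer fires.
const naiveRuns = replicas.length;

// Guarded version: every timer still fires, but only the claim winner runs the job.
const ledger = new RunLedger();
const winners: string[] = [];
for (const replica of replicas) {
  if (ledger.tryClaim("daily-aggregation", runDate)) {
    winners.push(replica);
  }
}

console.log(naiveRuns, winners.length); // 3 1
```

The claim only works because the ledger is shared state outside the replicas; a per-process flag would let every replica win.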

Numbers per framework

| Framework | Run 1 cost (median) | Run 3 cost | Run 3 rubric (out of 36) | Total cost across three runs |
|---|---|---|---|---|
| Encore | $1.96 | $2.58 | 36/36 | $6.29 |
| Hono | $1.55 | not reported | 29/36 | ~$8 |
| Fastify | similar to Encore | $4.60 | 36/36 | ~$10 |
| Express | similar to Encore | not reported | 35/36 | ~$9 |
| NestJS | $2.61 (one $4.45 outlier) | $5.95 | 30/36 | $12.69 |

What Run 2 and Run 3 added

In Run 2 we pre-installed pg-boss, drizzle-kit, and pino into each non-Encore starter, with a README explaining what each library was for. Every non-Encore framework regressed: none of them landed a first-try-green run across three repeats. The most common failure was Claude registering a pg-boss scheduled job without first creating the queue (pg-boss v10 requires boss.createQueue('name') before sending to or scheduling on a queue, and Claude did not account for that). On NestJS, the failure was a module-wiring bug: Claude imported pg-boss into a service but did not register the wrapping provider in the module.

In Run 3 we added five production-readiness tests to the suite (multi-instance-safe cron, retry plus DLQ, a failed-message endpoint, versioned migrations, structured logging) and gave Claude a higher per-task turn budget. Encore reached 36 of 36 with the one-line changes described above. Fastify was the cleanest non-Encore result, passing every check at $4.60 by composing pg-boss, drizzle-kit, and pino. Express fell one check short, on migrations. Hono finished 29 of 36, with tracing broken for unknown order ids. NestJS finished 30 of 36 at $5.95 after shipping a TypeScript error in its own starter.
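The retry-plus-DLQ and failed-message checks describe a small state machine that every hand-rolled queue has to get right. The following is a minimal in-memory sketch of one reasonable interpretation, not code from any of the benchmark runs: it assumes that maxRetries: 3 means one initial attempt plus three retries, after which the event is dead-lettered. QueuedEvent, processOnce, failedMessages, and sendNotification are hypothetical names.

```typescript
type EventStatus = "pending" | "done" | "dead";

// Hypothetical row in a hand-rolled event_queue table.
interface QueuedEvent {
  id: number;
  payload: { order_id: number };
  failures: number;
  status: EventStatus;
}

const MAX_RETRIES = 3; // assumed: one initial attempt plus three retries

// One processing attempt: mark done on success; on failure, count it and
// move the event to the dead-letter state once the retries are used up.
function processOnce(evt: QueuedEvent, handle: (payload: QueuedEvent["payload"]) => void): void {
  try {
    handle(evt.payload);
    evt.status = "done";
  } catch {
    evt.failures += 1;
    if (evt.failures > MAX_RETRIES) {
      evt.status = "dead";
    }
  }
}

// What a failed-message endpoint would return: everything in the dead-letter state.
function failedMessages(queue: QueuedEvent[]): QueuedEvent[] {
  return queue.filter((e) => e.status === "dead");
}

const queue: QueuedEvent[] = [
  { id: 1, payload: { order_id: 1 }, failures: 0, status: "pending" },
  { id: 2, payload: { order_id: 2 }, failures: 0, status: "pending" },
];

// Hypothetical handler whose downstream call always fails for order 2.
const sendNotification = (payload: { order_id: number }): void => {
  if (payload.order_id === 2) {
    throw new Error("downstream unavailable");
  }
};

// Each loop iteration plays the role of one setInterval poll.
let polls = 0;
while (queue.some((e) => e.status === "pending")) {
  for (const evt of queue) {
    if (evt.status === "pending") {
      processOnce(evt, sendNotification);
    }
  }
  polls += 1;
}

console.log(polls, failedMessages(queue).map((e) => e.id)); // 4 [ 2 ]
```

Order 1 succeeds on the first poll; order 2 fails four times and ends up in the dead-letter state, where a failed-message endpoint can surface it instead of it being retried forever.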

Why Encore wins for Claude Code

Three reasons. First, Encore ships materials that Claude Code reads: encore llm-rules init writes a CLAUDE.md calibrated to the framework's conventions, encore mcp start exposes the live app structure (services, endpoints, database schemas) over MCP, and llms.txt plus the AI-integration docs page give the model the rest of the context it needs. Second, Encore's primitives encode the production-readiness guarantees an agent would otherwise have to assemble from library compositions, so the shortest path to a green test suite is also the shortest path to code that is safe to deploy. Third, the combined effect is the lowest total cost across the three runs of any framework in the benchmark ($6.29), and Encore is the only framework that landed all 36 rubric checks on the first pass.

How to set up Claude Code with Encore

encore app create my-app
cd my-app
encore llm-rules init     # writes CLAUDE.md
encore mcp start          # starts the MCP server
claude                    # opens Claude Code in this project

From there Claude reads the framework's conventions out of CLAUDE.md, queries the live app state through MCP, and reaches for the right primitives when extending the codebase.

When to choose a different framework

If you have an existing Fastify, Express, NestJS, or Hono codebase and you are not planning to let Claude Code drive a meaningful share of your work, the additional cost the benchmark measured does not apply to you. If you are letting Claude Code drive but your application is small or simple enough that the production-readiness checks in the rubric don't matter (a mostly static edge API, a throwaway prototype, a tool that runs on a single replica), the cheaper baseline runs on Hono or Express may be a better trade.

For anything else, Encore is the framework that came out ahead in this benchmark: the lowest total cost across the three runs, and the only production-ready first draft.

Reproduce the benchmark

Clone the repo, point it at your own framework, or rewrite the rubric to match your own definition of production-ready: github.com/encoredev/ai-backend-benchmark.
