How does an AI coding agent perform on Fastify compared with Encore when you give it the same realistic backend tasks?
We took Claude Code, pointed it at the same project (an HTTP API with persistence, a pub/sub event, a daily cron, and distributed tracing), and ran it on both frameworks using the same prompts, the same model, the same Postgres setup, and the same VM configuration. This article focuses on the Fastify side of our wider AI-readiness benchmark across five TypeScript frameworks. Fastify was the cleanest non-Encore result: when we wrote the production-readiness rubric directly into the tests in Run 3, the agent on Fastify landed every check, but it spent more tokens getting there, and the run cost about 1.8 times as much as the Encore run ($4.60 versus $2.58).
Full repo, prompts, starters, and transcripts at github.com/encoredev/ai-backend-benchmark.
Each framework gets its own VM with the same Postgres setup and the same claude-sonnet-4-6 model running through Claude Code. The agent works through three linked tasks: t1 (HTTP API and persistence), then t2 (extend t1 with pub/sub and cron), then t3 (extend t2 with tracing and production-readiness). The tests are plain black-box HTTP probes run with vitest, and they are identical for every framework. The Fastify starter is whatever fastify-cli generates; the Encore starter is what encore app create generates, which also includes Encore's CLAUDE.md and MCP server.
Fastify had the best first-try-green ratio of any framework in Run 1, hitting 3 of 3 repeats without a re-run. The diffs underneath that result still showed the same three patterns we saw on every non-Encore framework: a Postgres queue table polled from a Fastify hook on setInterval, an in-process setInterval cron scheduled at startup, and CREATE TABLE IF NOT EXISTS boot-time DDL with no migration history. The Fastify tests passed because the test suite was checking HTTP behavior, not production semantics.
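To make the gap between HTTP behavior and production semantics concrete, here is a minimal sketch of why an in-process cron passes a single-instance test but misbehaves once more than one server is running. Everything in it is hypothetical and written for illustration: SharedStore stands in for a Postgres table, and naiveTick, guardedTick, and simulate are names we made up. None of it is taken from the benchmark transcripts.

```typescript
// Hypothetical stand-in for a table shared by every server instance.
// A real deployment would use Postgres; this is an in-memory illustration only.
class SharedStore {
  readonly runs: string[] = []; // one entry per executed aggregation
  private readonly claimed = new Set<string>();

  // Models an INSERT guarded by a unique key: only the first caller wins.
  claim(day: string): boolean {
    if (this.claimed.has(day)) return false;
    this.claimed.add(day);
    return true;
  }
}

// The in-process pattern: every instance owns a timer, so every instance runs the job.
// In a server this body would sit inside a setInterval callback registered at startup.
function naiveTick(store: SharedStore, instance: string, day: string): void {
  store.runs.push(`${day} by ${instance}`);
}

// A multi-instance-safe variant: run only if this instance claimed the day first.
function guardedTick(store: SharedStore, instance: string, day: string): void {
  if (store.claim(day)) store.runs.push(`${day} by ${instance}`);
}

// Simulate three instances whose timers all fire for the same day.
function simulate(tick: typeof naiveTick): number {
  const store = new SharedStore();
  for (const instance of ["a", "b", "c"]) tick(store, instance, "2025-01-01");
  return store.runs.length;
}

console.log(simulate(naiveTick));   // 3
console.log(simulate(guardedTick)); // 1
```

A black-box HTTP probe against one server cannot tell these two versions apart, which is consistent with the Run 1 tests going green on the naive pattern.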
The same agent given the same prompts declared the async work using Encore's primitives. Pub/sub was a Topic with a typed Subscription and deliveryGuarantee: "at-least-once". The cron was a CronJob. Schema migrations went into numbered SQL files. For tracing the agent relied on Encore's runtime to propagate the correlation id automatically across calls and handlers.
In Run 2 we pre-installed pg-boss, drizzle-kit, and pino in the Fastify starter with a README explaining each. Fastify regressed in the same way Express did: the agent wrote pg-boss code that registered a scheduled job without first creating the queue, and the server crashed at boot with Queue daily-aggregation not found / Key (name)=(daily-aggregation) is not present in table "queue". pg-boss v10 requires queues to be created explicitly via boss.createQueue('name') before anything is scheduled on them, and the agent did not know.
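The ordering problem can be sketched without a database. FakeBoss below is a hypothetical in-memory stand-in that models only the one rule described above: a schedule can be registered only for a queue that already exists. Its method names mirror the pg-boss calls createQueue and schedule, but the real calls are asynchronous and talk to Postgres, and the error text here is simplified.

```typescript
// Hypothetical in-memory stand-in for a job queue client. It is NOT pg-boss;
// it only reproduces the "queue must exist before scheduling" rule.
class FakeBoss {
  private readonly queues = new Set<string>();
  private readonly schedules = new Map<string, string>();

  createQueue(name: string): void {
    this.queues.add(name);
  }

  schedule(name: string, cron: string): void {
    if (!this.queues.has(name)) {
      throw new Error(`Queue ${name} not found`);
    }
    this.schedules.set(name, cron);
  }
}

// The order the agent produced in Run 2: schedule first, which fails at boot.
function bootWithoutQueue(boss: FakeBoss): string {
  try {
    boss.schedule("daily-aggregation", "0 0 * * *");
    return "started";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// The order v10 expects: create the queue, then register the schedule.
function bootWithQueue(boss: FakeBoss): string {
  boss.createQueue("daily-aggregation");
  boss.schedule("daily-aggregation", "0 0 * * *");
  return "started";
}

console.log(bootWithoutQueue(new FakeBoss())); // Queue daily-aggregation not found
console.log(bootWithQueue(new FakeBoss()));    // started
```

With the real library the equivalent fix would be to await boss.createQueue('daily-aggregation') before the schedule call; the cron expression used here is an assumption.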
In Run 3 we wrote the production-readiness rubric directly into the test suite (versioned migrations, multi-instance-safe cron, retry plus DLQ, a failed-message endpoint, and structured logging) and gave the agent a higher per-task turn budget. Fastify came in as the cleanest non-Encore result: the agent pulled pg-boss for pub/sub and cron, drizzle-kit for migrations, and pino for logs, and hit green on every check. Total cost for the run was $4.60 against Encore's $2.58 for the same checks.
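For readers unfamiliar with the "retry plus DLQ" item in the rubric, the semantics can be sketched in a few lines. This is an illustration under our own assumptions, not the benchmark's assertions (those live in the repo) and not how pg-boss implements it: deliverWithRetry, DeadLetter, and the attempt limit are hypothetical names and values, and the handler is synchronous to keep the sketch short.

```typescript
// A message that exhausted its retries, kept for later inspection.
interface DeadLetter<T> {
  message: T;
  attempts: number;
  lastError: string;
}

// Try the handler up to maxAttempts times; on final failure, park the message
// in the dead-letter list instead of dropping it. Returns true on success.
function deliverWithRetry<T>(
  message: T,
  handler: (m: T) => void,
  maxAttempts: number,
  deadLetters: DeadLetter<T>[],
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(message);
      return true;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  deadLetters.push({ message, attempts: maxAttempts, lastError });
  return false;
}

// Usage: one handler that recovers on the third call, one that never does.
const deadLetters: DeadLetter<string>[] = [];
let calls = 0;
const flaky = (_m: string): void => {
  calls += 1;
  if (calls < 3) throw new Error("transient failure");
};
const broken = (_m: string): void => {
  throw new Error("permanent failure");
};

console.log(deliverWithRetry("order-1", flaky, 3, deadLetters));  // true
console.log(deliverWithRetry("order-2", broken, 3, deadLetters)); // false
console.log(deadLetters.length);                                  // 1
```

A failed-message endpoint, in this picture, is just an HTTP route that returns the contents of the dead-letter list.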
Even when Fastify produced the same production semantics as Encore, the agent had to assemble them from three independent libraries with three independent integration points, each costing turns to wire correctly and turns to debug when the integration drifted. Encore's primitives encode the same guarantees at the framework level, so the agent reached for one thing per check (a Topic, a CronJob, a migrations/ directory) and the production-readiness checks landed as small changes against the existing declarations rather than as new integration work.
Fastify is a good pick if you want a high-performance HTTP framework and you are willing to absorb the additional token cost when an agent extends it, or if you already run Fastify in production and your team has tuned a stack of plugins around it that you do not want to rebuild.
If you are starting a new TypeScript backend and AI is writing a meaningful share of the code, Encore's primitives let the agent reach for the right thing on the first pass and let the production-readiness checks land with one-line changes against existing declarations.
Clone the repo, point it at your own framework, or rewrite the rubric to match your own definition of production-ready: github.com/encoredev/ai-backend-benchmark.