May 4, 2026

What is distributed tracing?

An interactive guide to debugging real backend issues


If you've ever spent an afternoon adding console.log statements to figure out why a checkout endpoint went from 200ms to 1.2 seconds, you've done the work that a distributed trace does for you in a single click. The trace shows every operation that happened during the request, how long each one took, and where the time went. In the checkout case, the Stripe API call took 940ms. You'd have found that in seconds instead of hours.

We've previously written about what a trace contains and why manual instrumentation is the wrong approach. This post is the practical companion: concrete debugging scenarios where traces save you time, and how to make them part of your development workflow rather than something you reach for after an incident.

Debugging scenarios

These are situations that come up regularly in backend development. Each one takes minutes or hours to track down from logs and metrics, and seconds from a trace. The example below shows what the trace tree looks like for the first scenario.

Why is this request slow?

The checkout endpoint usually completes in 200ms. A user reports it took over a second. The trace tree for that request:

POST /checkout                        1.2s
├── auth.verify                       15ms
├── db INSERT orders                  12ms
└── payments.Charge                   980ms
    └── POST stripe.com/v1/charges    940ms

Diagnosis: the Stripe API call accounts for 940ms of the 1.2s total. The bottleneck is an external dependency, not your code.

Slow requests. The trace tree shows every operation with its timing. When a checkout that usually takes 200ms suddenly takes 1.2 seconds, you open the trace and see that the Stripe API call accounts for 940ms. Without a trace, you'd be profiling your own code looking for the bottleneck when the problem is an external dependency.

Deploy regressions. An endpoint went from 45ms to 890ms after a deploy. Comparing traces before and after reveals that one SELECT became 47 sequential SELECTs, because an ORM change introduced an N+1 query pattern. The code diff doesn't make that obvious. The trace does, because you can see every query the request executed and how long each one took.
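
To make the pattern concrete, here is a minimal sketch of what that regression looks like in code. The schema and the SqlClient interface are invented for illustration, not part of any real API:

// Hypothetical sketch of an N+1 regression.
interface SqlClient {
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

// Before: one query fetches every item for the order.
// In the trace: a single SELECT span.
async function getItemsBatched(db: SqlClient, orderId: number) {
  return db.query("SELECT * FROM order_items WHERE order_id = $1", [orderId]);
}

// After the ORM change: one query per item.
// In the trace: 47 sequential SELECT spans, one per round trip.
async function getItemsOneByOne(db: SqlClient, itemIds: number[]) {
  const items: unknown[] = [];
  for (const id of itemIds) {
    items.push(await db.query("SELECT * FROM order_items WHERE id = $1", [id]));
  }
  return items;
}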

Environment differences. A pub/sub subscriber works locally but fails in staging. The trace follows the message from publish through to the subscriber across the async boundary and shows the exact error: relation "email_templates" does not exist. The migration was never run in staging. From logs you'd see a generic error. The trace shows the SQL query that failed and the database it ran against.
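
As a sketch of where that failure lives in code, here is a hypothetical Encore subscriber. The subscription name, handler body, and email database are assumptions for this example; the topic mirrors the order-created topic shown later in this post:

import { Subscription, Topic } from "encore.dev/pubsub";
import { SQLDatabase } from "encore.dev/storage/sqldb";

interface OrderEvent {
  orderId: number;
  total: number;
}

const orderCreated = new Topic<OrderEvent>("order-created", {
  deliveryGuarantee: "at-least-once",
});

const emailDB = new SQLDatabase("email", { migrations: "./migrations" });

// The trace follows the message across the async boundary from the
// publish into this handler, so the failing SELECT shows up as a child
// span with the SQL error attached.
const _ = new Subscription(orderCreated, "send-confirmation-email", {
  handler: async (event) => {
    // Fails in staging if the email_templates migration was never run.
    const tmpl = await emailDB.queryRow`
      SELECT body FROM email_templates WHERE name = 'order-confirmation'`;
    // ... render and send the email (omitted)
  },
});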

User-specific issues. A user reports that checkout is broken, but nobody else is affected. Auth handlers attach user identity to traces, so you can search by user ID and find the failing request. Drilling into the spans shows their cart query returned 0 rows and the handler threw because it doesn't guard against empty carts.
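
Here is a sketch of how that identity gets attached, using Encore's TypeScript auth handler. The token lookup is a placeholder; the point is that the userID returned here is what you'd search traces by:

import { APIError, Gateway, Header } from "encore.dev/api";
import { authHandler } from "encore.dev/auth";

interface AuthParams {
  authorization: Header<"Authorization">;
}

interface AuthData {
  userID: string;
}

// The userID returned here is attached to the request's trace,
// which is what makes traces searchable by user.
export const auth = authHandler<AuthParams, AuthData>(async (params) => {
  const userID = lookupUser(params.authorization); // placeholder lookup
  if (!userID) {
    throw APIError.unauthenticated("invalid token");
  }
  return { userID };
});

export const gateway = new Gateway({ authHandler: auth });

// Placeholder token parser for the sketch.
function lookupUser(token: string): string | null {
  return token.startsWith("Bearer ") ? "user_123" : null;
}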

Cache effectiveness. You added caching to reduce database load but latency didn't improve. The trace shows cache operations with hit/miss status, and in this case the hit rate is 12% because the TTL is too short. Most requests still fall through to the database. You wouldn't know this from aggregate latency metrics because the cache overhead per request is small. The trace shows you what's happening on each individual request.
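
The dynamics are easy to reproduce in isolation. This standalone sketch (plain TypeScript, not Encore's cache API) counts hits and misses the way trace spans surface them; with a TTL shorter than the typical interval between repeat requests, almost every lookup misses:

// Standalone TTL cache sketch with hit/miss counters.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  hits = 0;
  misses = 0;

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > Date.now()) {
      this.hits++;
      return entry.value;
    }
    // Expired entries count as misses, just as the trace shows them.
    this.misses++;
    return undefined;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// With a 1-second TTL and the same key requested every ~5 seconds,
// nearly every get() misses and the request falls through to the database.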

Tracing without instrumentation code

In most backend frameworks, getting the traces described above requires writing instrumentation code: creating spans, setting attributes, propagating context across service boundaries. The OpenTelemetry SDK is the standard way to do this, and it works, but it means maintaining a second description of your application's structure alongside the code itself. We've written about why that's a problem in detail.
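
For comparison, manual instrumentation with the OpenTelemetry JavaScript API looks roughly like this, repeated for every operation you want visible. The chargeCustomer function stands in for the real payment call:

import { SpanStatusCode, trace } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout");

// Placeholder for the real payment call.
declare function chargeCustomer(
  customerId: string,
  amountCents: number,
): Promise<{ id: string }>;

// One hand-written span around one operation. Multiply by every
// endpoint, query, publish, and external call in the application.
async function chargePayment(customerId: string, amountCents: number) {
  return tracer.startActiveSpan("payments.Charge", async (span) => {
    span.setAttribute("customer.id", customerId);
    span.setAttribute("amount.cents", amountCents);
    try {
      return await chargeCustomer(customerId, amountCents);
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}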

Encore takes a different approach. The framework uses typed primitives for infrastructure: api() for endpoints, new SQLDatabase() for databases, new Topic() for pub/sub, new CacheCluster() for caches. A Rust-based static analyzer parses these declarations at compile time and builds a complete graph of your application. The runtime uses that graph to trace every operation automatically.

In TypeScript:

import { api } from "encore.dev/api";
import { SQLDatabase } from "encore.dev/storage/sqldb";
import { Topic } from "encore.dev/pubsub";

const db = new SQLDatabase("orders", { migrations: "./migrations" });

const orderCreated = new Topic<OrderEvent>("order-created", {
  deliveryGuarantee: "at-least-once",
});

// Every operation here is traced automatically:
// the API call, the database query, the topic publish.
export const createOrder = api(
  { expose: true, auth: true, method: "POST", path: "/orders" },
  async (req: CreateOrderRequest): Promise<Order> => {
    const order = await db.queryRow`
      INSERT INTO orders (customer_id, total)
      VALUES (${req.customerId}, ${req.total})
      RETURNING *`;
    await orderCreated.publish({ orderId: order!.id, total: order!.total });
    return order!;
  },
);

The same approach works in Go, where infrastructure primitives like sqldb.NewDatabase and pubsub.NewTopic serve the same role. The compiler builds the same application graph regardless of language, and the runtime traces every operation the same way.

package orders

import (
	"context"

	"encore.dev/pubsub"
	"encore.dev/storage/sqldb"
)

var db = sqldb.NewDatabase("orders", sqldb.DatabaseConfig{
	Migrations: "./migrations",
})

var OrderCreated = pubsub.NewTopic[OrderEvent]("order-created", pubsub.TopicConfig{
	DeliveryGuarantee: pubsub.AtLeastOnce,
})

// Traced automatically: the API call, the database query, the topic publish.
//
//encore:api public method=POST path=/orders
func CreateOrder(ctx context.Context, req *CreateOrderRequest) (*Order, error) {
	var order Order
	err := db.QueryRow(ctx, `
		INSERT INTO orders (customer_id, total)
		VALUES ($1, $2)
		RETURNING id, customer_id, total`,
		req.CustomerID, req.Total,
	).Scan(&order.ID, &order.CustomerID, &order.Total)
	if err != nil {
		return nil, err
	}
	if _, err := OrderCreated.Publish(ctx, &OrderEvent{OrderID: order.ID, Total: order.Total}); err != nil {
		return nil, err
	}
	return &order, nil
}

The trace captures the API call with full request/response bodies, the database query with actual bound parameter values (not just $1), and the pub/sub publish with the message payload.

The local development loop

When you run encore run, every request is fully traced at localhost:9400 with the same waterfall view, span details, and query capture that you'd see in production. Most local development environments have no tracing at all, so developers debug by reading code and adding print statements.

With traces available locally, you write code, hit the endpoint, and open the trace. If a database query took 140ms because a WHERE clause is missing and the database is scanning the whole table, you see it in the span timing. If auth didn't run because you forgot auth: true on the endpoint, that's visible in the span tree. If a service-to-service call is returning an error, you see both the caller's span and the callee's span with the error details.

Traces in Encore's local development dashboard during `encore run`.

The trace structure, span types, and captured data are the same locally and in production, so the workflow you build during development is the same one you use to debug live traffic.


Traces + AI agents. Encore's MCP server exposes traces to AI coding tools like Cursor and Claude Code. The agent can pull trace spans, identify bottlenecks, and suggest fixes with full context about what the request actually did. See a deep dive on the MCP server for details.

From local to production

The same traces are available in production through Encore Cloud's Trace Explorer. You can filter by endpoint, status code, latency percentile, or time range, and compare request volumes before and after a deploy to find which service regressed.

url.shorten latency distribution (2,847 total requests):

p50: 42ms
p75: 78ms
p90: 145ms
p95: 210ms
p99: 480ms

Latency distribution for an endpoint. Comparing time ranges shows the shape before and after a deploy.

When your p50 is 42ms and your p99 is 480ms, something is making 1% of requests significantly slower than the rest. With complete traces, you can filter to those slow requests and compare their span trees to the fast ones. The answer is usually a specific database query, a third-party API call, or a cache miss that triggers a slower code path.

Traces generated by the Encore runtime can also be exported to Datadog, Grafana Cloud, Honeycomb, Jaeger, or any OTLP-compatible backend. The runtime handles the instrumentation and the OTel ecosystem handles storage and visualization. See the tracing docs for configuration. Because traces are structured data, they can feed into alerts too. If your p99 latency crosses a threshold, an alert that includes the trace ID gives the on-call engineer the full request breakdown without having to reproduce the issue.

Deploy with Encore

Deploy a starter app in one click to see tracing in action.


Encore is an open-source backend framework for TypeScript and Go. Infrastructure is declared in application code and provisioned automatically. Tracing is built into the runtime. GitHub.
