
Building production LLM applications requires understanding what's actually happening with your AI calls: which prompts are being used, how many tokens each request consumes, what everything costs, and how long requests take. Without this visibility, it's difficult to optimize performance, control costs, or debug issues when users report problems.
LangFuse is an open-source observability platform specifically designed for LLM applications. It captures detailed traces of every AI interaction, including prompts, completions, token usage, costs, and latency. Combined with OpenRouter, which provides a unified API for accessing models from OpenAI, Anthropic, Meta, and other providers, you get both the data to optimize and the flexibility to experiment with different models.
In this tutorial, we'll build and deploy a chat backend with full observability. You'll learn how to instrument LLM calls, track costs across different models, collect user feedback, and use the resulting data to make informed optimization decisions.
LangFuse gives you visibility into your LLM calls: every prompt, completion, token count, cost, and latency measurement is captured as a trace you can inspect and filter.
OpenRouter gives you access to multiple LLM providers through one API, so you can switch between models from OpenAI, Anthropic, Meta, and others without changing your integration code.
Together, they let you experiment with different models and track which ones work best for your use case.
We'll build a complete chat application with full LLM observability: an instrumented chat endpoint, chat history stored in PostgreSQL, user feedback scoring, and cost tracking across models.
By the end, you'll have complete visibility into your LLM usage and the data you need to optimize it.

You can run encore app create --example=ts/langfuse-openrouter to start with a complete working example. This tutorial walks through building it from scratch so you understand each component.
First, install Encore if you haven't already:
# macOS
brew install encoredev/tap/encore
# Linux
curl -L https://encore.dev/install.sh | bash
# Windows
iwr https://encore.dev/install.ps1 | iex
Create a new Encore application. This will prompt you to create a free Encore account if you don't have one (required for secret management):
encore app create langfuse-app --example=ts/hello-world
cd langfuse-app
Sign up at cloud.langfuse.com and create a project to get your LangFuse public and secret keys. Alternatively, you can self-host LangFuse using Docker.
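If you go the self-hosted route, the LangFuse repository ships with a Docker Compose setup. A minimal sketch (see the LangFuse self-hosting docs for current details):

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

Then point the baseUrl in the client setup below at your own instance (by default http://localhost:3000) instead of https://cloud.langfuse.com.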
OpenRouter gives you access to Claude 4.5 Opus, GPT-5, Gemini 3 Pro, Llama 3.3, and dozens of other models through a single API. Sign up at openrouter.ai to create an API key.
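To browse the available model identifiers (the strings like anthropic/claude-4.5-opus used throughout this tutorial), you can query OpenRouter's models endpoint or browse openrouter.ai/models:

curl https://openrouter.ai/api/v1/models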
Install the LangFuse and OpenAI SDKs (OpenRouter is OpenAI-compatible):
npm install langfuse openai
Every Encore service starts with a service definition file (encore.service.ts). Services let you divide your application into logical components. At deploy time, you can decide whether to colocate them in a single process or deploy them as separate microservices, without changing a single line of code:
// ai/encore.service.ts
import { Service } from "encore.dev/service";
export default new Service("ai");
Store your API keys securely using Encore's built-in secrets management:
// ai/clients.ts
import { Langfuse } from "langfuse";
import OpenAI from "openai";
import { secret } from "encore.dev/config";
const langfuseSecretKey = secret("LangfuseSecretKey");
const langfusePublicKey = secret("LangfusePublicKey");
const openrouterKey = secret("OpenRouterKey");
export const langfuse = new Langfuse({
secretKey: langfuseSecretKey(),
publicKey: langfusePublicKey(),
baseUrl: "https://cloud.langfuse.com",
});
export const openrouter = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: openrouterKey(),
});
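Optionally, OpenRouter supports attribution headers that identify your app in its dashboards and rankings. Since the OpenAI SDK accepts default headers, you could construct the OpenRouter client like this instead (the URL and title values here are placeholders):

export const openrouter = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: openrouterKey(),
  // Optional OpenRouter attribution headers (values are placeholders)
  defaultHeaders: {
    "HTTP-Referer": "https://your-app.example.com",
    "X-Title": "LangFuse Chat Demo",
  },
});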
Set your API keys for local development:
# LangFuse keys
encore secret set --dev LangfuseSecretKey
encore secret set --dev LangfusePublicKey
# OpenRouter key
encore secret set --dev OpenRouterKey
# For production
encore secret set --prod LangfuseSecretKey
encore secret set --prod LangfusePublicKey
encore secret set --prod OpenRouterKey
To store chat history and track conversations, create a PostgreSQL database. With Encore, you can create a database by simply defining it in code. The framework automatically provisions the infrastructure locally using Docker.
Create the database instance:
// ai/db.ts
import { SQLDatabase } from "encore.dev/storage/sqldb";
export const db = new SQLDatabase("ai", {
migrations: "./migrations",
});
Create the migration file:
-- ai/migrations/1_create_chat_history.up.sql
CREATE TABLE chat_messages (
id TEXT PRIMARY KEY,
user_id TEXT,
session_id TEXT NOT NULL,
role TEXT NOT NULL,
content TEXT NOT NULL,
model TEXT NOT NULL,
langfuse_trace_id TEXT,
tokens_used INTEGER,
cost_usd DECIMAL(10, 6),
latency_ms INTEGER,
rating INTEGER,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_messages_session ON chat_messages(session_id, created_at);
CREATE INDEX idx_messages_trace ON chat_messages(langfuse_trace_id);
CREATE INDEX idx_messages_user ON chat_messages(user_id, created_at DESC);
The key is instrumenting your LLM calls with LangFuse. Here's a chat endpoint that tracks everything:
// ai/chat.ts
import { api } from "encore.dev/api";
import { langfuse, openrouter } from "./clients";
import { db } from "./db";
import log from "encore.dev/log";
interface ChatRequest {
message: string;
model?: string;
userId?: string;
sessionId?: string;
}
interface ChatResponse {
messageId: string;
response: string;
model: string;
traceId: string;
tokensUsed: number;
latencyMs: number;
costUsd: number;
}
export const chat = api(
{ expose: true, method: "POST", path: "/ai/chat" },
async (req: ChatRequest): Promise<ChatResponse> => {
const startTime = Date.now();
const model = req.model || "anthropic/claude-4.5-opus";
const sessionId = req.sessionId || `session-${Date.now()}`;
const messages = [
{
role: "user" as const,
content: req.message,
},
];
// Create a LangFuse trace for this chat completion
const trace = langfuse.trace({
name: "chat-completion",
userId: req.userId,
sessionId,
input: messages,
metadata: {
model,
},
tags: ["chat", model.split("/")[0]],
});
try {
// Create a generation span within the trace
const generation = trace.generation({
name: "openrouter-completion",
model,
modelParameters: {
max_tokens: 1000,
},
input: messages,
});
log.info("Starting chat completion", { traceId: trace.id, model });
// Call OpenRouter for completion
const completion = await openrouter.chat.completions.create({
model,
messages,
max_tokens: 1000,
});
const response = completion.choices[0]?.message?.content || "";
const tokensUsed = completion.usage?.total_tokens || 0;
const latency = Date.now() - startTime;
// Estimate cost (rates vary by model)
const costPer1kTokens = model.includes("claude-4.5-opus") ? 0.015 : 0.002;
const cost = (tokensUsed / 1000) * costPer1kTokens;
// Update the generation span with output and metadata
generation.end({
output: response,
usage: {
input: completion.usage?.prompt_tokens || 0,
output: completion.usage?.completion_tokens || 0,
total: tokensUsed,
},
metadata: {
latencyMs: latency,
costUsd: cost,
},
});
// Store user message
const userMessageId = `msg-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
await db.exec`
INSERT INTO chat_messages (
id, user_id, session_id, role, content, model,
langfuse_trace_id, tokens_used, cost_usd, latency_ms
)
VALUES (
${userMessageId}, ${req.userId}, ${sessionId}, 'user', ${req.message}, ${model},
${trace.id}, 0, 0, 0
)
`;
// Store assistant response
const assistantMessageId = `msg-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
await db.exec`
INSERT INTO chat_messages (
id, user_id, session_id, role, content, model,
langfuse_trace_id, tokens_used, cost_usd, latency_ms
)
VALUES (
${assistantMessageId}, ${req.userId}, ${sessionId}, 'assistant', ${response}, ${model},
${trace.id}, ${tokensUsed}, ${cost}, ${latency}
)
`;
log.info("Chat completion successful", {
messageId: assistantMessageId,
traceId: trace.id,
tokensUsed,
latencyMs: latency,
});
// Update trace with output
trace.update({
output: response,
});
// Finalize the trace
await langfuse.flushAsync();
return {
messageId: assistantMessageId,
response,
model,
traceId: trace.id,
tokensUsed,
latencyMs: latency,
costUsd: cost,
};
} catch (error) {
// Track errors in LangFuse
trace.event({
name: "completion-error",
metadata: {
error: error instanceof Error ? error.message : "Unknown error",
},
});
await langfuse.flushAsync();
throw error;
}
}
);
What's being tracked: the trace itself (user ID, session ID, input, and tags), a generation span with the model, parameters, prompt, and completion, token usage split into input and output, estimated cost and latency, and any errors recorded as trace events. The trace ID is also stored with each message in the database, so you can jump from a stored conversation to its LangFuse trace.
LangFuse now has a complete record of this request. You can see it in the dashboard, filter by user or model, and analyze patterns.
Create an endpoint to retrieve chat history for a session:
// ai/chat.ts (continued)
interface ChatMessage {
id: string;
role: string;
content: string;
model: string;
tokensUsed: number;
costUsd: number;
createdAt: Date;
traceId: string | null;
rating: number | null;
}
interface ChatHistoryRequest {
sessionId: string;
}
interface ChatHistoryResponse {
messages: ChatMessage[];
totalTokens: number;
totalCost: number;
}
export const getChatHistory = api(
{ expose: true, method: "GET", path: "/ai/chat/:sessionId" },
async ({ sessionId }: ChatHistoryRequest): Promise<ChatHistoryResponse> => {
const rows = await db.query<{
id: string;
role: string;
content: string;
model: string;
tokens_used: number;
cost_usd: number;
created_at: Date;
langfuse_trace_id: string | null;
rating: number | null;
}>`
SELECT id, role, content, model, tokens_used, cost_usd, created_at, langfuse_trace_id, rating
FROM chat_messages
WHERE session_id = ${sessionId}
ORDER BY created_at ASC
`;
const messages: ChatMessage[] = [];
let totalTokens = 0;
let totalCost = 0;
for await (const row of rows) {
messages.push({
id: row.id,
role: row.role,
content: row.content,
model: row.model,
tokensUsed: row.tokens_used,
costUsd: row.cost_usd,
createdAt: row.created_at,
traceId: row.langfuse_trace_id,
rating: row.rating,
});
totalTokens += row.tokens_used;
totalCost += Number(row.cost_usd);
}
return { messages, totalTokens, totalCost };
}
);
LangFuse supports user feedback scores. Create an endpoint to rate chat responses:
// ai/feedback.ts
import { api } from "encore.dev/api";
import { langfuse } from "./clients";
import { db } from "./db";
interface AddFeedbackRequest {
traceId: string;
score: number; // 1-5
comment?: string;
}
interface AddFeedbackResponse {
success: boolean;
}
export const addFeedback = api(
{ expose: true, method: "POST", path: "/ai/feedback" },
async (req: AddFeedbackRequest): Promise<AddFeedbackResponse> => {
// Send to LangFuse
await langfuse.score({
traceId: req.traceId,
name: "user-rating",
value: req.score,
comment: req.comment,
});
await langfuse.flushAsync();
// Also store in database
await db.exec`
UPDATE chat_messages
SET rating = ${req.score}
WHERE langfuse_trace_id = ${req.traceId}
AND role = 'assistant'
`;
return { success: true };
}
);
Add cost tracking across different models:
// ai/costs.ts
import { api } from "encore.dev/api";
import { db } from "./db";
interface CostSummaryResponse {
totalCost: number;
totalTokens: number;
costsByModel: Record<string, { cost: number; tokens: number; messages: number }>;
}
export const getCostSummary = api(
{ expose: true, method: "GET", path: "/ai/costs" },
async (): Promise<CostSummaryResponse> => {
const rows = await db.query<{
model: string;
total_cost: number;
total_tokens: number;
message_count: number;
}>`
SELECT
model,
SUM(cost_usd) as total_cost,
SUM(tokens_used) as total_tokens,
COUNT(*) as message_count
FROM chat_messages
WHERE role = 'assistant'
GROUP BY model
`;
const costsByModel: Record<string, { cost: number; tokens: number; messages: number }> = {};
let totalCost = 0;
let totalTokens = 0;
for await (const row of rows) {
costsByModel[row.model] = {
cost: Number(row.total_cost),
tokens: row.total_tokens,
messages: row.message_count,
};
totalCost += Number(row.total_cost);
totalTokens += row.total_tokens;
}
return {
totalCost,
totalTokens,
costsByModel,
};
}
);
Start your backend (make sure Docker is running first):
encore run
Your API is now running locally. Open the local development dashboard at http://localhost:9400 to explore your API.
Send a test request to the chat endpoint:
curl -X POST http://localhost:4000/ai/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Explain quantum computing in simple terms",
"model": "anthropic/claude-4.5-opus",
"userId": "user123",
"sessionId": "session456"
}'
Response:
{
"messageId": "msg-1234567890-abc",
"response": "Quantum computing is a revolutionary approach to processing information...",
"model": "anthropic/claude-4.5-opus",
"traceId": "trace_xyz123",
"tokensUsed": 245,
"latencyMs": 1850,
"costUsd": 0.003675
}
Open cloud.langfuse.com and navigate to Traces. Find your trace and you'll see the full picture: the input messages, the model's response, the generation span with its token usage, the latency and estimated cost metadata, and the tags you attached.
This is the power of observability. Instead of guessing, you know exactly what happened.

Now let's see how different models perform for the same task:
# Claude 4.5 Opus (highest quality, premium cost)
curl -X POST http://localhost:4000/ai/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is TypeScript?","model":"anthropic/claude-4.5-opus"}'
# GPT-5 (high quality, high cost)
curl -X POST http://localhost:4000/ai/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is TypeScript?","model":"openai/gpt-5"}'
# Llama 3.3 70B (good quality, low cost)
curl -X POST http://localhost:4000/ai/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is TypeScript?","model":"meta-llama/llama-3.3-70b-instruct"}'
In LangFuse, you can now compare these side-by-side: the same prompt answered by each model, along with the token usage, latency, and cost of each response.
For simple questions, Llama might be perfect. For complex reasoning, Claude 4.5 Opus or GPT-5. You have the data to decide.
Try the other endpoints: rate a response via the feedback endpoint, fetch the chat history for a session, and get the cost summary:
curl -X POST http://localhost:4000/ai/feedback \
-H "Content-Type: application/json" \
-d '{
"traceId": "trace_xyz123",
"score": 5,
"comment": "Great explanation!"
}'
curl http://localhost:4000/ai/chat/session456
curl http://localhost:4000/ai/costs
The local development dashboard at http://localhost:9400 provides an API explorer for calling your endpoints, request traces, architecture diagrams, and generated API documentation.
Combined with LangFuse's dashboard, you get complete visibility into both your application flow and your LLM calls.

Track different versions of your prompts to compare effectiveness:
const trace = langfuse.trace({
name: "chat-completion",
metadata: {
promptVersion: "v2",
systemPrompt: "You are a helpful assistant specialized in {topic}",
},
});
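A minimal sketch of how you might wire this up, assuming you keep prompt templates in code and pass the selected version as both the system message and trace metadata (the PROMPTS map and topic substitution here are hypothetical):

// Hypothetical prompt registry keyed by version
const PROMPTS: Record<string, string> = {
  v1: "You are a helpful assistant.",
  v2: "You are a helpful assistant specialized in {topic}.",
};

const promptVersion = "v2";
const systemPrompt = PROMPTS[promptVersion].replace("{topic}", "TypeScript");

const messages = [
  { role: "system" as const, content: systemPrompt },
  { role: "user" as const, content: req.message },
];

const trace = langfuse.trace({
  name: "chat-completion",
  input: messages,
  metadata: { promptVersion, systemPrompt },
});

In LangFuse, you can then filter or group traces by the promptVersion metadata and compare ratings and token usage between versions.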
Compare different models for the same task:
export const compareModels = api(
{ expose: true, method: "POST", path: "/ai/compare" },
async (req: { message: string; models: string[] }) => {
const trace = langfuse.trace({
name: "model-comparison",
metadata: { models: req.models },
});
const results = [];
for (const model of req.models) {
const messages = [{ role: "user" as const, content: req.message }];
const generation = trace.generation({
name: `openrouter-${model}`,
model,
input: messages,
});
const completion = await openrouter.chat.completions.create({
model,
messages,
});
const response = completion.choices[0]?.message?.content || "";
const tokensUsed = completion.usage?.total_tokens || 0;
generation.end({
output: response,
usage: { total: tokensUsed },
});
results.push({
model,
response,
tokensUsed,
});
}
await langfuse.flushAsync();
return { results, traceId: trace.id };
}
);
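You can then hit the endpoint with a prompt and a list of models to compare:

curl -X POST http://localhost:4000/ai/compare \
-H "Content-Type: application/json" \
-d '{"message":"What is TypeScript?","models":["anthropic/claude-4.5-opus","meta-llama/llama-3.3-70b-instruct"]}'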
Add custom metrics to track specific aspects of your LLM responses:
trace.event({
name: "response-analysis",
metadata: {
responseLength: response.length,
containsCode: response.includes("```"),
sentiment: "positive",
language: "en",
},
});
Compare different models or system prompts:
const variant = Math.random() < 0.5 ? "A" : "B";
const model = variant === "A"
? "anthropic/claude-4.5-opus"
: "openai/gpt-5";
const trace = langfuse.trace({
name: "chat-completion",
metadata: {
experiment: "model-comparison-2025",
variant,
model,
},
});
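One caveat: with a per-request coin flip, the same user can bounce between variants, which makes feedback harder to attribute. A small sketch of deterministic assignment inside the chat handler, hashing the user ID so each user stays in one variant (the hash function is just illustrative):

// Deterministically assign a variant based on userId so each user
// always sees the same model during the experiment.
function assignVariant(userId: string): "A" | "B" {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0;
  }
  return Math.abs(hash) % 2 === 0 ? "A" : "B";
}

const variant = assignVariant(req.userId ?? "anonymous");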
Once you have traces flowing into LangFuse, you can use the dashboard to identify optimization opportunities:
Cost analysis - Group traces by model to see total spend. If GPT-5 and Claude 4.5 Opus have similar user ratings but different costs, switch to the cheaper one. Filter by user to find high-volume users consuming a disproportionate share of your budget.
Latency tracking - Sort traces by duration to find slow queries. Cross-reference with the prompts and models used. The 95th percentile latency matters more than the average - that's what users actually experience during peak times.
Quality metrics - User ratings tell you which responses work. After changing a prompt template, compare before/after ratings to validate the improvement. Sort by lowest ratings to find queries that need work.
Token optimization - Traces show exact token counts per request. Identify queries using excessive tokens and refine the prompts. Small changes can significantly reduce costs at scale.
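The chat_messages table already supports this kind of cost analysis locally. As a sketch, here's a hypothetical endpoint that breaks spend down per user, following the same pattern as getCostSummary:

// ai/costs.ts (continued)
interface UserCostsResponse {
  users: { userId: string; cost: number; messages: number }[];
}

export const getCostsByUser = api(
  { expose: true, method: "GET", path: "/ai/costs/by-user" },
  async (): Promise<UserCostsResponse> => {
    const rows = await db.query<{
      user_id: string;
      total_cost: number;
      message_count: number;
    }>`
      SELECT user_id, SUM(cost_usd) as total_cost, COUNT(*) as message_count
      FROM chat_messages
      WHERE role = 'assistant' AND user_id IS NOT NULL
      GROUP BY user_id
      ORDER BY total_cost DESC
    `;
    const users: { userId: string; cost: number; messages: number }[] = [];
    for await (const row of rows) {
      users.push({
        userId: row.user_id,
        cost: Number(row.total_cost),
        messages: row.message_count,
      });
    }
    return { users };
  }
);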
For quick demos and prototypes, you can serve a static HTML frontend directly from your Encore app using api.static():
// frontend/static.ts
import { api } from "encore.dev/api";
export const assets = api.static(
{ expose: true, path: "/!path", dir: "./" }
);
// frontend/encore.service.ts
import { Service } from "encore.dev/service";
export default new Service("frontend");
The path: "/!path" pattern serves as a fallback route, meaning it will match any path that doesn't match your API endpoints. This works great for single-page applications.
Create a minimal chat interface in frontend/index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chat</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<div class="app">
<aside class="sidebar">
<button class="new-chat-btn" onclick="createNewChat()">+ New Chat</button>
<div class="chat-history" id="chat-history"></div>
</aside>
<main class="main">
<header class="chat-header">
<h1>AI Chat</h1>
<div class="stats">
<span><span id="total-cost">$0.00</span></span>
<span><span id="total-tokens">0</span> tokens</span>
</div>
</header>
<div class="messages" id="messages"></div>
<div class="input-container">
<textarea id="message-input" placeholder="Type your message..."></textarea>
<button id="send-btn" onclick="sendMessage()">Send</button>
</div>
</main>
</div>
<script src="app.js"></script>
</body>
</html>
The JavaScript handles API calls, session management, and chat history:
// frontend/app.js
const API_BASE = 'http://localhost:4000';
let currentSessionId = null;
let sessions = JSON.parse(localStorage.getItem('chat_sessions') || '[]');
async function sendMessage() {
const input = document.getElementById('message-input');
const message = input.value.trim();
if (!message) return;
const response = await fetch(`${API_BASE}/ai/chat`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message,
model: 'anthropic/claude-4.5-opus',
userId: 'user123',
sessionId: currentSessionId
})
});
const data = await response.json();
addMessage('assistant', data.response, data.model, {
tokens: data.tokensUsed,
cost: data.costUsd,
traceId: data.traceId
});
}
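The addMessage helper referenced above isn't shown here; a minimal version might look like this (the CSS class names and metadata layout are illustrative, and the full version lives in the example repository):

// frontend/app.js (continued) - minimal rendering helper (illustrative)
function addMessage(role, content, model, meta) {
  const container = document.getElementById('messages');
  const el = document.createElement('div');
  el.className = `message ${role}`;
  el.textContent = content;
  if (meta && role === 'assistant') {
    const info = document.createElement('div');
    info.className = 'message-meta';
    info.textContent = `${model} · ${meta.tokens} tokens · $${meta.cost.toFixed(4)}`;
    el.appendChild(info);
  }
  container.appendChild(el);
  container.scrollTop = container.scrollHeight;
}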
Static files are served directly from Encore's Rust runtime with zero JavaScript execution, making them extremely fast. When you deploy with git push encore, your frontend deploys alongside your backend, giving you a single URL you can immediately share to demo your prototype.
For production applications with more complex frontend needs (React, Next.js, build pipelines), we recommend deploying your frontend to Vercel, Netlify, or similar services and using the generated API client to call your Encore backend.
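Encore generates that client from your API definitions. A sketch of the command (the exact flags may differ by Encore version, so check encore gen client --help, and replace your-app-id with your app's ID):

# Generate a TypeScript client for your backend (flags may vary by version)
encore gen client your-app-id --lang=typescript --output=./frontend/client.ts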
Full example code: The complete chat interface with session management, chat history sidebar, rating persistence, and minimal Apple-like design is available in the example repository.
See the self-hosting instructions for how to use encore build docker to create a Docker image.
Deploy your application using git push encore:
git add -A .
git commit -m "Add LangFuse observability"
git push encore
If you haven't already, set your production secrets:
encore secret set --prod LangfuseSecretKey
encore secret set --prod LangfusePublicKey
encore secret set --prod OpenRouterKey
Note: Encore Cloud is great for prototyping and development with fair use limits. For production workloads, you can connect your AWS or GCP account and Encore will provision infrastructure directly in your cloud account.
If you found this tutorial helpful, consider starring Encore on GitHub to help others discover it.


