Integrating AI Into Web Apps Without the Hype
Most AI integration tutorials skip the hard parts — latency, cost, fallbacks, and UX. Here's what I've learned shipping AI features in production.
Everyone is adding AI to their apps.
Most of them are adding it badly.
Not because the features are wrong — but because the integration is fragile, expensive, or unusable in practice.
Here's what I've learned from building AI-powered features in production.
The Basics Everyone Gets Right
Calling an LLM API is trivially easy:
import OpenAI from "openai";

const client = new OpenAI();
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }],
});
This works fine in a tutorial.
Production is different.
The Parts Nobody Talks About
1. Latency Is a UX Problem
A full LLM response often takes several seconds, commonly in the 2–10 second range depending on the model and the output length.
If you block your UI waiting for a response, your app feels broken.
Always stream. The Vercel AI SDK makes this straightforward with Next.js:
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = streamText({
    model: openai("gpt-4o-mini"),
    prompt,
  });
  return result.toDataStreamResponse();
}
On the client, render tokens as they arrive. Users tolerate slow when they see progress.
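To make "render tokens as they arrive" concrete, here is a minimal sketch of a client-side reader. It assumes the endpoint streams plain text chunks. The data stream format returned by the route above carries extra framing, which the SDK's own client hooks are designed to parse, so treat this as an illustration of the idea rather than a drop-in client for that route. The name `readTextStream` and the `/api/generate` path in the comment are hypothetical.

```typescript
// Reads a streamed response body chunk by chunk and hands each decoded
// piece of text to `onText` as soon as it arrives.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onText(text);
    }
  }

  // Flush any bytes the decoder was still holding on to.
  const rest = decoder.decode();
  if (rest) {
    full += rest;
    onText(rest);
  }
  return full;
}

// Hypothetical usage in a component:
//   const res = await fetch("/api/generate", { method: "POST", body: JSON.stringify({ prompt }) });
//   if (res.body) await readTextStream(res.body, (t) => setOutput((prev) => prev + t));
```

Appending each chunk to UI state as it lands is what turns a multi-second wait into visible progress.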
2. Cost Grows Faster Than You Expect
GPT-4o costs real money at scale.
Rules I follow:
- Use smaller models by default. gpt-4o-mini handles 80% of tasks at 10× lower cost.
- Cache deterministic responses. If the same input produces the same output, cache it.
- Set hard usage limits. Cap monthly spend before you go over budget.
- Count tokens before sending. Reject requests above your threshold early.
import { encode } from "gpt-tokenizer";

// Inside the request handler, before calling the model:
const tokens = encode(prompt).length;
if (tokens > 4000) {
  return new Response("Prompt too long", { status: 400 });
}
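The caching rule can be sketched in a similar way. The following is a minimal in-memory version, assuming the wrapped call is effectively deterministic (for example, a fixed model and temperature 0). The names `withCache` and `Generate` are hypothetical, and a real deployment would more likely use a shared store with expiry than a per-process `Map`.

```typescript
import { createHash } from "node:crypto";

type Generate = (prompt: string) => Promise<string>;

// Wraps a generate function with an in-memory cache keyed by a hash of
// model + prompt. Assumes the same input should yield the same output.
function withCache(model: string, generate: Generate, maxEntries = 1000): Generate {
  const cache = new Map<string, Promise<string>>();

  return (prompt) => {
    const key = createHash("sha256")
      .update(model)
      .update("\0")
      .update(prompt)
      .digest("hex");

    const hit = cache.get(key);
    if (hit) return hit;

    // Cache the promise so concurrent identical requests share one API call.
    const pending = generate(prompt).catch((err) => {
      cache.delete(key); // never cache failures
      throw err;
    });

    // Evict the oldest entry once the cache is full (Map keeps insertion order).
    if (cache.size >= maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(key, pending);
    return pending;
  };
}
```

Including the model name in the key matters: switching models should not serve answers produced by the old one.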
3. LLMs Fail. Plan for It.
LLM APIs return errors, hit rate limits, and time out regularly.
Every AI call needs:
- Retry with exponential backoff
- A fallback path (degraded experience, not a crash)
- Timeout enforcement (don't wait indefinitely)
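// Retries `fn` with exponential backoff: 500 ms, 1 s, 2 s, ...
async function callWithRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (i < retries - 1) {
        await new Promise((r) => setTimeout(r, 2 ** i * 500));
      }
    }
  }
  // Reached when every attempt failed (or retries <= 0).
  throw lastError ?? new Error("callWithRetry: no attempts were made");
}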
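The retry helper covers the first item on that list. The other two, a timeout and a fallback path, might look like the sketch below. `withTimeout` and `withFallback` are hypothetical helper names, and the 10-second default is an arbitrary placeholder, not a recommendation.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `fn` under a deadline and returns `fallback` instead of throwing,
// so the caller can render a degraded experience rather than crash.
async function withFallback<T>(
  fn: () => Promise<T>,
  fallback: T,
  ms = 10_000,
): Promise<T> {
  try {
    return await withTimeout(fn(), ms);
  } catch {
    return fallback;
  }
}
```

Note that `Promise.race` only stops waiting. It does not cancel the underlying request, so if the client you use accepts an `AbortSignal`, passing one is the way to actually abort the call.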
4. Prompt Engineering Is Engineering
Prompts are code. Treat them as such.
- Store prompts in files, not inline strings
- Version them
- Test them with regression inputs
- Log inputs and outputs in development
A prompt that worked last week can produce garbage after a model update.
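One way to apply these rules is to give every prompt an id and a version and to render it through a function that can be tested. This is a minimal sketch: the `PromptTemplate` shape, the `{{name}}` placeholder syntax, and the example template are all assumptions made for illustration. In practice the template text would be loaded from a file under version control.

```typescript
// A prompt template with an explicit version, as it might be loaded from a file.
interface PromptTemplate {
  id: string;
  version: string;
  template: string; // uses {{name}} placeholders
}

// Fills {{name}} placeholders and throws if a variable is missing, so a
// renamed placeholder fails in tests instead of silently reaching the model.
function renderPrompt(p: PromptTemplate, vars: Record<string, string>): string {
  return p.template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      throw new Error(`${p.id}@${p.version}: missing variable "${name}"`);
    }
    return value;
  });
}

// Hypothetical example template.
const summarizeV2: PromptTemplate = {
  id: "summarize-feedback",
  version: "2.0.0",
  template: "Summarize this feedback in one sentence:\n{{feedback}}",
};
```

Logging `id@version` next to each input and output makes it possible to tell whether a regression came from a prompt change or a model update.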
5. Structured Output Is Your Friend
Free-form LLM text is hard to use programmatically.
Force structured output whenever possible:
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    summary: z.string(),
    tags: z.array(z.string()),
    sentiment: z.enum(["positive", "neutral", "negative"]),
  }),
  prompt: "Analyze this feedback: " + feedback,
});
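Zod schema + generateObject = typed, validated AI output. Output that doesn't match the schema surfaces as an error you can handle, rather than as malformed data downstream.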
When Not to Use AI
AI adds latency, cost, and unpredictability.
Don't use it when:
- A regex or simple function works
- The output needs to be deterministic
- Response time is critical and streaming isn't an option
- The feature doesn't justify the cost at scale
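As a small illustration of the first point, pulling well-formed identifiers out of text needs no model at all. The `ORD-` plus six digits format below is a hypothetical example, not something taken from a real system.

```typescript
// Extracts order IDs of the hypothetical form "ORD-" followed by exactly six digits.
function extractOrderIds(text: string): string[] {
  return text.match(/\bORD-\d{6}\b/g) ?? [];
}
```

This runs in microseconds, costs nothing per call, and returns the same result every time for the same input.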
My Production AI Checklist
- [ ] Streaming enabled for long responses
- [ ] Smaller model as default, larger only when needed
- [ ] Token limit enforced before API call
- [ ] Retry + timeout logic in place
- [ ] Responses cached where input is deterministic
- [ ] Prompts stored and versioned separately
- [ ] Structured output with Zod schema
- [ ] Usage monitoring and spend alerts configured
Final Thought
AI features can be genuinely useful.
But they require the same engineering discipline as any other production system — maybe more, because the failure modes are less predictable.
Build carefully. Ship confidently.