AI API Integration: OpenAI, Anthropic, and Local Models
Three ecosystems, one architecture
In production you will work with at least two providers. OpenAI has the broadest ecosystem, Anthropic delivers the best performance on complex tasks, and local models (Ollama, vLLM) give you control over data and latency. The key is abstraction — your code should not be tied to a single provider.
The basic pattern is simple: define an interface for LLM calls, implement it for each provider, and switch via configuration. In TypeScript that means a shared ChatMessage type and a complete() function that returns AsyncIterable<string> for streaming.
// Provider-agnostic interface
interface LLMProvider {
  complete(messages: ChatMessage[], opts: CompletionOpts): AsyncIterable<string>;
  countTokens(text: string): Promise<number>;
}

// OpenAI implementation of complete() as an async generator,
// so the yielded deltas satisfy AsyncIterable<string>
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function* complete(
  messages: ChatMessage[],
  opts: CompletionOpts
): AsyncIterable<string> {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });
  for await (const chunk of stream) {
    yield chunk.choices[0]?.delta?.content ?? '';
  }
}

Streaming: why and how
Without streaming the user waits 5-30 seconds for a response. With streaming they see the first token in 200-500 ms. For UX this is a massive difference. But streaming adds complexity — you need to handle partial JSON parsing, backpressure, and reconnection.
The Anthropic SDK uses Server-Sent Events (SSE) natively. For the client you expose your own SSE endpoint or use ReadableStream in a Next.js Route Handler. Important: never send the raw provider stream directly to the client — always transform into your own format so you can switch providers without changing the frontend.
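That transformation can live in one small helper. A minimal sketch, assuming a runtime with Web Streams (Node 18+, or a Next.js Route Handler); the function name and event shapes are illustrative:

```typescript
// Sketch: wrap provider deltas in our own SSE event format so the
// frontend never depends on a specific provider's chunk shape.
function toSSE(deltas: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const text of deltas) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ type: 'delta', text })}\n\n`)
          );
        }
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: 'done' })}\n\n`));
      } catch {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: 'error' })}\n\n`));
      } finally {
        controller.close();
      }
    },
  });
}
```

In a Route Handler you would then return `new Response(toSSE(provider.complete(messages, opts)), { headers: { 'Content-Type': 'text/event-stream' } })`.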
// Anthropic streaming with proper error handling
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const stream = client.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  messages,
});

stream.on('text', (text) => {
  // Transform to your own event format
  res.write(`data: ${JSON.stringify({ type: 'delta', text })}\n\n`);
});

stream.on('error', (err) => {
  res.write(`data: ${JSON.stringify({ type: 'error', code: err.status })}\n\n`);
});

Error handling and retry strategies
Production API calls fail. Rate limits (429), server errors (500, 503), timeouts, network errors. Without retry logic your system crashes on the first outage. With naive retry (immediate re-attempt) you DDoS the provider and get a longer ban.
Exponential backoff with jitter is the standard: first retry at 1s, second at 2s, third at 4s — plus random jitter of 0-500ms so requests from different clients do not collide. For 429 respect the Retry-After header. For 500/503 retry. For 400 (bad request) do not retry — it is a bug in your code.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt === maxRetries) throw err;
      // 400/401 are bugs in our code or config -- retrying cannot help
      if (err.status === 400 || err.status === 401) throw err;
      // Respect Retry-After (seconds) when the provider sends it on 429
      const retryAfter = Number(err.headers?.['retry-after']) * 1000;
      const base = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 500;
      await new Promise(r => setTimeout(r, retryAfter || base + jitter));
    }
  }
  throw new Error('Unreachable');
}

Local models: when and how
Local models (via Ollama, vLLM, or TGI) make sense in three scenarios: data must not leave your infrastructure, you need predictable latency without rate limits, or you run high-volume inference where API costs exceed GPU costs. For a prototype Ollama with an OpenAI-compatible endpoint is enough — your existing code works without changes.
// Ollama with OpenAI-compatible endpoint
const localLLM = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but unused
});

const response = await localLLM.chat.completions.create({
  model: 'llama3.1:8b',
  messages,
  stream: true,
});

Structured outputs and function calling
In production you need structured data from the model — JSON, not free text. OpenAI offers Structured Outputs with JSON Schema validation. Anthropic has tool_use where you define an input schema. Both work, but implementations differ. Your wrapper should hide this difference behind a unified interface.
Critical rule: never parse raw model output directly. Always validate through Zod or a similar library. The model can return syntactically valid JSON that violates your business rules — missing fields, wrong types, values out of range. Validation at the system boundary is cheaper than debugging in production.
Never tie application code to a specific provider. Abstraction through interfaces will save you weeks of work when you need to switch models — and that moment will come sooner than you expect.
Start with one provider and an abstraction layer. Add the second when you actually need it. Over-engineering at the start hurts more than refactoring later.
Implement the LLMProvider interface for OpenAI and Anthropic. Write a complete() function that accepts a message array and returns AsyncIterable<string>. Add retry logic with exponential backoff. Test switching providers via an environment variable without changing caller code.
Hint
Use a factory pattern: createProvider(name: string) returns the correct implementation. For tests mock a provider that returns predefined responses.
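The factory hint can be sketched like this (types are simplified and the mock's canned reply is illustrative; real entries would wrap the SDK calls shown earlier):

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface LLMProvider {
  complete(messages: ChatMessage[]): AsyncIterable<string>;
}

// A mock provider that streams back a canned reply -- handy in tests
class MockProvider implements LLMProvider {
  constructor(private reply: string) {}
  async *complete(_messages: ChatMessage[]): AsyncIterable<string> {
    for (const word of this.reply.split(' ')) yield word + ' ';
  }
}

// Factory: callers never reference a concrete implementation
function createProvider(name: string): LLMProvider {
  switch (name) {
    case 'mock':
      return new MockProvider('canned response');
    // case 'openai':    return new OpenAIProvider();
    // case 'anthropic': return new AnthropicProvider();
    default:
      throw new Error(`Unknown provider: ${name}`);
  }
}
```

Caller code then reads `createProvider(process.env.LLM_PROVIDER ?? 'mock')` and switching providers is a one-variable change.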
Implement a simple AI-powered function with an API: 1) Pick a provider (OpenAI, Anthropic, or open-source), 2) Set up the API key, 3) Write a function that takes text and returns a summary (max 3 sentences), 4) Add error handling, 5) Measure latency and cost for 100 requests. The whole thing should take max 1 hour.
Hint
Document your process and results — they'll serve as reference for similar future tasks.
Implement an AI function that extracts structured data from text. 1) Define a Zod schema for the output (e.g., ExtractedContact with fields name, email, phone, company). 2) Write a prompt instructing the model to return JSON matching the schema. 3) Validate the model response through Zod. 4) Handle the case where the model returns invalid JSON — retry with a clarified prompt. Test on 10 different inputs.
Hint
Zod schema acts as a contract between your code and the model. Define it first and derive TypeScript types from it — not the other way around.
- Abstract LLM calls through an interface — provider lock-in is a real risk
- Streaming improves UX by an order of magnitude but requires your own event format
- Exponential backoff with jitter is mandatory for production retry logic
- Always validate structured outputs through a schema — the model is not a database