AI API Integration: OpenAI, Anthropic, and Local Models
Three ecosystems, one architecture
In production you will work with at least two providers. OpenAI has the broadest ecosystem, Anthropic delivers the best performance on complex tasks, and local models (Ollama, vLLM) give you control over data and latency. The key is abstraction — your code should not be tied to a single provider.
The basic pattern is simple: define an interface for LLM calls, implement it for each provider, and switch via configuration. In TypeScript that means a shared ChatMessage type and a complete() function that returns AsyncIterable<string> for streaming.
// Provider-agnostic interface
interface LLMProvider {
  complete(messages: ChatMessage[], opts: CompletionOpts): AsyncIterable<string>;
  countTokens(text: string): Promise<number>;
}

// OpenAI implementation of complete() as an async generator,
// so the yielded deltas satisfy AsyncIterable<string>
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function* complete(
  messages: ChatMessage[],
  opts: CompletionOpts
): AsyncIterable<string> {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });
  for await (const chunk of stream) {
    yield chunk.choices[0]?.delta?.content ?? '';
  }
}

Streaming: why and how
Without streaming the user waits 5-30 seconds for a response. With streaming they see the first token in 200-500 ms. For UX this is a massive difference. But streaming adds complexity — you need to handle partial JSON parsing, backpressure, and reconnection.
The Anthropic SDK uses Server-Sent Events (SSE) natively. For the client you expose your own SSE endpoint or use ReadableStream in a Next.js Route Handler. Important: never send the raw provider stream directly to the client — always transform into your own format so you can switch providers without changing the frontend.
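That transformation can live in one small helper. A minimal sketch, assuming a runtime with Web Streams (Node 18+, or a Next.js Route Handler); the function name and event shapes are illustrative:

```typescript
// Sketch: wrap provider deltas in our own SSE event format so the
// frontend never depends on a specific provider's chunk shape.
function toSSE(deltas: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const text of deltas) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ type: 'delta', text })}\n\n`)
          );
        }
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: 'done' })}\n\n`));
      } catch {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: 'error' })}\n\n`));
      } finally {
        controller.close();
      }
    },
  });
}
```

In a Route Handler you would then return `new Response(toSSE(provider.complete(messages, opts)), { headers: { 'Content-Type': 'text/event-stream' } })`.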
// Anthropic streaming with proper error handling
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const stream = client.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  messages,
});

stream.on('text', (text) => {
  // Transform to your own event format
  res.write(`data: ${JSON.stringify({ type: 'delta', text })}\n\n`);
});

stream.on('error', (err) => {
  res.write(`data: ${JSON.stringify({ type: 'error', code: err.status })}\n\n`);
});

Error handling and retry strategies
Production API calls fail. Rate limits (429), server errors (500, 503), timeouts, network errors. Without retry logic your system crashes on the first outage. With naive retry (immediate re-attempt) you DDoS the provider and get a longer ban.
Exponential backoff with jitter is the standard: first retry at 1s, second at 2s, third at 4s — plus random jitter of 0-500ms so requests from different clients do not collide. For 429 respect the Retry-After header. For 500/503 retry. For 400 (bad request) do not retry — it is a bug in your code.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt === maxRetries) throw err;
      // 400/401 are bugs in our code or config -- retrying cannot help
      if (err.status === 400 || err.status === 401) throw err;
      // Respect Retry-After (seconds) when the provider sends it on 429
      const retryAfter = Number(err.headers?.['retry-after']) * 1000;
      const base = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 500;
      await new Promise(r => setTimeout(r, retryAfter || base + jitter));
    }
  }
  throw new Error('Unreachable');
}

Local models: when and how
Local models (via Ollama, vLLM, or TGI) make sense in three scenarios: data must not leave your infrastructure, you need predictable latency without rate limits, or you run high-volume inference where API costs exceed GPU costs. For a prototype Ollama with an OpenAI-compatible endpoint is enough — your existing code works without changes.
// Ollama with OpenAI-compatible endpoint
const localLLM = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but unused
});

const response = await localLLM.chat.completions.create({
  model: 'llama3.1:8b',
  messages,
  stream: true,
});

Structured outputs and function calling
In production you need structured data from the model — JSON, not free text. OpenAI offers Structured Outputs with JSON Schema validation. Anthropic has tool_use where you define an input schema. Both work, but implementations differ. Your wrapper should hide this difference behind a unified interface.
Critical rule: never parse raw model output directly. Always validate through Zod or a similar library. The model can return syntactically valid JSON that violates your business rules — missing fields, wrong types, values out of range. Validation at the system boundary is cheaper than debugging in production.
Never tie application code to a specific provider. Abstraction through interfaces will save you weeks of work when you need to switch models — and that moment will come sooner than you expect.
Start with one provider and an abstraction layer. Add the second when you actually need it. Over-engineering at the start hurts more than refactoring later.
Implement the LLMProvider interface for OpenAI and Anthropic. Write a complete() function that accepts a message array and returns AsyncIterable<string>. Add retry logic with exponential backoff. Test switching providers via an environment variable without changing caller code.
Hint
Use a factory pattern: createProvider(name: string) returns the correct implementation. For tests mock a provider that returns predefined responses.
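The factory hint can be sketched like this (types are simplified and the mock's canned reply is illustrative; real entries would wrap the SDK calls shown earlier):

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface LLMProvider {
  complete(messages: ChatMessage[]): AsyncIterable<string>;
}

// A mock provider that streams back a canned reply -- handy in tests
class MockProvider implements LLMProvider {
  constructor(private reply: string) {}
  async *complete(_messages: ChatMessage[]): AsyncIterable<string> {
    for (const word of this.reply.split(' ')) yield word + ' ';
  }
}

// Factory: callers never reference a concrete implementation
function createProvider(name: string): LLMProvider {
  switch (name) {
    case 'mock':
      return new MockProvider('canned response');
    // case 'openai':    return new OpenAIProvider();
    // case 'anthropic': return new AnthropicProvider();
    default:
      throw new Error(`Unknown provider: ${name}`);
  }
}
```

Caller code then reads `createProvider(process.env.LLM_PROVIDER ?? 'mock')` and switching providers is a one-variable change.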
Implement a simple AI-powered function with an API: 1) Pick a provider (OpenAI, Anthropic, or open-source), 2) Set up the API key, 3) Write a function that takes text and returns a summary (max 3 sentences), 4) Add error handling, 5) Measure latency and cost for 100 requests. The whole thing should take max 1 hour.
Hint
Document your process and results — they'll serve as reference for similar future tasks.
Implement an AI function that extracts structured data from text. 1) Define a Zod schema for the output (e.g., ExtractedContact with fields name, email, phone, company). 2) Write a prompt instructing the model to return JSON matching the schema. 3) Validate the model response through Zod. 4) Handle the case where the model returns invalid JSON — retry with a clarified prompt. Test on 10 different inputs.
Hint
Zod schema acts as a contract between your code and the model. Define it first and derive TypeScript types from it — not the other way around.
- Abstract LLM calls through an interface — provider lock-in is a real risk
- Streaming improves UX by an order of magnitude but requires your own event format
- Exponential backoff with jitter is mandatory for production retry logic
- Always validate structured outputs through a schema — the model is not a database