AI API Expert — Complete Reference
The Intelligence Layer for Your Application
A comprehensive expert guide to the world’s most powerful AI APIs — models, pricing, capabilities, and integration patterns from every major provider.
6+ Top Providers
24 Models Covered
∞ Possibilities
1M+ Max Context
01 — MODELS
The Frontier Models
The leading large language models available via API today, across every major provider.
| Provider | Model | Model ID | Context | Max output | Input $/1M | Output $/1M |
|---|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4 | claude-sonnet-4-20250514 | 200K tokens | 64K tokens | $3 | $15 |
| OpenAI | GPT-4o | gpt-4o-2024-11-20 | 128K tokens | 16K tokens | $2.50 | $10 |
| Google DeepMind | Gemini 2.0 Flash | gemini-2.0-flash | 1M tokens | 8K tokens | $0.075 | $0.30 |
| Mistral AI | Mistral Large 2 | mistral-large-2407 | 128K tokens | 8K tokens | $2 | $6 |
| Meta (via Groq) | Llama 3.3 70B | llama-3.3-70b-versatile | 128K tokens | 32K tokens | $0.59 | $0.79 |
| Anthropic | Claude Haiku 3.5 | claude-3-5-haiku-20241022 | 200K tokens | 8K tokens | $0.80 | $4 |
02 — PRICING
Cost at Scale
Per-token pricing and context windows for all major frontier models.
| Model | Input $/1M | Output $/1M | Context Window | Speed |
|---|---|---|---|---|
| Gemini 2.0 Flash | $0.075 | $0.30 | 1M | ⚡⚡⚡ |
| GPT-4o mini | $0.15 | $0.60 | 128K | ⚡⚡⚡ |
| Llama 3.3 70B (Groq) | $0.59 | $0.79 | 128K | ⚡⚡⚡ |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K | ⚡⚡⚡ |
| Mistral Large 2 | $2.00 | $6.00 | 128K | ⚡⚡ |
| GPT-4o | $2.50 | $10.00 | 128K | ⚡⚡ |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K | ⚡⚡ |
| Claude Opus 4 | $15.00 | $75.00 | 200K | ⚡ |
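To turn the per-token prices above into a budget, multiply each token count by its price and divide by one million. The sketch below does that for a few rows of the table; `PRICES` and `estimateCostUsd` are illustrative names, and the traffic figures are made-up inputs, not measurements.

```javascript
// Prices in USD per 1M tokens, copied from the table above.
const PRICES = {
  'Gemini 2.0 Flash': { input: 0.075, output: 0.30 },
  'GPT-4o': { input: 2.50, output: 10.00 },
  'Claude Sonnet 4': { input: 3.00, output: 15.00 },
};

// Hypothetical helper: cost of a single request in USD.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Assumed workload: 10,000 requests per day, 2,000 input and 500 output tokens each.
const perRequest = estimateCostUsd('Claude Sonnet 4', 2000, 500);
console.log(perRequest.toFixed(4));            // 0.0135
console.log((perRequest * 10_000).toFixed(2)); // 135.00
```

Output tokens dominate the bill for most chat workloads because they cost several times more than input tokens, so capping `max_tokens` is often the cheapest optimization.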
03 — INTEGRATION
Code Examples
Minimal Node.js snippets for the Anthropic, OpenAI, and Google SDKs.
    // Anthropic Claude SDK (Node.js, ES module)
    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
    });

    const message = await client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      system: 'You are a helpful expert assistant.',
      messages: [
        { role: 'user', content: 'Explain API rate limiting.' },
      ],
    });

    // The response is a list of content blocks; the first one holds the text here.
    console.log(message.content[0].text);

    // OpenAI SDK (Node.js, ES module)
    import OpenAI from 'openai';

    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });

    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      max_tokens: 1024,
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Explain API rate limiting.' },
      ],
    });

    console.log(completion.choices[0].message.content);

    // Google Gemini SDK (Node.js, ES module)
    import { GoogleGenerativeAI } from '@google/generative-ai';

    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

    const model = genAI.getGenerativeModel({
      model: 'gemini-2.0-flash',
    });

    const result = await model.generateContent('Explain API rate limiting.');

    console.log(result.response.text());

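    // Streaming with Anthropic: token-by-token output
    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

    const stream = client.messages.stream({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 2048,
      messages: [
        { role: 'user', content: 'Write a detailed guide.' },
      ],
    });

    for await (const chunk of stream) {
      // Only text deltas carry a `text` field; other delta types (such as tool input JSON) do not.
      if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
        process.stdout.write(chunk.delta.text);
      }
    }

    const final = await stream.finalMessage();
    console.log('\nDone. Tokens:', final.usage);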
04 — CAPABILITIES
What AI APIs Can Do
Core capabilities across today’s leading large language model APIs.
Text Generation
Produce coherent long-form content, summaries, stories, and structured documents at any length and register.
Available in: Claude, GPT-4o, Gemini
Vision & Images
Analyze, describe, extract data from, and reason over images, charts, diagrams, and screenshots.
Available in: Claude, GPT-4o, Gemini
Tool Use / Functions
Call external APIs, run code, query databases, and orchestrate multi-step agentic workflows reliably.
Available in: Claude, GPT-4o, Mistral
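A tool-use round trip has three parts: you declare tools, the model replies with a call, and your code runs it and sends the result back. The sketch below shows the local half of that loop using the block shapes from the Anthropic Messages API (`tool_use` in, `tool_result` out). The `get_weather` tool, its stubbed data, and the `runToolCall` helper are hypothetical, and no network call is made.

```javascript
// Tool definition in the shape the Anthropic Messages API accepts in `tools`.
// The tool itself (get_weather) is a hypothetical example.
const tools = [
  {
    name: 'get_weather',
    description: 'Return the current temperature for a city.',
    input_schema: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
];

// Local implementations keyed by tool name (stubbed data, no network call).
const handlers = new Map([
  ['get_weather', ({ city }) => ({ city, temperature_c: 21 })],
]);

// Turn one `tool_use` content block from the model into a `tool_result`
// block, which is sent back inside the next user message.
function runToolCall(block) {
  const handler = handlers.get(block.name);
  if (!handler) {
    return {
      type: 'tool_result',
      tool_use_id: block.id,
      content: `Unknown tool: ${block.name}`,
      is_error: true,
    };
  }
  return {
    type: 'tool_result',
    tool_use_id: block.id,
    content: JSON.stringify(handler(block.input)),
  };
}

// Example block, shaped like what the model returns when it calls a tool.
const toolUse = { type: 'tool_use', id: 'toolu_example', name: 'get_weather', input: { city: 'Paris' } };
console.log(runToolCall(toolUse));
```

Returning an error result for unknown tools, rather than throwing, lets the model see the failure and recover on its next turn.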
Structured Output
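Constrain responses to a JSON schema you supply and get typed, validated data structures back. How strictly the schema is enforced varies by provider, so validate the result before using it.
Available in: Claude, GPT-4o, Gemini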
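Whatever the provider promises, a small check at the boundary catches malformed or unexpected output before it reaches the rest of your code. This is a minimal sketch with a hypothetical three-field schema; `EXPECTED` and `parseStructured` are illustrative names, and a real project would more likely use a full JSON Schema validator.

```javascript
// Hypothetical expected shape: field name -> expected `typeof` result.
const EXPECTED = { title: 'string', year: 'number', inStock: 'boolean' };

// Parse the model's raw text and check it against EXPECTED.
function parseStructured(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ['not valid JSON'] };
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, errors: ['top-level value is not an object'] };
  }
  const errors = [];
  // Every expected field must be present with the right primitive type.
  for (const [key, type] of Object.entries(EXPECTED)) {
    if (typeof data[key] !== type) errors.push(`${key}: expected ${type}`);
  }
  // Reject keys the schema does not declare.
  for (const key of Object.keys(data)) {
    if (!Object.hasOwn(EXPECTED, key)) errors.push(`${key}: unexpected key`);
  }
  return errors.length ? { ok: false, errors } : { ok: true, data };
}

console.log(parseStructured('{"title":"Dune","year":1965,"inStock":true}').ok); // true
console.log(parseStructured('{"title":"Dune","year":"1965","extra":1}').errors);
```

The second call reports three problems: `year` has the wrong type, `inStock` is missing, and `extra` is not part of the schema.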
Extended Thinking
Deep chain-of-thought reasoning for math, logic puzzles, code review, and research-grade analysis.
Available in: Claude, o3 / o4-mini
Batch Processing
Async batch jobs for high-volume, cost-sensitive inference: process thousands of requests at a 50% discount.
Available in: Claude, GPT-4o
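A batch submission is a list of ordinary message requests, each tagged with an ID of your choosing. The sketch below builds such a body in the shape used by the Anthropic Message Batches API (`requests`, `custom_id`, `params`); `buildBatch` and the `req-N` naming scheme are assumptions made for illustration.

```javascript
// Build the body for one batch submission. Each entry carries a `custom_id`
// so results can be matched back to their inputs, since results are not
// guaranteed to come back in submission order.
function buildBatch(prompts, model = 'claude-sonnet-4-20250514') {
  return {
    requests: prompts.map((prompt, i) => ({
      custom_id: `req-${i}`,
      params: {
        model,
        max_tokens: 1024,
        messages: [{ role: 'user', content: prompt }],
      },
    })),
  };
}

const body = buildBatch(['Summarize document A.', 'Summarize document B.']);
console.log(body.requests.length);       // 2
console.log(body.requests[1].custom_id); // req-1
```

With the Anthropic Node.js SDK, a body like this can be passed to `client.messages.batches.create(body)`; batches finish asynchronously, so they suit work that does not need an immediate answer.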
05 — ENDPOINTS
Core API Endpoints
Essential REST endpoints of the Anthropic API.
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/messages | Create a message; the primary completion endpoint |
| POST | /v1/messages/batches | Submit an async batch of up to 100,000 requests |
| GET | /v1/messages/batches/:id | Poll batch processing status; results are fetched from /v1/messages/batches/:id/results once processing ends |
| POST | /v1/complete | Legacy text completion (deprecated; use /v1/messages) |
| GET | /v1/models | List available models and their metadata |
| POST | /v1/messages/batches/:id/cancel | Cancel a batch that has not finished processing |
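For environments without the SDK, the primary endpoint can be called over plain HTTPS. The sketch below only assembles the request, so it runs without a key or network access; `buildMessagesRequest` is a hypothetical helper, while the `x-api-key`, `anthropic-version`, and `content-type` headers are the ones the Anthropic API expects.

```javascript
const API_BASE = 'https://api.anthropic.com';

// Assemble the URL and fetch() options for POST /v1/messages.
function buildMessagesRequest(apiKey, prompt) {
  return {
    url: `${API_BASE}/v1/messages`,
    init: {
      method: 'POST',
      headers: {
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: 'claude-sonnet-4-20250514',
        max_tokens: 1024,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (needs a real key and network access):
//   const { url, init } = buildMessagesRequest(process.env.ANTHROPIC_API_KEY, 'Hello');
//   const res = await fetch(url, init);
//   console.log((await res.json()).content[0].text);
```

Keeping request construction separate from sending makes the payload easy to log and unit-test.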
06 — BEST PRACTICES
Expert Patterns
Hard-won wisdom from production AI integrations at scale.
Prompting
System prompt first
Put all persistent context, persona, and rules in the system message. Keep user turns focused on the task.
Few-shot examples
Include 2–5 input/output pairs demonstrating the exact format and tone you need. Examples beat instructions.
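One common way to supply few-shot pairs is as alternating user and assistant turns ahead of the real input. The sketch below assumes a made-up sentiment-labelling task; `EXAMPLES` and `buildFewShotMessages` are illustrative names, and the output is a `messages` array in the role/content shape used by the chat APIs above.

```javascript
// Hypothetical examples for a sentiment-labelling task.
const EXAMPLES = [
  { input: 'The checkout flow is so smooth now.', output: 'positive' },
  { input: 'App crashes every time I open settings.', output: 'negative' },
  { input: 'Release notes are posted on the blog.', output: 'neutral' },
];

// Interleave each pair as a user turn followed by an assistant turn,
// then append the real input as the final user turn.
function buildFewShotMessages(examples, input) {
  const shots = examples.flatMap((ex) => [
    { role: 'user', content: ex.input },
    { role: 'assistant', content: ex.output },
  ]);
  return [...shots, { role: 'user', content: input }];
}

const messages = buildFewShotMessages(EXAMPLES, 'Delivery took three weeks.');
console.log(messages.length); // 7
```

Because the assistant turns contain only the bare label, the model tends to answer the final turn in the same one-word format.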
XML structuring
Wrap complex input in <document>, <context>, <task> tags. Models parse tagged data more accurately.
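Tagged prompts are easy to assemble with a small helper. This sketch uses the three tag names mentioned above; the `tag` and `buildPrompt` functions and the sample text are hypothetical.

```javascript
// Wrap one piece of content in a named tag.
function tag(name, content) {
  return `<${name}>\n${content.trim()}\n</${name}>`;
}

// Assemble the full prompt from its three parts, separated by blank lines.
function buildPrompt({ document, context, task }) {
  return [
    tag('document', document),
    tag('context', context),
    tag('task', task),
  ].join('\n\n');
}

console.log(buildPrompt({
  document: 'Quarterly revenue rose while support tickets fell.',
  context: 'The reader is a non-technical executive.',
  task: 'Summarize the document in two sentences.',
}));
```

Putting the task last keeps the instruction closest to where the model starts generating, and the tags let you refer to parts by name ("use only facts from the document").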
Production
Retry with exponential backoff
Handle 429 (rate limit) and 529 (overloaded) errors gracefully. Start with a 1 s delay, double it on each attempt, and cap it at 60 s. Log every retry.
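The schedule described above (1 s, doubling, 60 s cap) can be sketched as a generic wrapper. `withRetry` and `backoffDelayMs` are hypothetical helpers, and the code assumes the thrown error exposes the HTTP status as `err.status`, as the Anthropic and OpenAI SDK errors do. The official SDKs also retry some errors on their own, so check their settings before layering this on top.

```javascript
const RETRYABLE = new Set([429, 529]);

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at 60s.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `fn`, retrying when it throws an error whose `status` is retryable.
// `wait` is injectable so tests can skip the real delays.
async function withRetry(fn, { maxRetries = 5, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!RETRYABLE.has(err?.status) || attempt >= maxRetries) throw err;
      const delay = backoffDelayMs(attempt);
      console.warn(`Retry ${attempt + 1} after ${delay} ms (status ${err.status})`);
      await wait(delay);
    }
  }
}

// Usage, assuming `client` is an SDK client created elsewhere:
//   const message = await withRetry(() => client.messages.create({ ... }));
```

In production it is common to add random jitter to each delay so that many clients do not retry in lockstep.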
Cache deterministic calls
Use prompt caching when many requests share the same long prefix, such as a system prompt. Cached input tokens are billed at up to 90% less than uncached ones.
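With the Anthropic API, caching is requested by marking a content block with `cache_control`. The sketch below builds request parameters with the system prompt marked as cacheable; `buildCachedRequest` is a hypothetical helper, and the short prompt text stands in for a much longer one, since very short prompts fall below the minimum cacheable length.

```javascript
// Build Messages API parameters with the system prompt marked for caching.
function buildCachedRequest(systemText, userText) {
  return {
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    // `system` as an array of blocks lets one block carry cache_control.
    system: [
      { type: 'text', text: systemText, cache_control: { type: 'ephemeral' } },
    ],
    messages: [{ role: 'user', content: userText }],
  };
}

// Placeholder for a long, stable prompt such as a policy manual.
const SYSTEM_PROMPT = 'You are a support assistant. Follow the policy manual.';

const first = buildCachedRequest(SYSTEM_PROMPT, 'How do refunds work?');
const second = buildCachedRequest(SYSTEM_PROMPT, 'How do exchanges work?');
console.log(first.system[0].text === second.system[0].text); // true
```

Only the user turn changes between the two requests, so the shared prefix is eligible for a cache hit on the second call. The response's `usage` object reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which is how you confirm the cache is being used.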
Stream for UX
Stream any end-user response longer than about 200 tokens. The first tokens appear almost immediately instead of after the full generation, so perceived latency drops sharply.

