Embedding Costs: text-embedding-3-small vs Cohere vs Voyage
Embedding pricing differs from chat pricing. Why text-embedding-3-small at $0.02/1M dominates for most RAG pipelines unless you need multilingual nuance.
Detailed Explanation
Embeddings Are a Different Pricing World
Chat models charge per input + output token. Embedding models only charge for input — there is no completion. The rates are an order of magnitude lower:
| Model | $/1M tokens | Dimensions |
|---|---|---|
| OpenAI text-embedding-3-small | $0.020 | 1536 |
| OpenAI text-embedding-3-large | $0.130 | 3072 |
| Cohere embed-v3 multilingual | $0.100 | 1024 |
| Voyage voyage-3 | $0.060 | 1024 |
| Cohere embed-v3 light | $0.020 | 384 |
| Gemini text-embedding-004 | Free (limited) | 768 |
For a knowledge base of 10,000,000 tokens (~7.5M words, the size of a medium engineering wiki), the one-time embedding cost on text-embedding-3-small is $0.20. Re-embedding once per quarter is essentially free.
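The one-time cost above can be sketched as a small calculator. This is a back-of-envelope helper, not any provider's SDK; the prices are the $/1M-token rates from the table above, and the function names are illustrative.

```python
# Prices in $/1M tokens, taken from the comparison table above.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.020,
    "text-embedding-3-large": 0.130,
    "cohere-embed-v3-multilingual": 0.100,
    "voyage-3": 0.060,
}

def embedding_cost(corpus_tokens: int, model: str) -> float:
    """One-time cost in dollars to embed a corpus (input-only billing)."""
    return corpus_tokens / 1_000_000 * PRICE_PER_1M[model]

# The 10M-token engineering wiki from the example above:
cost = embedding_cost(10_000_000, "text-embedding-3-small")
print(f"${cost:.2f}")  # → $0.20
```

Swapping in 3-large raises the same job to $1.30, still trivial next to the generation side of a RAG pipeline.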
When to spend more
- Multilingual retrieval — Cohere multilingual and Voyage outperform OpenAI when the corpus or query mixes languages.
- Code search — Voyage's code variant ranks better than text-embedding-3-small for semantic code search.
- Latency-bound retrieval — text-embedding-3-small at 1536 dims is roughly twice as fast at vector-search time as 3-large at 3072 dims, since exact distance computation scales with dimension count, and the relevance gap on most benchmarks is small.
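The latency point is simple arithmetic: an exact (brute-force) dot-product scan does one multiply-add per dimension per stored vector, so halving dimensions halves per-query work. A back-of-envelope sketch (ANN indexes change the constants, but the dimension scaling holds roughly):

```python
def scan_multiply_adds(n_vectors: int, dims: int) -> int:
    """Multiply-adds for one exact dot-product scan over the index."""
    return n_vectors * dims

# Same 1M-vector index, 3-small vs 3-large dimensions:
per_query_small = scan_multiply_adds(1_000_000, 1536)
per_query_large = scan_multiply_adds(1_000_000, 3072)
print(per_query_large / per_query_small)  # → 2.0
```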
Storage considerations
Embedding cost is not just the API call — every dimension stored at fp32 is 4 bytes, so a single 1536-dim vector is ~6 KB. The 10M-token corpus above, chunked at ~500 tokens, yields ~20,000 vectors, or roughly **120 MB** in fp32; a 1B-token corpus at the same chunk size is ~12 GB. Use fp16 or int8 quantization in your vector DB to cut that by 2-4x. Pinecone, Qdrant, and Chroma all support quantization natively.
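A minimal sizing helper makes the fp32/fp16/int8 tradeoff concrete. This counts only the raw vector payload; real indexes add overhead for graph links and metadata, and the function name is illustrative.

```python
def index_size_gb(n_vectors: int, dims: int, bytes_per_dim: int) -> float:
    """Raw vector storage in GB; excludes index structures and metadata."""
    return n_vectors * dims * bytes_per_dim / 1e9

# Per million vectors at 1536 dims:
fp32 = index_size_gb(1_000_000, 1536, 4)  # ~6.1 GB
fp16 = index_size_gb(1_000_000, 1536, 2)  # half of fp32
int8 = index_size_gb(1_000_000, 1536, 1)  # a quarter of fp32
print(f"{fp32:.1f} / {fp16:.1f} / {int8:.1f} GB")
```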
Recompute frequency
Re-embed the entire corpus only when the embedding model itself changes (rare) or when you migrate vector DBs. Incremental updates for new documents are billed at the same per-token rate.
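A common way to keep incremental updates cheap is to embed only documents whose content actually changed. A minimal sketch using content hashes — the document IDs, helper names, and `stored_hashes` store are hypothetical, assuming you persist a hash per document at embed time:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_needing_embedding(docs: dict, stored_hashes: dict) -> list:
    """IDs of docs that are new or whose text changed since the last run."""
    return [doc_id for doc_id, text in docs.items()
            if stored_hashes.get(doc_id) != content_hash(text)]

docs = {"a.md": "hello", "b.md": "world", "c.md": "new page"}
stored = {"a.md": content_hash("hello"), "b.md": content_hash("old world")}
print(docs_needing_embedding(docs, stored))  # b.md changed, c.md is new
```

Unchanged documents cost nothing; only the delta is billed at the per-token rate.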
Use Case
Use when designing the embedding step of a RAG pipeline, when comparing vector-database providers, or when defending the choice of OpenAI over a more expensive multilingual provider.
Try It — Prompt Token Cost Calculator
Related Topics
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Long-Context Costs: What 128K Tokens Actually Cost Per Call (Caching & long context)
- Fine-Tuning Cost: Training, Hosting, and Per-Token Inference (Operational)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)