Embedding Costs: text-embedding-3-small vs Cohere vs Voyage

Embedding pricing differs from chat pricing. This page explains why text-embedding-3-small, at $0.02 per 1M tokens, is the default choice for most RAG pipelines unless you need multilingual nuance.

Detailed Explanation

Embeddings Are a Different Pricing World

Chat models charge for both input and output tokens. Embedding models charge only for input, because there is no completion. Per-token rates are also roughly an order of magnitude lower than typical chat-model rates:

| Model | Price (USD per 1M tokens) | Dimensions |
|---|---|---|
| OpenAI text-embedding-3-small | $0.020 | 1536 |
| OpenAI text-embedding-3-large | $0.130 | 3072 |
| Cohere embed-v3 multilingual | $0.100 | 1024 |
| Voyage voyage-3 | $0.060 | 1024 |
| Cohere embed-v3 light | $0.020 | 384 |
| Gemini text-embedding-004 | Free (limited) | 768 |

For a knowledge base of 10,000,000 tokens (~7.5M words, the size of a medium engineering wiki), the one-time embedding cost on text-embedding-3-small is $0.20. Re-embedding once per quarter is essentially free.
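The arithmetic above can be sketched in a few lines. The price table copies the rates listed in this section and may be out of date; the function name `embedding_cost_usd` and the dictionary keys are illustrative labels, not part of any provider's SDK.

```python
# Per-million-token rates as listed in the table above (USD).
# Keys are illustrative labels, not official API model identifiers.
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.020,
    "text-embedding-3-large": 0.130,
    "cohere-embed-v3-multilingual": 0.100,
    "voyage-3": 0.060,
    "cohere-embed-v3-light": 0.020,
}


def embedding_cost_usd(total_tokens: int, model: str) -> float:
    """Return the cost of embedding `total_tokens` input tokens once."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


if __name__ == "__main__":
    corpus_tokens = 10_000_000  # the 10M-token knowledge base from the text
    for model in PRICE_PER_MILLION_USD:
        print(f"{model:32s} ${embedding_cost_usd(corpus_tokens, model):.2f}")
```

Because only input tokens are billed, the cost is a single multiplication; a quarterly re-embed is the same figure again.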

When to spend more

  • Multilingual retrieval: Cohere multilingual and Voyage tend to outperform OpenAI when the corpus or the queries mix languages.
  • Code search: Voyage's code variant ranks better than text-embedding-3-small for semantic code search.
  • Latency-bound retrieval: text-embedding-3-small at 1536 dims needs half the arithmetic per distance computation of 3-large at 3072 dims, so brute-force vector search is roughly twice as fast, and the relevance gap on most benchmarks is small.

Storage considerations

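The API call is not the only cost: every dimension stored at fp32 takes 4 bytes, so a 1536-dim vector is about 6 KB. Storage scales with the number of chunks, not the number of tokens. 10M tokens at 1,500 tokens/chunk is about 6,700 chunks, which at 1536 dims comes to roughly **41 MB** in fp32; an index of 6.7M chunks would need about **41 GB**. fp16 or int8 quantization cuts that by 2x or 4x respectively. Quantization support varies between vector DBs such as Pinecone, Qdrant, and Chroma (Qdrant, for example, documents built-in scalar quantization), so check what yours offers.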
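Raw vector storage is just chunks × dimensions × bytes per value, which the sketch below computes. The function name `vector_storage_bytes` and the one-million-chunk example are assumptions for illustration; real indexes add metadata and index-structure overhead on top of this figure.

```python
# Bytes used to store one vector component at each precision.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}


def vector_storage_bytes(num_chunks: int, dims: int, dtype: str = "fp32") -> int:
    """Raw vector payload size: one value per dimension per chunk.

    Ignores index structures and metadata, which add overhead on top.
    """
    return num_chunks * dims * BYTES_PER_VALUE[dtype]


if __name__ == "__main__":
    # Hypothetical index of one million chunks at 1536 dimensions.
    for dtype in BYTES_PER_VALUE:
        gb = vector_storage_bytes(1_000_000, 1536, dtype) / 1e9
        print(f"{dtype}: {gb:.2f} GB")
```

Plugging in your own chunk count shows quickly whether quantization is worth the relevance trade-off for your index size.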

Recompute frequency

Re-embed the entire corpus only when the embedding model itself changes (rare) or when you migrate vector DBs and cannot carry the stored vectors over. Incremental updates for new documents are billed at the same per-token rate.
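One common way to keep updates incremental is to store a content hash next to each document and embed only what is new or changed. The sketch below assumes that approach; `content_hash`, `docs_to_embed`, and the in-memory dictionaries are hypothetical stand-ins for whatever bookkeeping your pipeline already has.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_embed(docs: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose text has changed.

    `docs` maps document ID -> current text; `stored_hashes` maps
    document ID -> hash recorded when the document was last embedded.
    """
    return [
        doc_id
        for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


if __name__ == "__main__":
    stored = {"a": content_hash("unchanged page")}
    current = {"a": "unchanged page", "b": "brand new page"}
    print(docs_to_embed(current, stored))
```

Only the documents returned here need to be sent to the embedding API, so the per-update bill tracks the volume of changed text rather than the size of the corpus.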

Use Case

Use this comparison when designing the embedding step of a RAG pipeline, when comparing vector-database providers, or when defending the choice of OpenAI over a more expensive multilingual provider.
