Embedding Costs: text-embedding-3-small vs Cohere vs Voyage
Embedding pricing differs from chat pricing. Why text-embedding-3-small at $0.02/1M dominates for most RAG pipelines unless you need multilingual nuance.
Detailed Explanation
Embeddings Are a Different Pricing World
Chat models charge per input + output token. Embedding models only charge for input — there is no completion. The rates are an order of magnitude lower:
| Model | $/1M tokens | Dimensions |
|---|---|---|
| OpenAI text-embedding-3-small | $0.020 | 1536 |
| OpenAI text-embedding-3-large | $0.130 | 3072 |
| Cohere embed-v3 multilingual | $0.100 | 1024 |
| Voyage voyage-3 | $0.060 | 1024 |
| Cohere embed-v3 light | $0.020 | 384 |
| Gemini text-embedding-004 | Free (limited) | 768 |
For a knowledge base of 10,000,000 tokens (~7.5M words, the size of a medium engineering wiki), the one-time embedding cost on text-embedding-3-small is $0.20. Re-embedding once per quarter is essentially free.
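The one-time cost above can be sketched as a small calculator. This is a back-of-envelope helper, not any provider's SDK; the prices are the $/1M-token rates from the table above, and the function names are illustrative.

```python
# Prices in $/1M tokens, taken from the comparison table above.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.020,
    "text-embedding-3-large": 0.130,
    "cohere-embed-v3-multilingual": 0.100,
    "voyage-3": 0.060,
}

def embedding_cost(corpus_tokens: int, model: str) -> float:
    """One-time cost in dollars to embed a corpus (input-only billing)."""
    return corpus_tokens / 1_000_000 * PRICE_PER_1M[model]

# The 10M-token engineering wiki from the example above:
cost = embedding_cost(10_000_000, "text-embedding-3-small")
print(f"${cost:.2f}")  # → $0.20
```

Swapping in 3-large raises the same job to $1.30, still trivial next to the generation side of a RAG pipeline.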
When to spend more
- Multilingual retrieval — Cohere multilingual and Voyage outperform OpenAI when the corpus or query mixes languages.
- Code search — Voyage's code variant ranks better than text-embedding-3-small for semantic code search.
- Latency-bound retrieval — text-embedding-3-small at 1536 dims is roughly twice as fast at vector-search time as 3-large at 3072 dims, since exact distance computation scales with dimension count, and the relevance gap on most benchmarks is small.
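The latency point is simple arithmetic: an exact (brute-force) dot-product scan does one multiply-add per dimension per stored vector, so halving dimensions halves per-query work. A back-of-envelope sketch (ANN indexes change the constants, but the dimension scaling holds roughly):

```python
def scan_multiply_adds(n_vectors: int, dims: int) -> int:
    """Multiply-adds for one exact dot-product scan over the index."""
    return n_vectors * dims

# Same 1M-vector index, 3-small vs 3-large dimensions:
per_query_small = scan_multiply_adds(1_000_000, 1536)
per_query_large = scan_multiply_adds(1_000_000, 3072)
print(per_query_large / per_query_small)  # → 2.0
```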
Storage considerations
Embedding cost is not just the API call — every dimension stored at fp32 is 4 bytes, so a single 1536-dim vector is ~6 KB. The 10M-token corpus above, chunked at ~500 tokens, yields ~20,000 vectors, or roughly **120 MB** in fp32; a 1B-token corpus at the same chunk size is ~12 GB. Use fp16 or int8 quantization in your vector DB to cut that by 2-4x. Pinecone, Qdrant, and Chroma all support quantization natively.
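A minimal sizing helper makes the fp32/fp16/int8 tradeoff concrete. This counts only the raw vector payload; real indexes add overhead for graph links and metadata, and the function name is illustrative.

```python
def index_size_gb(n_vectors: int, dims: int, bytes_per_dim: int) -> float:
    """Raw vector storage in GB; excludes index structures and metadata."""
    return n_vectors * dims * bytes_per_dim / 1e9

# Per million vectors at 1536 dims:
fp32 = index_size_gb(1_000_000, 1536, 4)  # ~6.1 GB
fp16 = index_size_gb(1_000_000, 1536, 2)  # half of fp32
int8 = index_size_gb(1_000_000, 1536, 1)  # a quarter of fp32
print(f"{fp32:.1f} / {fp16:.1f} / {int8:.1f} GB")
```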
Recompute frequency
Re-embed the entire corpus only when the embedding model itself changes (rare) or when you migrate vector DBs. Incremental updates for new documents are billed at the same per-token rate.
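A common way to keep incremental updates cheap is to embed only documents whose content actually changed. A minimal sketch using content hashes — the document IDs, helper names, and `stored_hashes` store are hypothetical, assuming you persist a hash per document at embed time:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_needing_embedding(docs: dict, stored_hashes: dict) -> list:
    """IDs of docs that are new or whose text changed since the last run."""
    return [doc_id for doc_id, text in docs.items()
            if stored_hashes.get(doc_id) != content_hash(text)]

docs = {"a.md": "hello", "b.md": "world", "c.md": "new page"}
stored = {"a.md": content_hash("hello"), "b.md": content_hash("old world")}
print(docs_needing_embedding(docs, stored))  # b.md changed, c.md is new
```

Unchanged documents cost nothing; only the delta is billed at the per-token rate.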
Use Case
Use when designing the embedding step of a RAG pipeline, when comparing vector-database providers, or when defending the choice of OpenAI over a more expensive multilingual provider.
Try It — Prompt Token Cost Calculator
Related Topics
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Long-Context Costs: What 128K Tokens Actually Cost Per Call (Caching & long context)
- Fine-Tuning Cost: Training, Hosting, and Per-Token Inference (Operational)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)