LLM Token Pricing, LLM Tokenomics

20/04/2024 09:08:16

LLM Hostings

In this post, we summarize token pricing for proprietary and open-source models that are offered as a service. The list is not exhaustive: our directory covers many more GPU and serverless LLM hosting providers. For details, see the "LLM Hostings" section.

Proprietary model (LLM) pricing

LLM token pricing for proprietary models as of April 20, 2024. The Avg column is the unweighted mean of the input and output prices:

Platform	Model	Input: $/1M Tokens	Output: $/1M Tokens	Avg: $/1M Tokens
OpenAI	GPT 4 Turbo	10.00	30.00	20.00
OpenAI	GPT 3.5 Turbo	0.50	1.50	1.00
Cohere	Command-R	0.50	1.50	1.00
Cohere	Command-Light	0.30	0.60	0.45
Anthropic	Claude 3 Opus	15.00	75.00	45.00
Anthropic	Claude 3 Sonnet	3.00	15.00	9.00
Anthropic	Claude 3 Haiku	0.25	1.25	0.75
Anthropic	Claude Instant	0.80	2.40	1.60

It is hard to name a single best model on price alone, because each has advantages in different use cases, so the choice has to be made in the context of the specific business case. That said, the Claude 3 family is worth evaluating: Haiku has the lowest input price in the table ($0.25 per 1M tokens), and Sonnet's input and output prices are at most half of GPT 4 Turbo's.
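Because the Avg column weights input and output tokens equally, it can be misleading for workloads that are mostly prompt or mostly completion. As a minimal sketch, the snippet below recomputes the cost for a given token mix using prices copied from the table above. The function names and the 9M/1M example workload are hypothetical, chosen only for illustration.

```python
# Prices in $ per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "GPT 4 Turbo": (10.00, 30.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "Command-R": (0.50, 1.50),
}


def simple_average(model):
    """Unweighted mean of input and output price (the table's Avg column)."""
    input_price, output_price = PRICES[model]
    return (input_price + output_price) / 2


def workload_cost(model, input_tokens, output_tokens):
    """Dollar cost of a workload with the given input and output token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prompt-heavy workload: 9M input tokens, 1M output tokens.
for model in PRICES:
    cost = workload_cost(model, 9_000_000, 1_000_000)
    print(f"{model}: avg ${simple_average(model):.2f}/1M, workload ${cost:.2f}")
```

For a prompt-heavy mix like this one, the cost is dominated by the input price, so models with a large gap between input and output prices look cheaper than their Avg value suggests.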

Open-source serverless model (LLM) pricing

LLM token pricing for open-source models hosted as serverless endpoints, as of April 20, 2024:

Model	Platform	Input: $/1M Tokens	Output: $/1M Tokens	Avg: $/1M Tokens
Mistral 7B	DeepInfra	0.10	0.10	0.10
Mistral 7B	Anyscale	0.15	0.15	0.15
Mistral 7B	OctoAI	0.10	0.25	0.18
Mistral 7B	Fireworks.ai	0.20	0.20	0.20
Mistral 7B	Together.ai	0.20	0.20	0.20
Mistral 7B	Mistral	0.25	0.25	0.25
Mixtral 8x7B	DeepInfra	0.27	0.27	0.27
Mixtral 8x7B	OctoAI	0.30	0.50	0.40
Mixtral 8x7B	Anyscale	0.50	0.50	0.50
Mixtral 8x7B	Fireworks.ai	0.50	0.50	0.50
Mixtral 8x7B	Together.ai	0.60	0.60	0.60
Mixtral 8x7B	Mistral	0.70	0.70	0.70
Mixtral 8x22B	DeepInfra	0.65	0.65	0.65
Mixtral 8x22B	Together.ai	1.20	1.20	1.20
Mixtral 8x22B	Mistral	2.00	6.00	4.00
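Since the same open-source model is priced differently on each platform, a small script can pick the cheapest host per model. The sketch below uses the rows from the table above. The `input_share` parameter is an assumption introduced here to blend input and output prices, and the default of 0.5 reproduces the table's Avg column.

```python
# Rows copied from the table above: (model, platform, input $/1M, output $/1M).
ROWS = [
    ("Mistral 7B", "DeepInfra", 0.10, 0.10),
    ("Mistral 7B", "Anyscale", 0.15, 0.15),
    ("Mistral 7B", "OctoAI", 0.10, 0.25),
    ("Mistral 7B", "Fireworks.ai", 0.20, 0.20),
    ("Mistral 7B", "Together.ai", 0.20, 0.20),
    ("Mistral 7B", "Mistral", 0.25, 0.25),
    ("Mixtral 8x7B", "DeepInfra", 0.27, 0.27),
    ("Mixtral 8x7B", "OctoAI", 0.30, 0.50),
    ("Mixtral 8x7B", "Anyscale", 0.50, 0.50),
    ("Mixtral 8x7B", "Fireworks.ai", 0.50, 0.50),
    ("Mixtral 8x7B", "Together.ai", 0.60, 0.60),
    ("Mixtral 8x7B", "Mistral", 0.70, 0.70),
    ("Mixtral 8x22B", "DeepInfra", 0.65, 0.65),
    ("Mixtral 8x22B", "Together.ai", 1.20, 1.20),
    ("Mixtral 8x22B", "Mistral", 2.00, 6.00),
]


def cheapest_by_model(rows, input_share=0.5):
    """Return {model: (platform, blended $/1M)} for the cheapest platform.

    input_share is the fraction of tokens that are input tokens
    (hypothetical knob; 0.5 matches the table's Avg column).
    """
    best = {}
    for model, platform, input_price, output_price in rows:
        blended = input_share * input_price + (1 - input_share) * output_price
        # Keep the first platform seen on ties.
        if model not in best or blended < best[model][1]:
            best[model] = (platform, blended)
    return best


for model, (platform, price) in cheapest_by_model(ROWS).items():
    print(f"{model}: {platform} at ${price:.2f}/1M tokens")
```

With the April 20, 2024 prices above, DeepInfra comes out cheapest for all three models.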

Long-context window embeddings

Embedding pricing for long-context window models as of April 20, 2024:

Platform	Model	Context Window	$/1M Tokens
Together.ai	m2-bert-80M-32k-retrieval	32768	0.01
Together.ai	m2-bert-80M-8k-retrieval	8192	0.01
Fireworks.ai	nomic-embed-text-v1.5	8192	0.01
OpenAI	text-embedding-3-small	8192	0.02
Jina AI	jina-embeddings-v2-base-en	8192	0.02
Mistral	mistral-embed	8192	0.10
Nomic	nomic-embed-text-v1.5	8192	0.10
Voyage AI	voyage-2	4000	0.10
Voyage AI	voyage-large-2	16000	0.12
OpenAI	text-embedding-3-large	8192	0.13

Short-context window embeddings

Platform	Model	Context Window	$/1M Tokens
Together.ai	google/bert-base-uncased	512	0.01
DeepInfra	baai/bge-large-en-v1.5	512	0.01
DeepInfra	thenlper/gte-large	512	0.01
Together.ai	baai/bge-large-en-v1.5	512	0.02
Fireworks.ai	WhereIsAI/uae-large-v1	512	0.02
Fireworks.ai	thenlper/gte-large	512	0.02
Anyscale	baai/bge-large-en-v1.5	512	0.05
Anyscale	thenlper/gte-large	512	0.05
OctoAI	thenlper-gte-large	512	0.05
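To put the embedding prices in perspective, the sketch below estimates the cost of embedding a corpus and the minimum number of chunks it must be split into for a given context window. It assumes billing is strictly per token processed and ignores chunk overlap, which would add tokens. The 50M-token corpus and the function name are hypothetical.

```python
import math

# (context window in tokens, $ per 1M tokens), copied from the tables above.
EMBEDDERS = {
    "OpenAI text-embedding-3-small": (8192, 0.02),
    "Voyage AI voyage-large-2": (16000, 0.12),
    "DeepInfra baai/bge-large-en-v1.5": (512, 0.01),
}


def embedding_estimate(model, corpus_tokens):
    """Return (minimum chunk count, dollar cost) for embedding a corpus.

    Assumes non-overlapping chunks that each fill the context window
    and a flat per-token price.
    """
    window, price_per_million = EMBEDDERS[model]
    min_chunks = math.ceil(corpus_tokens / window)
    cost = corpus_tokens / 1_000_000 * price_per_million
    return min_chunks, cost


# Hypothetical corpus of 50M tokens.
for model in EMBEDDERS:
    chunks, cost = embedding_estimate(model, 50_000_000)
    print(f"{model}: at least {chunks} chunks, about ${cost:.2f}")
```

Under these assumptions the dollar cost depends only on the total token count, while the context window determines how many vectors end up being stored and searched.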