This is a single page collecting the major LLM prices, with citations. All prices are in dollars per million tokens.
Pull requests are welcome.
## OpenAI

Model | Input | Output |
---|---|---|
gpt-4o | 2.50 | 10.00 |
gpt-4o-mini | 0.15 | 0.60 |
o3-mini | 1.10 | 4.40 |
For the full list of models, see here.
Sample code for an LLM call is here. You can count tokens here.
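For reference, a minimal sketch of a call plus local token counting, assuming the `openai` and `tiktoken` packages and an `OPENAI_API_KEY` in the environment; the prompt text and model choice are just examples.

```python
# Minimal OpenAI chat call (openai SDK v1+); reads OPENAI_API_KEY from the env.
from openai import OpenAI

import tiktoken

client = OpenAI()
prompt = "Say hello in one word."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

# Count tokens locally; gpt-4o-family models use the o200k_base encoding.
enc = tiktoken.get_encoding("o200k_base")
print(len(enc.encode(prompt)), "prompt tokens")
```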
Batch mode provides a 50% discount with a 24-hour turnaround time. Prompt caching provides a 50% discount on cached input tokens. Caching kicks in for prompts of at least 1,024 tokens, in increments of 128 tokens. It is automatic, with no opt-in and no extra cost. The cache has a 5-10 minute lifetime and is refreshed on use.
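To make the discount arithmetic concrete, here is a back-of-the-envelope sketch using the gpt-4o-mini row above; the helper function and token counts are hypothetical.

```python
# Illustrative per-request cost for gpt-4o-mini; prices are $ per 1M tokens.
INPUT, OUTPUT = 0.15, 0.60

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    # Cached input tokens are billed at half the input rate (50% discount).
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT + cached_tokens * INPUT * 0.5 + output_tokens * OUTPUT) / 1e6

# 10k-token prompt with an 8k-token cached prefix and 500 output tokens:
print(f"${request_cost(10_000, 500, cached_tokens=8_000):.4f}")  # $0.0012
```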
## Anthropic

Model | Input | Output |
---|---|---|
claude-3.5-sonnet | 3.00 | 15.00 |
claude-3.5-haiku | 0.80 | 4.00 |
claude-3-haiku | 0.25 | 1.25 |
For the full list of models, see here.
Sample code for an LLM call is here. Instructions to count tokens are here.
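As a rough sketch with the `anthropic` Python SDK, assuming `ANTHROPIC_API_KEY` is set; the model alias and prompt are examples.

```python
# Minimal Anthropic message call; max_tokens is required by the API.
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=100,
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(message.content[0].text)

# Token counting is an API call rather than a local tokenizer.
count = client.messages.count_tokens(
    model="claude-3-5-haiku-latest",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(count.input_tokens, "input tokens")
```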
Batch mode provides a 50% discount with a 24-hour turnaround time. Prompt caching provides a 90% discount on cache reads. Caching is only available for prompts of at least 1,024 tokens (2,048 for Haiku models). Caching requires opt-in, and cache writes cost 25% more than regular input tokens. The cache has a 5-minute lifetime and is refreshed on use.
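A sketch of the opt-in, marking a long shared prefix with `cache_control`; the document contents are elided, and the prefix is assumed to clear the minimum cacheable length.

```python
# Mark a long system prompt as cacheable: the first request writes the cache
# (25% surcharge on those tokens); later requests read it at a 90% discount.
import anthropic

client = anthropic.Anthropic()
long_document = "..."  # must exceed the minimum cacheable length

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=200,
    system=[
        {
            "type": "text",
            "text": long_document,
            "cache_control": {"type": "ephemeral"},  # opt this prefix into caching
        }
    ],
    messages=[{"role": "user", "content": "Summarize the document."}],
)
print(message.content[0].text)
```

At claude-3.5-sonnet's $3.00/M input rate, a cache write costs $3.75/M and a read $0.30/M, so caching pays for itself as soon as the prefix is reused once within the lifetime window.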
## Google Gemini

Model | Input | Output |
---|---|---|
gemini-1.5-flash | 0.075 | 0.30 |
gemini-1.5-flash-8b | 0.0375 | 0.15 |
gemini-1.5-pro | 1.25 | 5.00 |

Prices shown are for prompts up to 128k tokens; longer prompts are billed at double these rates.
For the full list of models, see here or here.
Sample code for an LLM call is here or here (not recommended). Instructions to count tokens are here.
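A minimal sketch with the `google-generativeai` Python SDK (the AI Studio path); the API key is a placeholder.

```python
# Minimal Gemini call via the google-generativeai SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use a real key
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Say hello in one word.")
print(response.text)

# Token counting is a method on the model object.
print(model.count_tokens("Say hello in one word.").total_tokens, "tokens")
```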
Batch mode provides a 50% discount and requires working in the Google Cloud console. Context caching provides a discount of more than 90% on cached tokens. Caching is only available for prefixes longer than 32k tokens. Caching requires opt-in, and cache storage is billed for the lifetime of the cache (per token-hour).
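A sketch of the caching opt-in via the SDK's `caching` module; the model version, TTL, and contents are illustrative, and caching is assumed to require a pinned model version such as `-001`.

```python
# Create a cached context and query against it; storage is billed for the TTL.
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use a real key

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching needs a pinned model version
    contents=["<a prefix longer than the 32k-token minimum goes here>"],
    ttl=datetime.timedelta(minutes=10),  # storage is charged over this lifetime
)
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Answer using the cached context.")
print(response.text)
```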