Live per-token prices for every hosted AI model on OpenRouter, ppq.ai, and kie.ai, side by side. Compare input, output, and blended cost across OpenAI, Anthropic, Google, Mistral, and 50+ other providers, and see which marketplace is cheaper for the same model. Refreshed hourly.
Sortable, searchable price table for every hosted text model on OpenRouter, ppq.ai, and kie.ai. Click a column header to sort.
Model (name and ID) | Marketplace | Provider | Context | Input ($/1M tokens) | Output ($/1M tokens) | Blended ($/1M tokens) |
inclusionAI: Ling-2.6-flash inclusionai/ling-2.6-flash | OpenRouter | inclusionai | 262K | $0.010 | $0.030 | $0.016 |
Ling-2.6-flash inclusionai/ling-2.6-flash | ppq.ai | inclusionai | 262K | $0.011 | $0.032 | $0.017 |
Meta: Llama 3.1 8B Instruct meta-llama/llama-3.1-8b-instruct | OpenRouter | meta-llama | 131K | $0.020 | $0.030 | $0.023 |
Mistral: Mistral Nemo mistralai/mistral-nemo | OpenRouter | mistralai | 131K | $0.020 | $0.030 | $0.023 |
Llama 3.1 8B Instruct meta-llama/llama-3.1-8b-instruct | ppq.ai | meta | 131K | $0.021 | $0.032 | $0.024 |
Mistral Nemo mistralai/mistral-nemo | ppq.ai | mistral | 131K | $0.021 | $0.032 | $0.024 |
Sao10K: Llama 3 8B Lunaris sao10k/l3-lunaris-8b | OpenRouter | sao10k | 8K | $0.040 | $0.050 | $0.043 |
Llama 3 8B Lunaris sao10k/l3-lunaris-8b | ppq.ai | sao10k | 8K | $0.042 | $0.052 | $0.045 |
IBM: Granite 4.0 Micro ibm-granite/granite-4.0-h-micro | OpenRouter | ibm-granite | 131K | $0.017 | $0.112 | $0.046 |
Granite 4.0 Micro ibm-granite/granite-4.0-h-micro | ppq.ai | ibm | 131K | $0.018 | $0.118 | $0.048 |
Google: Gemma 3 4B google/gemma-3-4b-it | OpenRouter | google | 131K | $0.040 | $0.080 | $0.052 |
Gemma 3 4B google/gemma-3-4b-it | ppq.ai | google | 131K | $0.042 | $0.084 | $0.055 |
LiquidAI: LFM2-24B-A2B liquid/lfm-2-24b-a2b | OpenRouter | liquid | 128K | $0.030 | $0.120 | $0.057 |
Qwen: Qwen2.5 7B Instruct qwen/qwen-2.5-7b-instruct | OpenRouter | qwen | 131K | $0.040 | $0.100 | $0.058 |
Mistral: Mistral Small 3 mistralai/mistral-small-24b-instruct-2501 | OpenRouter | mistralai | 33K | $0.050 | $0.080 | $0.059 |
LFM2-24B-A2B liquid/lfm-2-24b-a2b | ppq.ai | liquidai | 128K | $0.032 | $0.126 | $0.060 |
MythoMax 13B gryphe/mythomax-l2-13b | OpenRouter | gryphe | 4K | $0.060 | $0.060 | $0.060 |
Qwen2.5 7B Instruct qwen/qwen-2.5-7b-instruct | ppq.ai | qwen | 131K | $0.042 | $0.105 | $0.061 |
Mistral Small 3 mistralai/mistral-small-24b-instruct-2501 | ppq.ai | mistral | 33K | $0.052 | $0.084 | $0.062 |
OpenAI: gpt-oss-20b openai/gpt-oss-20b | OpenRouter | openai | 131K | $0.029 | $0.140 | $0.062 |
MythoMax 13B gryphe/mythomax-l2-13b | ppq.ai | gryphe | 4K | $0.063 | $0.063 | $0.063 |
IBM: Granite 4.1 8B ibm-granite/granite-4.1-8b | OpenRouter | ibm-granite | 131K | $0.050 | $0.100 | $0.065 |
gpt-oss-20b openai/gpt-oss-20b | ppq.ai | openai | 131K | $0.030 | $0.147 | $0.065 |
Amazon: Nova Micro 1.0 amazon/nova-micro-v1 | OpenRouter | amazon | 128K | $0.035 | $0.140 | $0.067 |
Google: Gemma 3 12B google/gemma-3-12b-it | OpenRouter | google | 131K | $0.040 | $0.130 | $0.067 |
Granite 4.1 8B ibm-granite/granite-4.1-8b | ppq.ai | ibm | 131K | $0.052 | $0.105 | $0.068 |
Nova Micro 1.0 amazon/nova-micro-v1 | ppq.ai | amazon | 128K | $0.037 | $0.147 | $0.070 |
Gemma 3 12B google/gemma-3-12b-it | ppq.ai | google | 131K | $0.042 | $0.137 | $0.070 |
Cohere: Command R7B (12-2024) cohere/command-r7b-12-2024 | OpenRouter | cohere | 128K | $0.037 | $0.150 | $0.071 |
Qwen: Qwen3.5-9B qwen/qwen3.5-9b | OpenRouter | qwen | 262K | $0.040 | $0.150 | $0.073 |
Command R7B (12-2024) cohere/command-r7b-12-2024 | ppq.ai | cohere | 128K | $0.039 | $0.158 | $0.075 |
NVIDIA: Nemotron Nano 9B V2 nvidia/nemotron-nano-9b-v2 | OpenRouter | nvidia | 131K | $0.040 | $0.160 | $0.076 |
Arcee AI: Trinity Mini arcee-ai/trinity-mini | OpenRouter | arcee-ai | 131K | $0.045 | $0.150 | $0.077 |
Qwen3.5-9B qwen/qwen3.5-9b | ppq.ai | qwen | 262K | $0.042 | $0.158 | $0.077 |
Google: Gemma 3n 4B google/gemma-3n-e4b-it | OpenRouter | google | 33K | $0.060 | $0.120 | $0.078 |
Meta: Llama 3.2 1B Instruct meta-llama/llama-3.2-1b-instruct | OpenRouter | meta-llama | 131K | $0.027 | $0.201 | $0.079 |
Qwen: Qwen3 235B A22B Instruct 2507 qwen/qwen3-235b-a22b-2507 | OpenRouter | qwen | 262K | $0.071 | $0.100 | $0.080 |
Nemotron Nano 9B V2 nvidia/nemotron-nano-9b-v2 | ppq.ai | nvidia | 131K | $0.042 | $0.168 | $0.080 |
Trinity Mini arcee-ai/trinity-mini | ppq.ai | arcee ai | 131K | $0.047 | $0.158 | $0.080 |
OpenAI: gpt-oss-120b openai/gpt-oss-120b | OpenRouter | openai | 131K | $0.039 | $0.180 | $0.081 |
Gemma 3n 4B google/gemma-3n-e4b-it | ppq.ai | google | 33K | $0.063 | $0.126 | $0.082 |
Llama 3.2 1B Instruct meta-llama/llama-3.2-1b-instruct | ppq.ai | meta | 131K | $0.028 | $0.211 | $0.083 |
Qwen3 235B A22B Instruct 2507 qwen/qwen3-235b-a22b-2507 | ppq.ai | qwen | 262K | $0.075 | $0.105 | $0.084 |
gpt-oss-120b openai/gpt-oss-120b | ppq.ai | openai | 131K | $0.041 | $0.189 | $0.085 |
Microsoft: Phi 4 microsoft/phi-4 | OpenRouter | microsoft | 16K | $0.065 | $0.140 | $0.088 |
Qwen: Qwen3 30B A3B Instruct 2507 qwen/qwen3-30b-a3b-instruct-2507 | OpenRouter | qwen | 131K | $0.048 | $0.193 | $0.092 |
Phi 4 microsoft/phi-4 | ppq.ai | microsoft | 16K | $0.068 | $0.147 | $0.092 |
NVIDIA: Nemotron 3 Nano 30B A3B nvidia/nemotron-3-nano-30b-a3b | OpenRouter | nvidia | 262K | $0.050 | $0.200 | $0.095 |
Qwen3 30B A3B Instruct 2507 qwen/qwen3-30b-a3b-instruct-2507 | ppq.ai | qwen | 131K | $0.051 | $0.203 | $0.096 |
Nemotron 3 Nano 30B A3B nvidia/nemotron-3-nano-30b-a3b | ppq.ai | nvidia | 262K | $0.052 | $0.210 | $0.100 |
Weighted at 70% input, 30% output tokens. Adjust the mix below.
AI API pricing is the per-token cost that hosted model providers charge for sending text to a model and receiving text back. Input tokens are the prompt; output tokens are the completion. Most labs publish two rates per model, quoted in US dollars per one million tokens, and update them several times a year.
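To make the per-million quoting concrete, here is a minimal sketch of how a single request's cost follows from the two rates. The function name and the token counts are illustrative assumptions; the prices are the gpt-oss-20b rates listed for OpenRouter in the table above.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one request in USD, given prices quoted per one million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical request: a 2,000-token prompt and a 500-token completion,
# priced at $0.029 input and $0.140 output per million tokens.
cost = request_cost_usd(2_000, 500, 0.029, 0.140)
print(f"${cost:.6f}")
```

At these rates the example request costs a small fraction of a cent, which is why prices are quoted per million tokens rather than per token.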
This page reads the public OpenRouter, ppq.ai, and kie.ai model feeds live and shows the current input price, output price, and a blended price for every hosted AI text model the three marketplaces expose. The combined feed covers OpenAI (GPT-5, GPT-5 mini, o1, o3), Anthropic (Claude 4.6 Sonnet, Claude 4.6 Haiku, Claude Opus), Google (Gemini 3.1 Pro, Gemini 3.5 Flash), Meta Llama, Mistral, DeepSeek, Qwen, and roughly 50 other providers.
Prices update hourly. Each row is tagged with the marketplace it came from, so the same model appears once per marketplace and you can see which one charges less. The blended column is a weighted average at a 70 percent input, 30 percent output token mix, which approximates typical chat and coding workloads. Use the blend selector to reweight for input-heavy retrieval pipelines or output-heavy generation tasks.
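The blended figure can be reproduced by hand. The sketch below assumes the blend is a plain weighted average of the two listed rates; the function name and the `input_share` parameter are illustrative, not the site's own code. The sample prices are the Llama 3.1 8B Instruct rates listed for OpenRouter.

```python
def blended_price(
    input_price: float, output_price: float, input_share: float = 0.70
) -> float:
    """Weighted average of input and output price, both in USD per million tokens.

    input_share is the fraction of tokens that are input; the rest are output.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# Default 70/30 mix: 0.7 * 0.020 + 0.3 * 0.030
default_mix = blended_price(0.020, 0.030)

# Output-heavy 30/70 mix: 0.3 * 0.020 + 0.7 * 0.030
output_heavy = blended_price(0.020, 0.030, input_share=0.30)

print(round(default_mix, 3), round(output_heavy, 3))
```

Rounded to three decimals, the default mix matches the $0.023 shown in the table for that row, and the output-heavy mix comes out higher because the output rate carries more weight.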

Run the Numbers
Take any model in the table and compare its API cost against buying a GPU or renting one in the cloud. The decision tool plugs the live API price in for you.

Find the break-even point between local hardware and cloud API spend.

Live cloud GPU rental rates across RunPod and the Vast.ai marketplace.

Every open and closed model we track, ranked by benchmark score and hardware fit.

Spec a full local AI rig matched to your budget, with curated parts and pre-built options.
The live table refreshes hourly. We call the public OpenRouter, ppq.ai, and kie.ai APIs directly and cache the result for one hour so a traffic spike never overwhelms any of the upstreams. All three marketplaces aggregate pricing from each provider, so the rates here track the official OpenAI, Anthropic, Google, Mistral, and DeepSeek prices without scraping.
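One way to picture the one-hour cache is a small time-to-live wrapper around whatever function fetches the feeds. This is a sketch under assumptions, not the site's implementation: `HourlyCache`, `fake_fetch`, and the injectable `clock` are hypothetical names, and the fetch is stubbed so the example runs without network access.

```python
import time
from typing import Any, Callable, Optional


class HourlyCache:
    """Serve a cached value and call the upstream again only after the TTL expires."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()  # the only place the upstream is hit
            self._expires_at = now + self._ttl
        return self._value


# Stubbed upstream that records how often it is called (hypothetical data shape).
calls = []

def fake_fetch() -> dict:
    calls.append(1)
    return {"openai/gpt-oss-20b": (0.029, 0.140)}

fake_now = [0.0]  # controllable clock for the demonstration
cache = HourlyCache(fake_fetch, clock=lambda: fake_now[0])

cache.get()
cache.get()            # served from cache, no second upstream call
fake_now[0] = 3600.0   # one hour later
cache.get()            # cache expired, upstream is called again
print(len(calls))
```

However many page views arrive within the hour, the upstream sees one request per TTL window, which is the property the paragraph above describes.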
Each row is tagged with the marketplace it came from. OpenRouter, ppq.ai, and kie.ai all resell the same underlying models, but each one applies its own markup or discount, so the per-token price for GPT-4o, Claude Sonnet, or Gemini can differ between them. kie.ai in particular tends to be the cheapest for the smaller catalog of chat models it carries. Sort the table by Blended price to find the cheaper marketplace for any given model.
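The same comparison can be done programmatically by grouping rows on the model ID and keeping the lowest blended price. The sketch below uses a handful of rows copied from the table; the `Row` type and `cheapest_by_model` function are illustrative names, not part of any marketplace API.

```python
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    model_id: str
    marketplace: str
    input_price: float   # USD per million tokens
    output_price: float  # USD per million tokens
    blended: float       # USD per million tokens at the 70/30 mix


ROWS = [
    Row("inclusionai/ling-2.6-flash", "OpenRouter", 0.010, 0.030, 0.016),
    Row("inclusionai/ling-2.6-flash", "ppq.ai", 0.011, 0.032, 0.017),
    Row("openai/gpt-oss-120b", "OpenRouter", 0.039, 0.180, 0.081),
    Row("openai/gpt-oss-120b", "ppq.ai", 0.041, 0.189, 0.085),
    Row("microsoft/phi-4", "OpenRouter", 0.065, 0.140, 0.088),
    Row("microsoft/phi-4", "ppq.ai", 0.068, 0.147, 0.092),
]


def cheapest_by_model(rows: Iterable[Row]) -> Dict[str, Row]:
    """Return, for each model ID, the row with the lowest blended price."""
    best: Dict[str, Row] = {}
    for row in rows:
        current = best.get(row.model_id)
        if current is None or row.blended < current.blended:
            best[row.model_id] = row
    return best


cheapest = cheapest_by_model(ROWS)
for model_id, row in cheapest.items():
    print(f"{model_id}: {row.marketplace} at ${row.blended:.3f}")
```

For these six sample rows OpenRouter is the cheaper listing in every case; that is a property of this sample only, and other models or a different token mix can rank the marketplaces differently.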
Every major AI lab charges separately for tokens you send in (the prompt) and tokens the model writes out (the completion). Output tokens are usually two to five times more expensive because generation is the compute-heavy step. The blended column shows a weighted average at the input/output mix you pick.
The blended price is a single per-million-token rate that combines input and output cost at a chosen ratio. The default is 70 percent input, 30 percent output, which roughly matches typical chat and coding workloads where prompts are longer than answers. Switch the blend selector to 50/50 or 30/70 to reweight for your traffic.
Free-tier and promotional models on OpenRouter, ppq.ai, or kie.ai return $0 for both input and output. We surface them when the Free filter is on, but the sync job that copies prices into our reference model pages skips them on purpose so a temporary promotion never overwrites a real listed price.
A daily background job pulls the OpenRouter feed and writes the input and output prices onto every reference model in our directory that we have linked to an OpenRouter ID. That means model detail pages and the ROI and decision calculators read the same live numbers without manual edits. The ppq.ai and kie.ai rows in this table are shown for marketplace comparison only and do not write back into the directory.
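The write-back rules described here and in the preceding paragraph (only models linked to an OpenRouter ID are updated, and $0 listings never overwrite a stored price) can be sketched as follows. Everything in this block is an assumption made for illustration: the `ReferenceModel` shape, the `sync_openrouter_prices` name, the feed format, and the $0 Mistral Nemo promotion in the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ReferenceModel:
    name: str
    openrouter_id: Optional[str] = None  # None means not linked
    input_price: Optional[float] = None  # USD per million tokens
    output_price: Optional[float] = None


def sync_openrouter_prices(
    models: List[ReferenceModel], feed: Dict[str, Tuple[float, float]]
) -> int:
    """Copy (input, output) prices from the feed onto linked models.

    Returns the number of models updated.
    """
    updated = 0
    for model in models:
        if model.openrouter_id is None:
            continue  # not linked to an OpenRouter ID
        prices = feed.get(model.openrouter_id)
        if prices is None:
            continue  # ID not present in today's feed
        input_price, output_price = prices
        if input_price == 0 and output_price == 0:
            continue  # free or promotional listing: keep the stored price
        model.input_price = input_price
        model.output_price = output_price
        updated += 1
    return updated


models = [
    ReferenceModel("Phi 4", "microsoft/phi-4"),
    ReferenceModel("Mistral Nemo", "mistralai/mistral-nemo", 0.020, 0.030),
    ReferenceModel("Unlinked model"),
]
# Hypothetical feed snapshot: Mistral Nemo is temporarily listed at $0.
feed = {
    "microsoft/phi-4": (0.065, 0.140),
    "mistralai/mistral-nemo": (0.0, 0.0),
}
updated = sync_openrouter_prices(models, feed)
print(updated, models[1].input_price)
```

In this run only Phi 4 is updated; Mistral Nemo keeps its previously stored price because the feed reports it as free, and the unlinked model is ignored.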
These prices are suitable for internal modelling: the figures are the same per-million-token rates the providers publish. For anything you bill a client on, treat this tracker as a "current as of the last refresh" view and confirm against the provider invoice or pricing page. Prices change without notice on the provider side.