Free Cerebras API: speed plus a big free cap

Cerebras runs inference on its wafer-scale engine, which gives it some of the highest tokens-per-second rates available. Unusually for a free tier, it pairs that speed with one of the largest free daily request allowances (roughly 14,400 requests/day; verify current numbers). That combination makes Cerebras an excellent primary free provider. Get a free key at cloud.cerebras.ai and call the OpenAI-compatible endpoint at https://api.cerebras.ai/v1. Use freellmpool to fall back to other tiers when you do exhaust the allowance.

What Cerebras' free tier is good for

Pick Cerebras when you want both throughput and volume: batch jobs, evaluations, agent loops, and any workload that makes thousands of calls a day. Its catalog is smaller than NVIDIA's, so it's about doing a lot with a few strong models rather than browsing many. If you need a specific niche model, pair Cerebras with a broad provider.

Free models

gpt-oss-120b — strong open reasoning model, served fast.
zai-glm-4.7 — capable general model (GLM family).

Get a key and call it

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss-120b","messages":[{"role":"user","content":"Hi"}]}'
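For scripts, the same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_request and ask are made up for this example, the User-Agent value is arbitrary, and the response shape (choices[0].message.content) is assumed from the OpenAI-compatible convention rather than confirmed against Cerebras' docs.

```python
import json
import os
import urllib.request

CEREBRAS_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_request(prompt, model="gpt-oss-120b", api_key=None):
    """Build (but do not send) a chat completion request."""
    # Read the key from the same environment variable the curl example uses.
    api_key = api_key or os.environ.get("CEREBRAS_API_KEY", "")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CEREBRAS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Arbitrary value; set explicitly instead of urllib's default.
            "User-Agent": "cerebras-docs-example",
        },
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        data = json.load(resp)
    # Assumed OpenAI-style layout: text lives at choices[0].message.content.
    return data["choices"][0]["message"]["content"]
```

With CEREBRAS_API_KEY exported, print(ask("Hi")) sends the same prompt as the curl command above.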

Limits and gotchas

The large daily cap is the draw, but there are still per-minute request and token limits — sustained bursts can hit the minute cap even when the daily budget is fine.
The model list is intentionally small; don't expect dozens of variants.
gpt-oss-120b is a reasoning model: give it enough max_tokens that hidden reasoning doesn't consume the whole budget and leave the response with empty content.
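Two of these gotchas can be handled client-side. The sketch below is illustrative only. It assumes the per-minute limit surfaces as a retryable error such as HTTP 429, and that a reply truncated by the token budget carries finish_reason "length" as in the OpenAI convention; neither is confirmed here for Cerebras. The function names and the 8192 ceiling are hypothetical.

```python
import random


def backoff_delays(retries=5, base=1.0, cap=30.0):
    """Yield jittered, exponentially growing waits (seconds) between retries."""
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick anywhere in [0, ceiling] so retries spread out.
        yield random.uniform(0, ceiling)


def reply_text(response):
    """Return the reply text, or None if the budget ran out before any content."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""
    if not content.strip() and choice.get("finish_reason") == "length":
        return None  # caller should retry with a larger max_tokens
    return content


def next_max_tokens(current, limit=8192):
    """Double the token budget for a retry, up to a hypothetical ceiling."""
    return min(limit, current * 2)
```

A caller would sleep for each value from backoff_delays() after a rate-limit error, and resend with next_max_tokens(...) whenever reply_text(...) returns None.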

Pool Cerebras with other free tiers

Because of its capacity, Cerebras is a good default primary in a pool, with Groq, Gemini and NVIDIA as backups. freellmpool can prefer Cerebras and fail over only when needed, and its benchmark command times each provider so routing picks the fastest:

pip install freellmpool
export CEREBRAS_API_KEY=...              # plus other free keys
freellmpool ask -p cerebras "..."        # pin Cerebras as primary
freellmpool benchmark                    # time each provider you've configured
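To make the "prefer Cerebras, fail over only when needed" idea concrete, here is a generic ordered-failover sketch. It is not freellmpool's implementation or API: first_success and the two stand-in providers are hypothetical, and a real setup would pass functions that call each provider's endpoint.

```python
def first_success(providers, prompt):
    """Try each (name, call) pair in order; return (name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit hit or daily cap exhausted
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only; no network calls are made.
def exhausted_primary(prompt):
    raise RuntimeError("daily cap reached")


def working_backup(prompt):
    return f"echo: {prompt}"


pool = [("cerebras", exhausted_primary), ("groq", working_backup)]
used, reply = first_success(pool, "Hi")
print(used, reply)
```

Listing the primary first keeps traffic on it until it raises, which matches the fallback order described above.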

See also Groq (the other fast tier), best free LLM API gateway, and using multiple free LLM APIs together.

FAQ

Is the Cerebras API free?

Yes. Get a key at cloud.cerebras.ai and call the OpenAI-compatible endpoint at api.cerebras.ai/v1. The free tier has a comparatively large daily request cap (~14,400/day; verify current limits) plus per-minute limits.

Is Cerebras faster than Groq?

Both are very fast and often trade the top spot depending on model and load. The practical answer is to run freellmpool benchmark and let routing prefer whichever is fastest for you right now.

Why use Cerebras as a primary provider?

Its mix of high speed and a large daily request allowance means you can route most traffic to it before needing to fall back to other free tiers.

Part of freellmpool (MIT, open source). Limits change — check Cerebras' docs. Updated 2026-06-03.