Cerebras runs inference on its wafer-scale engine, giving some of the highest
tokens-per-second rates available. Unusually for a free tier, it pairs that speed with one of the largest
free daily request allowances (roughly 14,400 requests/day; verify current numbers). That
combination makes Cerebras an excellent primary free provider. Get a free key at
cloud.cerebras.ai and call the OpenAI-compatible endpoint at
https://api.cerebras.ai/v1. Use freellmpool
to fall back to other tiers when you do exhaust it.
Pick Cerebras when you want both throughput and volume: batch jobs, evaluations, agent loops, and any workload that makes thousands of calls a day. Its catalog is smaller than NVIDIA's, so it's about doing a lot with a few strong models rather than browsing many. If you need a specific niche model, pair Cerebras with a broad provider.
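For batch jobs it can help to turn the daily allowance into a pacing interval. The sketch below is only arithmetic: it assumes the approximate 14,400 requests/day figure quoted above (verify the current cap), and the function names are illustrative. Per-minute limits apply separately and are not modelled here.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the approximate free-tier figure quoted in the text.
ASSUMED_DAILY_CAP = 14_400


def min_interval_seconds(daily_cap: int) -> float:
    """Even spacing between calls that uses up daily_cap over 24 hours."""
    if daily_cap <= 0:
        raise ValueError("daily_cap must be positive")
    return SECONDS_PER_DAY / daily_cap


def calls_left(daily_cap: int, calls_made: int) -> int:
    """Remaining calls today, never negative."""
    return max(daily_cap - calls_made, 0)


interval = min_interval_seconds(ASSUMED_DAILY_CAP)  # 86400 / 14400 = 6.0 seconds
```

Spread evenly, that cap works out to one call every 6 seconds, or 10 per minute, around the clock.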
gpt-oss-120b: strong open reasoning model, served fast.
zai-glm-4.7: capable general model (GLM family).

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss-120b","messages":[{"role":"user","content":"Hi"}]}'
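The same request can be sketched in Python with only the standard library. This assumes the endpoint accepts and returns the usual OpenAI-style chat-completions shape (a `max_tokens` field in the request, `choices[0].message.content` in the response); the helper names and the default of 1024 tokens are illustrative choices, not values from Cerebras.

```python
import json
import os
import urllib.request

CEREBRAS_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_chat_request(api_key, prompt, model="gpt-oss-120b", max_tokens=None):
    """Build (but do not send) a chat-completions request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # assumed OpenAI-style field
    return urllib.request.Request(
        CEREBRAS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_content(response_body):
    """Return the assistant text, or '' when content is missing or null."""
    choices = response_body.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


def ask(prompt, max_tokens=1024):
    """Send one prompt using the key in CEREBRAS_API_KEY (makes a network call)."""
    req = build_chat_request(os.environ["CEREBRAS_API_KEY"], prompt, max_tokens=max_tokens)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_content(json.load(resp))
```

Returning an empty string from `extract_content` makes an empty reply easy to detect, so the caller can retry with a larger budget or move to another provider.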
Set max_tokens generously so hidden reasoning doesn't eat
the whole budget and return empty content.

Because of its capacity, Cerebras is a good default primary in a pool, with Groq, Gemini and NVIDIA as
backups. freellmpool can prefer Cerebras and fail over only when needed, and its benchmark
command times each provider so routing picks the fastest:
pip install freellmpool
export CEREBRAS_API_KEY=... # plus other free keys
freellmpool ask -p cerebras "..." # pin Cerebras as primary
freellmpool benchmark # time each provider you've configured
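To make the "primary with backups" idea concrete, here is a minimal failover sketch. It is not freellmpool's implementation: providers are plain Python callables, `ask_with_failover` is a hypothetical name, and the stub providers exist only to show the control flow. An empty reply counts as a failure, so a response whose budget went entirely to hidden reasoning also triggers the fallback.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def ask_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            answer = call(prompt)
        except Exception as exc:  # rate limit, timeout, bad key, ...
            errors.append(f"{name}: {exc}")
            continue
        if answer.strip():
            return name, answer
        errors.append(f"{name}: empty content")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real API calls.
def exhausted(prompt: str) -> str:
    raise TimeoutError("daily quota exhausted")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = ask_with_failover("ping", [("cerebras", exhausted), ("groq", backup)])
```

Listing the primary first means the backups see traffic only when the primary raises or returns nothing.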
See also Groq (the other fast tier), best free LLM API gateway, and using multiple free LLM APIs together.
Cerebras has a free tier. Get a key at cloud.cerebras.ai and call the OpenAI-compatible endpoint at api.cerebras.ai/v1. The free tier has a comparatively large daily request cap (~14,400/day; verify current limits) plus per-minute limits.
Cerebras and Groq are both very fast and often trade the top spot depending on model and load. The practical answer is to
run freellmpool benchmark and let routing prefer whichever is fastest for you right now.
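The idea behind timing providers can be sketched in a few lines. This is a hand-rolled illustration, not how freellmpool benchmark works internally: `rank_by_latency` is a hypothetical helper, providers are plain callables, and a single timed call per provider is a crude measure that real benchmarking would repeat.

```python
import time
from typing import Callable, Dict, List


def rank_by_latency(providers: Dict[str, Callable[[str], str]], prompt: str) -> List[str]:
    """Time one call per provider and return names from fastest to slowest."""
    timings = {}
    for name, call in providers.items():
        start = time.perf_counter()
        try:
            call(prompt)
        except Exception:
            continue  # a provider that errors is left out of the ranking
        timings[name] = time.perf_counter() - start
    return sorted(timings, key=timings.get)


# Stubs simulating a slow and a fast provider.
def slow(prompt: str) -> str:
    time.sleep(0.05)
    return "ok"


def fast(prompt: str) -> str:
    return "ok"


order = rank_by_latency({"slow": slow, "fast": fast}, "Hi")
```

Because latency shifts with model and load, a ranking like this is only a snapshot and is worth re-running periodically.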
Cerebras works well as a primary provider because its mix of high speed and a large daily request allowance lets you route most traffic to it before needing to fall back to other free tiers.