freellmpool › guide
To use several free LLM tiers at once — Groq, Cerebras, Gemini, NVIDIA, OpenRouter, Cloudflare and more — pool them behind one endpoint so each request goes to a provider you have capacity on and fails over when one is rate-limited. The open-source freellmpool does exactly this: one OpenAI-compatible interface over 16 providers, with per-day quota tracking so you drain each tier evenly. Two providers are keyless, so you can start with no signup.
pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence." # keyless, picks a free provider
Add free keys for more providers (each unlocks more models and higher limits):
export GROQ_API_KEY=... # plus CEREBRAS_API_KEY, GEMINI_API_KEY, NVIDIA_API_KEY, ...
freellmpool providers # shows what's configured
freellmpool benchmark # times each provider so routing prefers the fastest
from freellmpool import Pool
reply = Pool.from_default_config().ask("Write a haiku about sqlite")
# or run a proxy and keep your existing OpenAI code unchanged:
# freellmpool proxy → OPENAI_BASE_URL=http://localhost:8080/v1
freellmpool orders the providers you have access to least-used-first (so load spreads across tiers),
tries them until one returns a non-empty result, and sets a rate-limited provider aside for a cooldown.
Daily counts reset at UTC midnight. Set FREELLMPOOL_ROUTING=fast to prefer the lowest-latency
provider instead.
Set each provider's free API key as an environment variable and let freellmpool pool them — it routes
each request to one you have capacity on and fails over automatically. You can also pin a provider per
call with -p.
No. Two providers are keyless so freellmpool works immediately; add keys for the others over time to grow your total free capacity.