freellmpool › guide

How to use multiple free LLM APIs together

To use several free LLM tiers at once — Groq, Cerebras, Gemini, NVIDIA, OpenRouter, Cloudflare and more — pool them behind one endpoint so each request goes to a provider you have capacity on and fails over when one is rate-limited. The open-source freellmpool does exactly this: one OpenAI-compatible interface over 24 providers, with per-day quota tracking so you drain each tier evenly. Several providers are keyless, so you can start with no signup.

Why combine them

Each free tier is small on its own; together they add up to real daily capacity.
When one provider returns a 429, you want to fall through to the next automatically — not fail.
You don't want to write and maintain glue code for each provider's SDK and limits.

Do it in two commands

pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence."   # keyless, picks a free provider

Add free keys for more providers (each unlocks more models and higher limits):

export GROQ_API_KEY=...        # plus CEREBRAS_API_KEY, GEMINI_API_KEY, NVIDIA_API_KEY, ...
freellmpool providers          # shows what's configured
freellmpool benchmark          # times each provider so routing prefers the fastest

Use it from code or as a drop-in endpoint

from freellmpool import Pool
reply = Pool.from_default_config().ask("Write a haiku about sqlite")

# or run a proxy and keep your existing OpenAI code unchanged:
#   freellmpool proxy  →  OPENAI_BASE_URL=http://localhost:8080/v1

How routing works

freellmpool orders the providers you have access to least-used-first, then picks a least-used model inside that provider. This keeps large catalogs from getting extra traffic just because they expose more models. It tries candidates until one returns a non-empty result, and sets a rate-limited provider aside for a cooldown. Daily counts reset at UTC midnight. Set FREELLMPOOL_ROUTING=fast to prefer the lowest-latency provider, or FREELLMPOOL_ROUTING=legacy (or model-fast) to restore per-model balancing.

FAQ

How do I use Groq and Cerebras and Gemini together?

Set each provider's free API key as an environment variable and let freellmpool pool them — it routes each request to one you have capacity on and fails over automatically. You can also pin a provider per call with -p.

Do I need an API key for every provider?

No. Several providers are keyless so freellmpool works immediately; add keys for the others over time to grow your total free capacity.

Part of freellmpool (MIT, free, open source). Updated 2026-06-03.