The Absolute NIGHTMARE of Modern LLM APIs
Alright, let’s be real for a second. Take a look at your .env file. GO ON, I’LL WAIT.
If you’re building anything serious with AI today, it’s a complete and utter mess. It’s a graveyard of API keys. You’ve got one for OpenAI, another for Anthropic, one for Groq because SPEED, one for Google’s Gemini, and probably a few for Mistral just in case. And let’s not forget your local setup with Ollama running Llama 3.1, because privacy is king and we love open source, right?
It’s chaos. Absolute, unadulterated chaos.
Every single one of these providers has a slightly different API. A different request format. A different way of returning the response. You’re writing custom logic to parse this, custom logic to handle that. It’s a full-time job just managing the plumbing, and it’s holding you back from actually BUILDING COOL STUFF.
This isn’t innovation, my friends. This is just pain. And it SUCKS.
But what if you could just… not? What if you could write your code ONCE and talk to any model, from any provider, with a single, clean, unified interface? What if switching from GPT-4o to Claude Sonnet was just a one-line change?
SOUNDS LIKE A DREAM, RIGHT? Well, wake up. It’s real.
HOLD UP. Which “AnyLLM” Are We Talking About?!
Okay, before we go any further, we need to clear something up. The name “AnyLLM” is so good, like, three different teams decided to use it. If you google it, you’re gonna get confused. FAST. So let’s sort this out right now.
Clarification #1: We are NOT talking about AnythingLLM
You’ve probably seen this one. It’s from Mintplex Labs, it’s got a zillion stars on GitHub, and it’s AWESOME. AnythingLLM is a full-stack, open-source application that lets you build your own private, local ChatGPT. You can feed it your documents, PDFs, websites, whatever, and chat with them using any model you want. It’s a fantastic RAG (Retrieval-Augmented Generation) tool with a slick UI, agents, and multi-user support. But it’s a whole application. It’s not a developer library you import into your Python script. Different tool, different job.
Clarification #2: We are NOT talking about the other random repos.
There are a few other projects on GitHub with similar names, like a simple web UI or an async client. Cool projects, but not what we’re here for.
The Main Event: We ARE talking about any-llm from Mozilla AI
THIS is the one. The project you’ve been looking for.
any-llm is a lightweight, no-nonsense Python library from the geniuses at Mozilla AI. Its one and only job is to solve the API mess we were just complaining about. It’s the unified interface. The single API to rule them all.
Getting this straight is step zero. The fact that the name is so overloaded is actually a perfect example of the chaos in the AI space right now. Everyone’s building, things are moving at light speed, and sometimes names collide. A good developer knows how to cut through that noise. By clarifying this upfront, we’re not just picking a tool; we’re establishing that we understand the messy reality of the ecosystem and we’re here to navigate it. Now, let’s get to the good stuff.
Enter any-llm: Your New Best Friend from Mozilla AI
So, what exactly is any-llm?
It’s a simple, no-BS, lightweight asynchronous Python library that gives you a single, unified interface to talk to all the most popular LLM providers. Think OpenAI, Anthropic, Google, Mistral, Groq, Ollama, AWS Bedrock. The whole crew is here.
But it’s how it does it that makes it so brilliant. The Mozilla AI team made some big-brain architectural decisions that set it apart from the crowd.
The Core Philosophy: Why It’s Different
This isn’t just another wrapper. This is a philosophical statement about how developer tools should be built.
1. It Uses Official Provider SDKs (This is HUGE)
Instead of trying to re-implement the entire logic for talking to every single API from scratch, any-llm is smart. When a provider has an official Python SDK (like OpenAI or Mistral), any-llm uses it under the hood. Why is this a game-changer? STABILITY. It means any-llm isn’t responsible for handling the nitty-gritty of authentication, retries, and weird edge cases. The provider’s own team is. This massively reduces the chances of things breaking when a provider updates their API, because any-llm just needs to update its dependency on the official SDK. It’s a genius move that offloads maintenance and boosts reliability.
2. NO PROXY SERVER REQUIRED
Let me say that again. NO. PROXY. SERVER. REQUIRED. You pip install it, and you’re done. Your code talks DIRECTLY to the provider’s API. There is no extra service you have to deploy, manage, monitor, or pay for. It’s a library, not another piece of infrastructure. This is a massive win for anyone building lean projects, working with serverless functions, or who just, you know, doesn’t want another thing to worry about at 3 AM.
3. Actively Maintained for a Reason
This isn’t some weekend project that’s going to be abandoned in a month. The Mozilla AI team built any-llm for their own flagship project, any-agent. They are dogfooding their own tool, which means they have a powerful incentive to keep it robust, up-to-date, and bug-free. When your own product depends on a library, you make sure that library is rock solid.
These choices show a deep understanding of what developers actually want: a tool that solves a specific, painful problem without getting in the way. In a world of bloated frameworks and complex platforms, any-llm is a breath of fresh air. It’s a sharp knife, not a clunky multi-tool.
GET IN, DEVS, WE’RE CODING: The Hands-On Guide
Enough talk. Let’s build something. Fire up your terminal.
Step 1: Installation – The Right Way
First things first. You don’t just pip install any-llm-sdk. That’s not how we roll. any-llm is designed to be lightweight, so you only install the dependencies for the providers you actually need. THIS IS A FEATURE, NOT A BUG. It keeps your environment clean.
Here are the commands. Pick your poison.
# Just want to talk to OpenAI and your local Ollama?
# PERFECT.
pip install 'any-llm-sdk[openai,ollama]'

# Building with Mistral and Anthropic's latest models?
# EASY.
pip install 'any-llm-sdk[mistral,anthropic]'

# Feeling wild? Want to live on the edge?
# Fine, install everything. But don't say I didn't warn you.
pip install 'any-llm-sdk[all]'
This approach is just smart. It respects your project’s dependency tree and avoids unnecessary bloat.
Step 2: Your API Keys – Don’t Be a N00b
This should be obvious, but I’ll say it anyway. You need to set your API keys as environment variables. any-llm looks for specific names, so use these exact formats:
export OPENAI_API_KEY="sk-..."
export MISTRAL_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GROQ_API_KEY="..."
# and so on...
Put ’em in your .bashrc, .zshrc, or use a .env file with python-dotenv. Just don’t hardcode them in your script. Please. For me.
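If you go the .env route, the pattern is dead simple. Here’s a minimal sketch (the keys are placeholders; python-dotenv is the only extra dependency):

# .env -- never commit this file!
# OPENAI_API_KEY=sk-...
# MISTRAL_API_KEY=...

import os
from dotenv import load_dotenv

load_dotenv()  # pulls everything from .env into os.environ

# any-llm reads the keys from the environment, so nothing else is needed.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set!"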
Step 3: Your First Unified Call – OpenAI
Let’s start with the OG. Here’s a simple, clean, asynchronous script to hit gpt-4o-mini. I’ve commented every line so you know exactly what’s happening.
import asyncio
import os

from any_llm import completion

# A simple check to make sure your key is actually set.
# Don't skip this, it saves you from dumb errors.
assert os.environ.get("OPENAI_API_KEY"), "OpenAI API key not found!"

async def main():
    print("Pinging OpenAI...")

    # This is it. The main event. The completion() function.
    response = await completion(
        provider="openai",
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )

    # The response object is a standard OpenAI ChatCompletion object.
    # Consistent. Predictable. Beautiful.
    print(response.choices[0].message.content)

# Standard Python boilerplate to run our async function.
if __name__ == "__main__":
    asyncio.run(main())
Run this. It just works. But this isn’t the magic part. THIS is the magic part…
Step 4: THE REAL MAGIC – One. Line. Change. 🔥
You ready? Watch closely. We’re going to take that exact same code and point it at a completely different provider.
Example 1: Switch to Mistral
All we have to do is change the provider and model strings. THAT’S IT.
#... same imports, same assert, same async main...

    print("Pinging Mistral...")

    response = await completion(
        provider="mistral",            # <-- THE ONLY CHANGE
        model="mistral-small-latest",  # <-- THE ONLY CHANGE
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )

    print(response.choices[0].message.content)

#... same asyncio.run(main())...
Example 2: Switch to your LOCAL Ollama model
You have Llama 3.1 running locally with Ollama, right? Let’s talk to it. Again, ONE LINE CHANGE. (Okay, two strings on one line.)
#... same imports, NO assert needed for Ollama, same async main...

    print("Pinging local Ollama...")

    response = await completion(
        provider="ollama",   # <-- THE ONLY CHANGE
        model="llama3.1",    # <-- THE ONLY CHANGE
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )

    print(response.choices[0].message.content)

#... same asyncio.run(main())...
This is the entire value proposition in a nutshell. No new SDKs to learn. No different response objects to parse. any-llm takes care of all the translation behind the scenes and gives you back a clean, consistent, OpenAI-style ChatCompletion object every single time.
THIS. IS. THE. WAY.
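And once the provider is just a string, you can wrap the whole thing in a tiny helper and keep the rest of your app provider-agnostic. A minimal sketch, using the same completion() call as the examples above (the prompt and model names here are only illustrations):

import asyncio
from any_llm import completion

async def ask(provider: str, model: str, prompt: str) -> str:
    # One function, any provider. The response shape is always the same.
    response = await completion(
        provider=provider,
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    # Swap the strings, keep the code.
    print(await ask("openai", "gpt-4o-mini", "Say hi in five words."))
    print(await ask("ollama", "llama3.1", "Say hi in five words."))

if __name__ == "__main__":
    asyncio.run(main())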
Level Up Your Game: Advanced any-llm Tricks 🧠
Okay, you’ve mastered the basics. You can switch between providers like a pro. But any-llm has a few more tricks up its sleeve.
Subsection 4.1: Streaming Like a Pro (Because Nobody Likes Waiting)
If your app makes a user stare at a loading spinner for 10 seconds, you’ve already lost. Streaming is non-negotiable for a good user experience. And with any-llm, it’s ridiculously easy.
Just use the stream function. It returns an async iterator that you can loop over to get chunks of text as soon as they’re generated.
Check out this clean, copy-paste-ready example:
import asyncio
import os

from any_llm import stream

assert os.environ.get("ANTHROPIC_API_KEY"), "Anthropic API key not found!"

async def main():
    print("Streaming from Anthropic...\n")

    # Use an 'async with' block for clean setup and teardown.
    async with stream(
        provider="anthropic",
        model="claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Write a haiku about unified APIs."}],
    ) as text_stream:
        # Just loop through the stream and print the chunks.
        async for chunk in text_stream:
            print(chunk, end="", flush=True)

    print("\n\n--- Stream complete! ---")

if __name__ == "__main__":
    asyncio.run(main())
This is how modern AI apps should feel. Instantaneous. Responsive. It shows off the async-first design of the library perfectly.
Subsection 4.2: “What Models You Got?” Listing on the Fly
Ever wanted to let users pick their own model from a dropdown? Or maybe you just want to programmatically check if a new model you heard about on Twitter is available yet. any-llm
has you covered with the list_models()
function.
It’s a super handy utility for building dynamic applications or just exploring a provider’s offerings without leaving your terminal.
import asyncio
import os

from any_llm import list_models

assert os.environ.get("GROQ_API_KEY"), "Groq API key not found!"

async def main():
    print("Fetching available models from Groq:")
    try:
        models = await list_models(provider="groq")
        for model in models:
            # The 'model' object has useful attributes like 'id'.
            print(f"- {model.id}")
    except Exception as e:
        print(f"Oops, something went wrong: {e}")

if __name__ == "__main__":
    asyncio.run(main())
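One handy pattern: use list_models() as a guard before you commit to a model name, so a typo or a retired model fails loudly at startup instead of mid-request. A quick sketch reusing the calls above (the model id is just an example; check what your account actually offers):

import asyncio
from any_llm import completion, list_models

async def main():
    wanted = "llama-3.1-8b-instant"  # example id only
    available = {m.id for m in await list_models(provider="groq")}

    # Fall back to something that definitely exists instead of erroring later.
    model = wanted if wanted in available else sorted(available)[0]
    print(f"Using model: {model}")

    response = await completion(
        provider="groq",
        model=model,
        messages=[{"role": "user", "content": "One-sentence hello, please."}],
    )
    print(response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())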
To make your life even easier, here’s a quick cheat sheet for the top providers and what they support through any-llm. This saves you from having to dig through docs when you’re in the zone.
Env Var | Responses | Completion | Streaming | Reasoning | Embedding | List Models
---|---|---|---|---|---|---
ANTHROPIC_API_KEY | ❌ | ✅ | ✅ | ✅ | ❌ | ✅
AWS_BEARER_TOKEN_BEDROCK | ❌ | ✅ | ✅ | ❌ | ✅ | ✅
AZURE_API_KEY | ❌ | ✅ | ✅ | ❌ | ✅ | ❌
CEREBRAS_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
CO_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
DATABRICKS_TOKEN | ❌ | ✅ | ✅ | ❌ | ✅ | ✅
DEEPSEEK_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
FIREWORKS_API_KEY | ✅ | ✅ | ✅ | ❌ | ❌ | ✅
GOOGLE_API_KEY/GEMINI_API_KEY | ❌ | ✅ | ✅ | ✅ | ✅ | ✅
GROQ_API_KEY | ✅ | ✅ | ✅ | ✅ | ❌ | ✅
HF_TOKEN | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
INCEPTION_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
LLAMA_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
LLAMA_API_KEY | ❌ | ✅ | ❌ | ❌ | ✅ | ✅
None | ❌ | ✅ | ❌ | ❌ | ❌ | ✅
LM_STUDIO_API_KEY | ❌ | ✅ | ✅ | ✅ | ✅ | ✅
MISTRAL_API_KEY | ❌ | ✅ | ✅ | ✅ | ✅ | ✅
MOONSHOT_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
NEBIUS_API_KEY | ❌ | ✅ | ✅ | ❌ | ✅ | ✅
None | ❌ | ✅ | ✅ | ✅ | ✅ | ✅
OPENAI_API_KEY | ✅ | ✅ | ✅ | ❌ | ✅ | ✅
OPENROUTER_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
PORTKEY_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
SAMBANOVA_API_KEY | ❌ | ✅ | ✅ | ❌ | ✅ | ✅
TOGETHER_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
VOYAGE_API_KEY | ❌ | ❌ | ❌ | ❌ | ✅ | ❌
WATSONX_API_KEY | ❌ | ✅ | ✅ | ❌ | ❌ | ✅
XAI_API_KEY | ❌ | ✅ | ✅ |  |  | 
Data based on the official any-llm documentation. The "Reasoning" column indicates support for tool-calling/function-calling features that enable reasoning.
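If you want that cheat sheet at your fingertips in code, one low-tech option is to mirror the rows you care about in a small dict. This is just a hand-rolled convenience (not an any-llm feature), so keep it in sync with the official docs:

# Capabilities copied from the table above (only a few providers shown).
CAPABILITIES = {
    "openai":    {"responses", "completion", "streaming", "embedding", "list_models"},
    "anthropic": {"completion", "streaming", "reasoning", "list_models"},
    "mistral":   {"completion", "streaming", "reasoning", "embedding", "list_models"},
    "groq":      {"responses", "completion", "streaming", "reasoning", "list_models"},
}

def supports(provider: str, feature: str) -> bool:
    # Cheap guard so you fail fast instead of mid-request.
    return feature in CAPABILITIES.get(provider, set())

assert supports("anthropic", "streaming")
assert not supports("openai", "reasoning")  # per the table above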
Subsection 4.3: Unlocking “Thinking” Models (A Glimpse of the Future)
You’ve probably heard the buzz about “reasoning models.” These are the next evolution of LLMs. Instead of just spitting out an answer, they “think before they speak,” generating an internal chain of thought or a series of steps to solve more complex problems.
This is the key to building powerful AI agents. And any-llm is already built for it. For providers that support OpenAI-style tool calling or function calling (like Groq, Mistral, and Anthropic), you can tap into these reasoning capabilities.
Here’s a conceptual example of what that looks like. You define a set of “tools” the model can use, and the model’s “reasoning” is exposed as a series of calls to those tools.
#... (imports and setup)...

# This is a conceptual example. The exact tool definition will
# follow the OpenAI format. (The weather tool here is just a placeholder.)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

async def main():
    response = await completion(
        provider="groq",
        model="gemma2-9b-it",  # A model that's good at tool use
        messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
        tools=tools,
        tool_choice="auto",
    )

    # The model's reasoning is exposed as a sequence of tool calls.
    if response.choices[0].message.tool_calls:
        print("Model's Reasoning Steps:")
        for tool_call in response.choices[0].message.tool_calls:
            print(f"- {tool_call.function.arguments}")
    else:
        print(response.choices[0].message.content)

#... (run main)...
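To actually close the loop, you run the requested tool yourself and send the result back in a second call; the model then writes its final answer. Here’s a rough sketch of that round trip, sticking to the same completion() interface used throughout this post (the get_weather tool and the model id are placeholders for illustration, not anything any-llm ships):

import asyncio
import json
from any_llm import completion

def get_weather(city: str) -> str:
    # Placeholder tool -- swap in your real function.
    return f"It's 21°C and sunny in {city}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

async def main():
    messages = [{"role": "user", "content": "What's the weather in Berlin?"}]

    first = await completion(
        provider="groq",
        model="llama-3.3-70b-versatile",  # example model id
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = first.choices[0].message

    if not msg.tool_calls:
        print(msg.content)
        return

    # The message is an OpenAI-style object, so it serializes back into the history.
    messages.append(msg.model_dump(exclude_none=True))
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })

    # Second call: the model now sees the tool results and answers in plain text.
    second = await completion(
        provider="groq",
        model="llama-3.3-70b-versatile",
        messages=messages,
        tools=tools,
    )
    print(second.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())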
Even if you’re not building full-blown agents today, the fact that any-llm has this built-in shows it’s a forward-looking library, ready for the next wave of AI development.
The Main Event: any-llm vs. The Heavyweight, LiteLLM
Okay, let’s address the elephant in the room. If you’ve looked for a tool like this before, you’ve 100% come across LiteLLM. It is, without a doubt, the current king of the hill. It’s powerful, it’s popular, and it’s integrated into major frameworks.
So, the big question is: can the new kid on the block, any-llm, actually compete?
Let’s break it down, no punches pulled.
Why You Might LOVE LiteLLM
First, credit where it’s due. LiteLLM is a beast for a reason.
- MASSIVE Provider Support: Seriously. They support over 100 LLM providers. If there’s some obscure, niche model you need to call from a provider you’ve barely heard of, chances are LiteLLM has an integration for it. Their coverage is unmatched.
- Battle-Tested and Feature-Packed: LiteLLM has been in the trenches. It’s used by frameworks like CrewAI and has a ton of enterprise-grade features: cost tracking, automatic fallbacks and retries, caching, and a full-blown proxy server that can act as a centralized gateway for your entire organization.
The any-llm Counter-Punch: A Different Philosophy
LiteLLM is the “everything but the kitchen sink” solution. any-llm is the minimalist’s choice. And that choice is built on two key philosophical differences.
1. The SDK vs. Re-implementation Debate
This is the core of the argument. LiteLLM re-implements the API logic for every provider from scratch. This gives them total control, but it’s a double-edged sword. As developers on Hacker News and Reddit have pointed out, it can lead to a messy and complex codebase. We’re talking a 7000+ line utils.py file and a 1200-line __init__.py. That translates into slow cold-start times (a killer for serverless) and makes it a nightmare to maintain. When OpenAI changes one little thing, the LiteLLM team has to scramble to patch their custom implementation.
any-llm takes the opposite approach. By using the official provider SDKs, it bets on stability and maintainability. It lets the provider’s own engineering team worry about the low-level details. This is a fundamentally safer and more robust long-term strategy.
2. The Proxy vs. No-Proxy Debate
LiteLLM’s most powerful feature is its proxy server. It’s a fantastic solution for teams that need a central hub to manage keys, monitor costs, and route traffic.
any-llm is intentionally not a proxy. It’s just a library. This isn’t a missing feature; it’s the entire point. For a solo developer, a small team, or an application running on AWS Lambda, spinning up and managing a whole separate proxy service is massive overkill. any-llm gives you the power of a unified API without the architectural overhead.
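To make the serverless point concrete, here’s roughly what that looks like: an AWS Lambda handler calling a provider directly, with no gateway in between. A sketch built on the same completion() call (the event shape and model are just examples for an API Gateway-style trigger):

import asyncio
import json
from any_llm import completion

async def answer(prompt: str) -> str:
    response = await completion(
        provider="openai",
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def lambda_handler(event, context):
    # No proxy hop: the function talks straight to the provider's API.
    prompt = json.loads(event.get("body") or "{}").get("prompt", "Hello!")
    return {
        "statusCode": 200,
        "body": json.dumps({"reply": asyncio.run(answer(prompt))}),
    }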
The choice between them isn’t just about a feature list; it’s about the kind of developer experience you want. LiteLLM offers a powerful, all-in-one platform, but with that comes complexity. any-llm offers a simple, focused tool that does one thing exceptionally well. It trusts you, the developer, to build the other pieces (like logging or routing) if and when you need them.
FAQ: Stop Juggling LLM APIs (AnyLLM vs LiteLLM, local models, streaming, costs)
1) Is there a single API to talk to OpenAI, Anthropic, Google, Mistral, Groq, and my local models?
Yes, AnyLLM gives you one Python interface for multiple providers, and you can swap models/providers by changing a string (no rewrites).
2) Does AnyLLM need a proxy or gateway server?
No. It’s a library, not extra infra: there’s no proxy to deploy or babysit.
3) What’s the difference between AnyLLM and AnythingLLM?
AnythingLLM is a full-stack RAG/chat app (UI, agents, multi-user, document chat); AnyLLM is a developer SDK for unified API calls. Use the former to use an app; use the latter to build apps.
4) Which providers does AnyLLM support right now?
A big set including OpenAI, Anthropic, Google (Gemini), Mistral, Groq, DeepSeek, LM Studio, Ollama, Bedrock, Azure, OpenRouter, and more (with flags for streaming/embeddings/reasoning/list-models).
5) Does AnyLLM support streaming responses?
Yep, token streaming is built in, and the demo showcases real-time streaming + provider switching.
6) Can I list available models programmatically (for a dropdown or runtime checks)?
Yes, list_models() exposes what a provider offers, so you can populate UIs or validate configs dynamically.
7) Does AnyLLM work with local models like Ollama or LM Studio?
Yes, call local models with the same interface you use for cloud providers (great for dev/test or privacy).
8) Why does AnyLLM feel more stable than wrappers I tried before?
It leans on official provider SDKs, so auth/retries/wire-level quirks are handled by the vendor SDKs instead of a giant custom shim. Less breakage when APIs change.
9) AnyLLM vs LiteLLM: how should I choose?
Pick AnyLLM if you want a lean, async-first SDK (no proxy) that favors official SDKs; pick LiteLLM if you need a proxy gateway with cost tracking, routing, and 100+ provider coverage.
10) I’m drowning in API keys. Should I centralize behind a gateway?
For teams that need budgets/quotas/logging across apps, a LiteLLM proxy is handy; for solo projects/serverless, AnyLLM keeps it simple without extra moving parts.
11) Does AnyLLM support embeddings and the OpenAI-style Responses API?
Yes, check the provider matrix; many providers expose embeddings and the newer Responses API via AnyLLM.
12) Can I use “reasoning/tool-use” style features?
Where providers expose reasoning/tool-calling in OpenAI-style APIs, AnyLLM surfaces them (see the provider capability table).
13) Is there built-in cost tracking?
AnyLLM focuses on the client SDK; if you need org-wide spend tracking, budgets, or custom pricing maps, that’s exactly what LiteLLM’s proxy provides.
14) How do I install “just what I need” (to keep images small and cold starts fast)?
Install with extras, e.g., any-llm-sdk[openai,ollama] or any-llm-sdk[mistral,anthropic], so you only pull the SDKs you’ll actually use.
15) Are people really asking for unified LLM interfaces (or is this just hype)?
Dev threads are full of “best LLM gateway?”, “how do I switch providers easily?”, and “too many keys to manage!” The pain is real, which is why these tools exist.
The Final Verdict: Should You Switch?
Okay, let’s land the plane. No fence-sitting here. Here’s my straight-up advice.
You should absolutely try any-llm if:
- You value stability and predictability. The “use official SDKs” philosophy is a winning long-term bet.
- You’re building a lean project, a simple script, or a serverless function and want to keep your dependencies and architectural complexity to an absolute minimum.
- You believe in the “library, not a service” approach and don’t want the hassle of managing a separate proxy server.
- You’re starting a new project and want a clean, modern, async-first foundation that’s backed by a major player like Mozilla.
You should probably stick with LiteLLM (for now) if:
- You absolutely need to support a niche LLM provider that any-llm doesn’t have yet. LiteLLM’s breadth is its killer feature.
- Your team or company requires the advanced features of a centralized proxy server, like complex routing, fallbacks, and detailed cost tracking across multiple departments.
- You’re already deeply integrated with it in a large, complex application, and it’s working fine. (If it ain’t broke, don’t fix it).
The LLM tooling space is moving at an insane pace, and that’s a good thing. Competition breeds innovation. any-llm from Mozilla AI is more than just another library; it’s a strong, opinionated contender with a smart philosophy that prioritizes developer experience and long-term stability.
Give it a shot on your next project. pip install is cheap. What do you have to lose?
Drop a comment below and let me know what you think. Let’s build the future. 🚀