I'm working on it full-time right now. It might be challenging, especially when it comes to interactions with video, audio, and image models. I'm just trying to stay on top of what's happening and add new things day by day.
An even bigger challenge might be integrating it tighter with the cloud platforms, Prometheus, OAuth, Datadog, vaults, and DLP software.
Benchmarking AI gateways properly is harder than it looks. Feature sets differ meaningfully - exact vs semantic caching, cluster mode, guardrails, audit logging - and each carries its own latency cost. What actually matters for most users is end-to-end latency including provider overhead (200–2000ms), and in that frame Bifrost, LiteLLM, and GoModel are all perfectly fine.
I ran some comparisons but I'm not happy with the methodology, and I'd rather not spread misleading information. Once I have time to do it properly I'll write it up and share a link here. Honestly, I'd also love to see benchmarks done by someone other than the AI gateway builders. :)
Where GoModel actually differs today:
- image size: 16.96 MB vs Bifrost's 69.84 MB. It matters for sidecar, edge, and cold-start scenarios.
- per-tenant keys, guardrails, and audit logs are all in the OSS repo - not gated.
- AI interaction visualization that makes debugging individual request/response flows much easier.
It's like fuel costs in a supply chain. When you buy apples at the store, you don't think about oil prices. But if trucks ran on something cheaper, more efficient, or less taxed, the apples on the shelf would be cheaper too.
"... and I don't see if I would be able to track usage from individual end-users through a header".
Currently we have a unified concept of User-Paths. Once you add a specific header OR assign a User-Path to an API key, you can track usage based on it. A User-Path might be your end-user, an internal user, or some service. Examples:
Ah, seems like the right thing.
To be clearer about what I'm looking for: the system using the LLM gateway would present an arbitrary user id. Say the system has thousands of end-users (managed entirely by that system, not configured in the LLM proxy). The admin wants to block end-users who exceed a certain allowed quota.
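What I have in mind is roughly this sketch (all names here are hypothetical, including the `X-End-User` header idea - nothing GoModel-specific):

```go
package main

import (
	"fmt"
	"sync"
)

// quotaTracker keeps a running total per arbitrary end-user id. The ids
// come from the calling system (e.g. a hypothetical X-End-User header)
// and need no configuration in the proxy itself.
type quotaTracker struct {
	mu    sync.Mutex
	used  map[string]int
	limit int // allowed tokens per end-user
}

func newQuotaTracker(limit int) *quotaTracker {
	return &quotaTracker{used: make(map[string]int), limit: limit}
}

// allow records the requested usage and reports whether the user is
// still within quota; an over-quota request would be blocked.
func (q *quotaTracker) allow(userID string, tokens int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.used[userID]+tokens > q.limit {
		return false
	}
	q.used[userID] += tokens
	return true
}

func main() {
	q := newQuotaTracker(1000)
	fmt.Println(q.allow("end-user-42", 600)) // true
	fmt.Println(q.allow("end-user-42", 600)) // false - would exceed 1000
	fmt.Println(q.allow("end-user-43", 600)) // true - separate budget
}
```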
First, GoModel is designed to be flexible. If you add an extra field, it tries to pass it through in the appropriate place (Postel's law).
Therefore there's a good chance that if they make a minor API-level change, GoModel will handle it without any code changes.
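As a rough illustration of that pass-through behavior (a sketch, not GoModel's actual code): decoding into a generic map keeps any fields the gateway doesn't know about, so only the fields it explicitly rewrites need code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forwardBody decodes an incoming request body into a generic map,
// rewrites the one field the gateway cares about (here "model"),
// and re-encodes everything else untouched. Fields a provider adds
// later pass through without any code changes.
func forwardBody(in []byte, model string) ([]byte, error) {
	var body map[string]any
	if err := json.Unmarshal(in, &body); err != nil {
		return nil, err
	}
	body["model"] = model // the only rewritten field
	return json.Marshal(body)
}

func main() {
	// "reasoning_effort" stands in for a field the gateway knows nothing about.
	in := []byte(`{"model":"alias","messages":[],"reasoning_effort":"high"}`)
	out, err := forwardBody(in, "gpt-4o")
	if err != nil {
		panic(err)
	}
	// Note: marshaling a map sorts keys alphabetically.
	fmt.Println(string(out)) // {"messages":[],"model":"gpt-4o","reasoning_effort":"high"}
}
```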
Also, changes to providers' API formats might be less and less frequent. Keeping up typically means adding a few lines of code per month. I'm usually aware of those changes because I use LLMs daily and follow the news in a few places.
As a fallback, GoModel includes a passthrough API that forwards your request to the provider in its original format. That might be useful when an AI provider changes their contract significantly and we haven't caught up yet.
Also, official SDKs aren't bug-free either. Skipping that extra layer and hitting the API directly might actually be beneficial for GoModel.
Yeah, I share the same uncertainty here. My understanding is that personal, interactive use should be fine. I use Conductor all day every day, and it wraps a subscription.
Perhaps fully automated use is where the line is drawn.
But I also suspect individuals using it for light automated dispatching would be ok too.