I'm working on it full-time right now. It might be challenging, especially when it comes to interactions with video, audio, and image models. I'm just trying to stay on top of what's happening and add new things day by day.
An even bigger challenge might be integrating it tighter with the cloud platforms, Prometheus, OAuth, Datadog, vaults, and DLP software.
Benchmarking AI gateways properly is harder than it looks. Feature sets differ meaningfully - exact vs semantic caching, cluster mode, guardrails, audit logging - and each carries its own latency cost. What actually matters for most users is end-to-end latency including provider overhead (200–2000ms), and in that frame Bifrost, LiteLLM, and GoModel are all perfectly fine.
I ran some comparisons but I'm not happy with the methodology, and I'd rather not spread misleading information. Once I have time to do it properly I'll write it up and share a link here. Honestly, I'd also love to see benchmarks done by someone other than the AI gateway builders. :)
Where GoModel actually differs today:
- image size: 16.96 MB vs Bifrost's 69.84 MB. It matters for sidecar, edge, and cold-start scenarios.
- per-tenant keys, guardrails, and audit logs are all in the OSS repo - not gated.
- AI interaction visualization that makes debugging individual request/response flows much easier.
It's like fuel costs in a supply chain. When you buy apples at the store, you don't think about oil prices. But if trucks ran on something cheaper, more efficient, or less taxed, the apples on the shelf would be cheaper too.
"... and I don't see if I would be able to track usage from individual end-users through a header".
Currently we have a unified concept of User-Paths. Once you add a specific header OR assign a User-Path to an API key, you can track usage based on it. A User-Path might be your end-user, an internal user, or some service. Examples:
Ah, seems like the right thing.
To be clearer about what I'm looking for: the system using the LLM gateway would present an arbitrary user id. Say the system has thousands of end-users (managed entirely by that system, not configured in the LLM proxy). The admin wants to block end-users who exceed a certain allowed quota.
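What I have in mind is roughly this sketch (all names here are hypothetical, including the `X-End-User` header idea - nothing GoModel-specific):

```go
package main

import (
	"fmt"
	"sync"
)

// quotaTracker keeps a running total per arbitrary end-user id. The ids
// come from the calling system (e.g. a hypothetical X-End-User header)
// and need no configuration in the proxy itself.
type quotaTracker struct {
	mu    sync.Mutex
	used  map[string]int
	limit int // allowed tokens per end-user
}

func newQuotaTracker(limit int) *quotaTracker {
	return &quotaTracker{used: make(map[string]int), limit: limit}
}

// allow records the requested usage and reports whether the user is
// still within quota; an over-quota request would be blocked.
func (q *quotaTracker) allow(userID string, tokens int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.used[userID]+tokens > q.limit {
		return false
	}
	q.used[userID] += tokens
	return true
}

func main() {
	q := newQuotaTracker(1000)
	fmt.Println(q.allow("end-user-42", 600)) // true
	fmt.Println(q.allow("end-user-42", 600)) // false - would exceed 1000
	fmt.Println(q.allow("end-user-43", 600)) // true - separate budget
}
```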
First, GoModel is designed to be flexible. If you add an extra field, it tries to pass it through in the appropriate place (Postel's law).
Therefore there's a good chance that if they make a minor API-level change, GoModel will handle it without any code changes.
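As a rough illustration of that pass-through behavior (a sketch, not GoModel's actual code): decoding into a generic map keeps any fields the gateway doesn't know about, so only the fields it explicitly rewrites need code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forwardBody decodes an incoming request body into a generic map,
// rewrites the one field the gateway cares about (here "model"),
// and re-encodes everything else untouched. Fields a provider adds
// later pass through without any code changes.
func forwardBody(in []byte, model string) ([]byte, error) {
	var body map[string]any
	if err := json.Unmarshal(in, &body); err != nil {
		return nil, err
	}
	body["model"] = model // the only rewritten field
	return json.Marshal(body)
}

func main() {
	// "reasoning_effort" stands in for a field the gateway knows nothing about.
	in := []byte(`{"model":"alias","messages":[],"reasoning_effort":"high"}`)
	out, err := forwardBody(in, "gpt-4o")
	if err != nil {
		panic(err)
	}
	// Note: marshaling a map sorts keys alphabetically.
	fmt.Println(string(out)) // {"messages":[],"model":"gpt-4o","reasoning_effort":"high"}
}
```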
Also, changes to providers' API formats might be less and less frequent. Keeping up typically means adding a few lines of code per month. I'm usually aware of those changes because I use LLMs daily and follow the news in a few places.
As a fallback, GoModel includes a passthrough API that forwards your request to the provider in its original format. That might be useful when an AI provider changes their contract significantly and we haven't caught up yet.
Also, official SDKs aren't bug-free either. Skipping that extra layer and hitting the API directly might actually be beneficial for GoModel.
Yeah, I share the same uncertainty here. My understanding is that personal, interactive use should be fine. I use Conductor all day every day, and it wraps a subscription.
Perhaps fully automated use is where the line is drawn.
But I also suspect individuals using it for light automated dispatching would be ok too.