santiago-pl's comments

I'm working on it full-time right now. It might be challenging, especially when it comes to interactions with video, audio, and image models. I'm just trying to stay on top of what's happening and add new things day by day.

An even bigger challenge might be integrating it tighter with the cloud platforms, Prometheus, OAuth, Datadog, vaults, and DLP software.


My thoughts about this:

Benchmarking AI gateways properly is harder than it looks. Feature sets differ meaningfully - exact vs semantic caching, cluster mode, guardrails, audit logging - and each carries its own latency cost. What actually matters for most users is end-to-end latency including provider overhead (200–2000ms), and in that frame Bifrost, LiteLLM, and GoModel are all perfectly fine.
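To put the end-to-end framing in numbers, here is a rough sketch. The 200 ms and 2000 ms provider figures come from the range above; the 1 ms gateway overhead is a made-up placeholder, not a measurement of Bifrost, LiteLLM, or GoModel.

```python
def gateway_share(gateway_ms: float, provider_ms: float) -> float:
    """Return the fraction of end-to-end latency spent inside the gateway."""
    return gateway_ms / (gateway_ms + provider_ms)


# Placeholder value for illustration only; replace it with your own measurement.
ASSUMED_GATEWAY_MS = 1.0

if __name__ == "__main__":
    # Provider overhead bounds taken from the 200-2000 ms range mentioned above.
    for provider_ms in (200.0, 2000.0):
        share = gateway_share(ASSUMED_GATEWAY_MS, provider_ms)
        print(f"provider={provider_ms:.0f} ms -> gateway share {share:.2%}")
```

Under that assumption the gateway accounts for well under 1% of the total, which is why differences between gateways tend to disappear in end-to-end measurements.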

I ran some comparisons but I'm not happy with the methodology, and I'd rather not spread misleading information. Once I have time to do it properly I'll write it up and share a link here. Honestly, I'd also love to see benchmarks done by someone other than the AI gateway builders. :)

Where GoModel actually differs today:

  - image size: 16.96 MB vs Bifrost's 69.84 MB. It matters for sidecar, edge, and cold-start scenarios.
  - per-tenant keys, guardrails, and audit logs are all in the OSS repo - not gated.
  - AI interaction visualization that makes debugging individual request/response flows much easier.

I've released a new version of GoModel (0.1.20) with explicit support for vLLM. You can now use it with several vLLM instances at once, like this:

  docker run --rm -p 8080:8080 \
    -e VLLM_BASE_URL=http://host.docker.internal:18000/v1 \
    -e VLLM_BASEMENT_BASE_URL=http://host.docker.internal:18000/v1 \
    enterpilot/gomodel:latest

TBH I decided to write GoModel because I needed something like this for my startup, enterpilot, and LiteLLM didn’t meet my needs.

It's like fuel costs in a supply chain. When you buy apples at the store, you don't think about oil prices. But if trucks ran on something cheaper, more efficient, or less taxed, the apples on the shelf would be cheaper too.

"... and I don't see if I would be able to track usage from individual end-users through a header".

Currently we have a unified concept of User-Paths. Once you send a specific header OR assign a User-Path to an API key, you can track usage based on it. The User-Path might be your end-user, an internal user, or some service. Examples:

  /client1/app1
  /agents/agent1
  /team2/john
  /team2/adam

Would this work for you?

https://gomodel.enterpilot.io/docs/features/user-path
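A minimal sketch of the header variant, from the client side. The header name `X-User-Path` is a hypothetical placeholder (check the docs linked above for the real one), and the normalization rule is my assumption, not GoModel's documented behavior.

```python
# Hypothetical header name, used only for illustration; see the docs for the real one.
USER_PATH_HEADER = "X-User-Path"


def normalize_user_path(path: str) -> str:
    """Collapse duplicate slashes and return a path with a single leading slash."""
    segments = [segment for segment in path.strip().split("/") if segment]
    if not segments:
        raise ValueError("user path must contain at least one segment")
    return "/" + "/".join(segments)


def build_headers(api_key: str, user_path: str, header_name: str = USER_PATH_HEADER) -> dict:
    """Build request headers that tag a call with the given User-Path."""
    return {
        "Authorization": f"Bearer {api_key}",
        header_name: normalize_user_path(user_path),
    }


if __name__ == "__main__":
    print(build_headers("your-gomodel-key", "/team2/john"))
```

The returned dict can be passed as extra headers by whatever HTTP or OpenAI-compatible client you already use.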

PS Thanks for the feedback on the Vault integration. Noted.


Ah, that seems like the right thing. To be clearer, what I'm looking for is this: the system using the LLM gateway would present an arbitrary user id. Let's say the system has thousands of end-users (completely managed by that system and not configured in the LLM proxy). The admin is interested in blocking end-users from using more than a certain allowed quota.
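To make the request concrete, here is a minimal in-memory sketch of the behavior I mean. `QuotaTracker` and its token-based limit are hypothetical names for illustration, not something any gateway mentioned here is known to expose.

```python
from collections import defaultdict


class QuotaTracker:
    """Track usage per arbitrary user id and reject requests over a fixed quota."""

    def __init__(self, limit_tokens: int) -> None:
        self.limit_tokens = limit_tokens
        # User ids are never pre-registered; they appear on first use.
        self.used = defaultdict(int)

    def allow(self, user_id: str, tokens: int) -> bool:
        """Record the usage and return True only if it fits the remaining quota."""
        if self.used[user_id] + tokens > self.limit_tokens:
            return False
        self.used[user_id] += tokens
        return True


if __name__ == "__main__":
    tracker = QuotaTracker(limit_tokens=100)
    for user_id, tokens in [("u1", 60), ("u1", 50), ("u2", 50)]:
        print(user_id, tokens, tracker.allow(user_id, tokens))
```

A real deployment would need shared storage and a reset window, but the key point is that the gateway never has to know the user ids in advance.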

The LiteLLM SDK is intentionally on the website. You can "talk" to GoModel with it because both projects use an OpenAI-compatible API under the hood.

You can use it like this:

  from litellm import completion
  print(completion(
      model="openai/gpt-4.1-nano",
      api_base="http://localhost:8080/v1",
      api_key="your-gomodel-key",
      messages=[{"role": "user", "content": "hi"}],
  ).choices[0].message.content)

Thank you

Yes, the number of meaningful providers might be around 20-30.

First, GoModel is designed to be flexible. If you add an extra field, it tries to pass it through in the appropriate place (Postel's law).

Therefore there's a good chance that if they make a minor API-level change, GoModel will handle it without any code changes.
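Here is the idea as a small Python sketch. It is not GoModel's actual code (GoModel is written in Go); the function name and the model-prefix handling are hypothetical, chosen only to show known fields being handled explicitly while unknown fields are forwarded untouched.

```python
def to_provider_payload(request: dict) -> dict:
    """Translate a client request, forwarding any unrecognized fields as-is."""
    payload = {}
    for key, value in request.items():
        if key == "model":
            # Hypothetical normalization: drop a leading "provider/" prefix.
            payload["model"] = value.split("/", 1)[-1]
        elif key == "messages":
            if not isinstance(value, list):
                raise TypeError("messages must be a list")
            payload["messages"] = value
        else:
            # Unknown field: pass it through so new provider options keep working.
            payload[key] = value
    return payload


if __name__ == "__main__":
    request = {
        "model": "openai/gpt-4.1-nano",
        "messages": [{"role": "user", "content": "hi"}],
        "brand_new_option": {"enabled": True},  # a field this code has never seen
    }
    print(to_provider_payload(request))
```

The trade-off is that typos in field names get forwarded too, so the provider, not the gateway, ends up reporting them.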

Also, changes to providers' API formats might be less and less frequent. Keeping up typically means adding a few lines of code per month. I'm usually aware of those changes because I use LLMs daily and follow the news in a few places.

As a fallback, GoModel includes a passthrough API that forwards your request to the provider in its original format. That might be useful when an AI provider changes their contract significantly and we haven't caught up yet.

Also, official SDKs aren't bug-free either. Skipping that extra layer and hitting the API directly might actually be beneficial for GoModel.


I'll take a closer look at it over the next few days.

However, it might be challenging, considering that Claude Code with a subscription no longer officially works with OpenClaw.


Yeah I share the same uncertainty here. My understanding is personal and interactive use should be fine. I use Conductor all day every day and it wraps a subscription.

Perhaps fully automated use is where the line is drawn.

But I also suspect individuals using it for light automated dispatching would be ok too.



It's supposed to work again, based on today's news.

That's great news! The AI model ecosystem is changing so fast.
