Hacker News | nnx's comments

Looks really interesting, but the Go binding sadly uses cgo. Could the binding be done in pure Go, or at least with purego (the cgo alternative that uses Go assembly for FFI)?

I ended up just using the web version, which is actually better than the native app (multiple tabs work!).

> My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B.

This seems high. At which quantization? Using LM Studio or something else?

Note: Darkbloom seems to run everything on Q8 MLX.


Ah, good point. This is using Q4, benchmarked as total throughput serving with llama.cpp.

The only limit is yourself!


*was


Are you `Ionstream` on OpenRouter?

If so, it would be great to provide more models through OpenRouter. This looks interesting but not enough to make me go through the trouble of setting up a separate account, funding it, etc.


Second that.

For smaller startups, it's easier to go through one provider (OpenRouter) than to manage different endpoints and accounts. You might reach many more users that way.

Mid-to-large companies might want to go directly to the source (you) if they want to really optimize the last mile, but even that is debatable for many.


Hey @nnx & @hazelnut, good question, but no, we're not IonStream on OpenRouter.

The purpose of IonRouter is to let people publicly see the speed of our engine firsthand. It makes the sales pipeline a lot easier when a prospect can just go try it themselves before committing. Signup is low friction ($10 minimum to load, and we preload $0.10) so you can test right away.

That said, we do plan to offer this as a usage-based service within our own cloud. We own every layer of the stack: inference engine, GPU orchestration, scheduling, routing, billing, all of it. No third-party inference runtime, no off-the-shelf serving framework. So there's no reason for us to go through a middleman.

No plans to be an OpenRouter provider right now.


In JS, signals and AbortController can replicate some of the functionality, but it's far less ergonomic than Go.

https://github.com/ggoodman/context provides nice helpers that bring the DX a bit closer to Go's.
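To make the comparison concrete, here's a minimal sketch of approximating Go's `context.WithCancel` with `AbortController`. The `withCancel` helper is an illustrative name invented here, not from the linked library:

```javascript
// Sketch: a derived "context" whose signal aborts when its parent aborts,
// roughly mirroring Go's parent/child context cancellation.
function withCancel(parentSignal) {
  const ctrl = new AbortController();
  if (parentSignal) {
    if (parentSignal.aborted) {
      ctrl.abort(parentSignal.reason);
    } else {
      parentSignal.addEventListener(
        "abort",
        () => ctrl.abort(parentSignal.reason),
        { once: true }
      );
    }
  }
  return { signal: ctrl.signal, cancel: (reason) => ctrl.abort(reason) };
}

// Cancelling the root propagates to the child, like cancelling a Go
// parent context cancels all contexts derived from it.
const root = withCancel();
const child = withCancel(root.signal);
root.cancel(new Error("shutdown"));
console.log(child.signal.aborted); // true
```

The boilerplate around propagation is exactly the ergonomics gap: Go gives you this for free with `context.WithCancel(parent)`.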


Can you describe what this slightly different approach is, and why it should work on all models?


This looks very interesting, but I wonder how the rewrite approach will impact long-term maintenance and porting changes _back_ from Tree Sitter.

As you mention WASM-readiness, did you consider using the official Tree Sitter WASM builds, nicely packaged with wazero (a pure-Go WASM runtime)?

It may help stay in sync with upstream over the long term and, while probably a bit slower, has nice security and GC advantages too.


Why do you think it's "the wrong direction"?


Hmm, no: in the case of purchasing alcohol, the ID check is 1:1 in time and space, and it's ephemeral (unless the clerk has an exceptional photographic memory).

In the case of an online ID check, even with nice-looking privacy terms, there is no guarantee that your ID won't be stored forever, re-analyzed and cross-checked against other services, or, worse, leaked.

