Hacker News | nnx's comments

Looks really interesting, but the Go binding sadly uses cgo. Could the binding be done in pure Go, or at least with purego (the cgo alternative that uses Go assembly for FFI)?

I ended up just using the web version, which is actually better than the native app (multiple tabs work!).

> My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B.

This seems high. At which quantization? Using LM Studio or something else?

Note: Darkbloom seems to run everything on Q8 MLX.


Ah, good point. This is using Q4, benchmarked as total throughput serving with llama.cpp.

The only limit is yourself!


*was


Are you `Ionstream` on OpenRouter?

If so, it would be great to provide more models through OpenRouter. This looks interesting but not enough to make me go through the trouble of setting up a separate account, funding it, etc.


Second that.

For smaller startups, it's easier to go through one provider (OpenRouter) than to manage different endpoints and accounts. You might reach many more users that way.

Mid-to-large companies might want to go directly to the source (you) if they want to really optimize the last mile, but even that is debatable for many.


Hey @nnx & @hazelnut, good question, but no, we're not IonStream on OpenRouter.

The purpose of IonRouter is to let people publicly see the speed of our engine firsthand. It makes the sales pipeline a lot easier when a prospect can just go try it themselves before committing. Signup is low friction ($10 minimum to load, and we preload $0.10) so you can test right away.

That said, we do plan to offer this as a usage-based service within our own cloud. We own every layer of the stack: inference engine, GPU orchestration, scheduling, routing, billing, all of it. No third-party inference runtime, no off-the-shelf serving framework. So there's no reason for us to go through a middleman.

No plans to be an OpenRouter provider right now.


In JS, signals and AbortController can replicate some of the functionality, but it's far less ergonomic than Go.

https://github.com/ggoodman/context provides nice helpers that bring the DX a bit closer to Go's.
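To make the comparison concrete, here's a minimal sketch of approximating Go's `context.WithCancel` with `AbortController`. The `withCancel` helper is an illustrative name invented here, not from the linked library:

```javascript
// Sketch: a derived "context" whose signal aborts when its parent aborts,
// roughly mirroring Go's parent/child context cancellation.
function withCancel(parentSignal) {
  const ctrl = new AbortController();
  if (parentSignal) {
    if (parentSignal.aborted) {
      ctrl.abort(parentSignal.reason);
    } else {
      parentSignal.addEventListener(
        "abort",
        () => ctrl.abort(parentSignal.reason),
        { once: true }
      );
    }
  }
  return { signal: ctrl.signal, cancel: (reason) => ctrl.abort(reason) };
}

// Cancelling the root propagates to the child, like cancelling a Go
// parent context cancels all contexts derived from it.
const root = withCancel();
const child = withCancel(root.signal);
root.cancel(new Error("shutdown"));
console.log(child.signal.aborted); // true
```

The boilerplate around propagation is exactly the ergonomics gap: Go gives you this for free with `context.WithCancel(parent)`.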


Can you describe what this slightly different approach is, and why it should work on all models?


This looks very interesting, but I wonder how the rewrite approach will impact long-term maintenance and porting changes _back_ from Tree Sitter.

As you mention WASM-readiness, did you consider using the official Tree Sitter WASM builds, nicely packaged with wazero (a pure-Go WASM runtime)?

It may help stay in sync with upstream over the long term and, while probably a bit slower, has nice security and GC advantages too.


Why do you think it's "the wrong direction"?


Hmm, no: in the case of purchasing alcohol, the ID check is 1:1 in time and space, and it's ephemeral (unless the clerk has an exceptional photographic memory).

In the case of an online ID check, even with nice-looking privacy terms, there is no guarantee that your ID won't be stored forever, re-analyzed and cross-checked against other services, or, worse, leaked.

