A lot of trial and error. I've built graphical tools with GD in PHP; the difficult part for me was that the coordinates were inverted.
I only knew how to draw lines and pixels, but I got the job done.
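For anyone who hasn't hit this: GD-style canvases put (0, 0) at the top-left with y growing downward, so "math" coordinates have to be flipped before drawing. A minimal sketch (the `to_screen` helper is hypothetical, just to illustrate the flip):

```python
# GD-style canvases: origin is top-left, y grows downward,
# so a y-up point must be mirrored across the canvas height.
def to_screen(x, y, height):
    """Convert a y-up point to a y-down canvas point."""
    return x, height - 1 - y

# y = 0 in math coordinates lands on the bottom row of a 200px canvas.
print(to_screen(10, 0, 200))    # → (10, 199)
print(to_screen(0, 199, 200))   # → (0, 0)
```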
I remember the LinkedIn app that grabbed all the contacts from your phone and tried to add them to your network. Random people from internet deals (local craigslist) were popping up as suggestions. So strange that this was allowed.
This is exactly the problem we're trying to solve. The models themselves have gotten surprisingly capable at small sizes, Qwen3.5 4B with 262K context, LFM2 1.2B for fast tool calling, but the inference infrastructure hasn't kept up.
When people say "local AI is too slow," they usually mean the engine is too slow, not the model. A 4B model at 186 tok/s (MetalRT on M4 Max) feels genuinely responsive for interactive chat. The same model at 87 tok/s (llama.cpp) feels sluggish. Same weights, same quality, twice the speed: that's a usability cliff.
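To make the cliff concrete, here's the wall-clock difference for streaming a 500-token reply at the two throughputs quoted above (500 tokens is my own illustrative reply length, not from the benchmarks):

```python
# Time to stream a 500-token reply at each engine's measured throughput.
tokens = 500
for engine, tok_per_s in [("MetalRT", 186), ("llama.cpp", 87)]:
    print(f"{engine}: {tokens / tok_per_s:.1f}s")
# → MetalRT: 2.7s
# → llama.cpp: 5.7s
```

Under three seconds reads like a conversation; nearly six reads like waiting on a page load.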
We think the gap between cloud and on-device inference is an infrastructure problem, not a model problem. That's what we're working on.