Once a model is stable and good enough, for example Sonnet 4.6 or GPT 5.4 (or something else in the future), it can be burned into hardware like a Taalas chip, cutting the cost many times over and increasing the speed. At some point we can rely on an older model while still being productive with it.
I always wondered why the equivalent of mining ASICs didn't apply to LLM inference... now it turns out it does, and there's a company making it fast and robust!
An ASIC for bitcoin mining makes more sense, in that the algorithm is basically “set.” For LLMs, it is hard while models are still developing.
But, sounds like Taalas is trying to strike an interesting balance where they can at least spin up ASICs for new models reasonably quickly with their modular design. It’s a really interesting bet, and might pay off.
No, burning models into hardware won't make them faster or reduce the cost. It will cost way more than a GPU for similar performance. I'm not telling you why; you can go figure that out on your own.
With some research, that chip appears to cost about $300-$400 to manufacture, die only.
For an 8B parameter model.
Opus is estimated at 500B-2T parameters. At that scale you’re past reticle limits and need HBM and multi-die packaging, which means you’ve essentially built an inference ASIC (like Groq or Etched) rather than something categorically cheaper than GPUs. The “burned into silicon” advantage mostly evaporates at frontier scale.
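For a rough sense of that scale, here's back-of-envelope arithmetic (the parameter counts, precisions, and per-stack HBM capacity are my own assumptions, not published figures) on how much memory the weights alone would need and roughly how many HBM stacks that implies:

```python
# Back-of-envelope: memory for the weights alone at frontier scale vs. HBM capacity.
# Parameter counts, precisions, and the per-stack capacity are assumptions.

HBM_STACK_GB = 36  # roughly one current 12-high HBM3e stack (assumption)

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """GB needed to hold just the weights, ignoring KV cache and activations."""
    return params_billion * bytes_per_param  # 1e9 params * bytes, divided by 1e9 bytes per GB

for params_b in (500, 1000, 2000):
    for bytes_pp, label in ((2.0, "fp16"), (1.0, "int8"), (0.5, "4-bit")):
        gb = weight_memory_gb(params_b, bytes_pp)
        print(f"{params_b:>4}B params @ {label:<5}: {gb:6.0f} GB (~{gb / HBM_STACK_GB:.0f} HBM stacks)")
```

Even at 4-bit, a 1T-parameter model is hundreds of GB of weights before any KV cache, which is why it ends up as a multi-die, HBM-heavy design rather than a single cheap chip.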
The cutting edge, max size models will likely stay in the GPU space for a long time.
But these models are not needed for most general requests.
With a fine-tuned, quantised 30B model you can serve a large portion of requests with around 32GB of RAM (rough arithmetic below).
Free users will likely only get these kinds of models.
At some point we will get these models in hardware and the cost per token will be minimal.
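A quick sketch of the arithmetic behind that 32GB claim; the overhead numbers here are assumptions that vary a lot by runtime and context length:

```python
# Rough check that a 4-bit quantised ~30B model fits comfortably in ~32 GB.
# Quantisation overhead, KV cache, and runtime overhead are illustrative assumptions.

params = 30e9
bits_per_weight = 4.5          # ~4-bit weights plus scales/zero-points (assumption)
weights_gb = params * bits_per_weight / 8 / 1e9

kv_cache_gb = 2.0              # modest context at low precision (assumption)
runtime_overhead_gb = 1.5      # framework buffers, activations (assumption)

total = weights_gb + kv_cache_gb + runtime_overhead_gb
print(f"weights ~{weights_gb:.1f} GB, total ~{total:.1f} GB")  # ~17 GB weights, well under 32 GB
```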
> With a fine-tuned, quantised 30B model you can serve a large portion of requests with around 32GB of RAM. Free users will likely only get these kinds of models.
These are exactly the kinds of models that you can easily run locally by repurposing existing hardware. Depending on how much you're willing to wait for the answer, running local even gives you strictly better outcomes for simple Q&A queries.
(Long-context and agentic use cases are admittedly much harder to fit under that model, since non-AI uses for the high-end hardware you'd realistically need for those are rather more limited, and they're hit by the ongoing hardware shortage.)
For programmers maybe.
I do this too.
But think about all the regular users out there.
Your dad and your mum, maybe even your grandparents.
This is a huge market too, and for that we can use these special chips at scale.
Does the cost scale linearly or superlinearly? What does the $300-$400 price data point tell us in relation to the parameter count?
No gotchas here. I genuinely don't know whether 8B parameters is in a zone of significantly decreasing marginal returns -- too far out of my knowledge area, but genuinely curious.
Die size increases cost roughly exponentially: bigger dies mean fewer chips per wafer, and yield drops off sharply with die area (rough sketch below).
I expect that this kind of burned-in model is also very difficult to verify (how do you know if some of the weights are off), and not amenable to partial disablement to increase yield. For CPUs, you just laser disable bad cores. Can't forego part of a neural net.
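A minimal sketch of that cost scaling, assuming a simple Poisson yield model; the wafer cost and defect density are made-up illustrative numbers, not anything specific to Taalas:

```python
# Rough sketch of why bigger dies cost disproportionately more per good die.
# Wafer cost and defect density are illustrative assumptions.

import math

WAFER_COST_USD = 20_000                            # assumed leading-edge 300 mm wafer cost
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2 * 0.85   # usable area, ~15% edge loss (assumption)
DEFECT_DENSITY = 0.1                               # defects per cm^2 (assumption)

def cost_per_good_die(die_area_mm2: float) -> float:
    dies_per_wafer = WAFER_AREA_MM2 / die_area_mm2
    # Poisson yield: probability a die catches zero killer defects
    yield_frac = math.exp(-DEFECT_DENSITY * die_area_mm2 / 100)
    return WAFER_COST_USD / (dies_per_wafer * yield_frac)

for area in (100, 400, 800):   # mm^2; ~800 mm^2 is near the reticle limit
    print(f"{area:>4} mm^2 die: ~${cost_per_good_die(area):,.0f} per good die")
```

With these numbers the 800 mm^2 die costs roughly 16x the 100 mm^2 one despite being only 8x larger, which is the superlinear effect in question.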
You can ablate surprisingly large chunks of a model with next to no effect; you can try this easily by downloading an open-weight model in torch (sketch below).
Obviously it's not ideal, but you could likely have a single-digit percentage of all weights affected and still have a useful model (many caveats here: e.g. locality of damaged weights matters, distribution of errors matters, fail high/low matters, …)
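A minimal sketch of that experiment, using gpt2 via transformers purely because it is small and ungated; the ablation fraction and prompt are arbitrary choices:

```python
# Zero out a random fraction of weights in a small open model and check that it
# still produces coherent text. Fraction and prompt are arbitrary for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ablate_fraction = 0.02  # ~2% of all weights, picked arbitrarily
with torch.no_grad():
    for p in model.parameters():
        mask = torch.rand_like(p) < ablate_fraction
        p[mask] = 0.0

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tok.decode(out[0]))  # usually still completes sensibly at low ablation rates
```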
I mean, you probably can just turn off defective parts of the network. You better believe if this becomes popular they would salvage yields by selling "dumber" chips at a discount.
There are a lot of tradeoffs to play with: those inference ASICs may not carry the gradient, but they are still optimised for larger batches and to run any model. They need enough memory for the weights, wide-batch inference, and ideally leftovers for KV cache efficiency (rough arithmetic below).
For personal inference you're given a lot more room to play in - much of it poorly explored today - enough to push back on the argument that the cost advantages evaporate.
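Rough KV-cache arithmetic behind that batching point; the model shape here is an assumed Llama-style 30B-class configuration, not any specific chip or model:

```python
# KV cache footprint vs. batch size for an assumed grouped-query-attention model.
# Layer count, KV head count, and head size are illustrative assumptions.

n_layers = 48
n_kv_heads = 8          # grouped-query attention (assumption)
head_dim = 128
bytes_per_elem = 2      # fp16/bf16 cache

def kv_cache_gb(batch: int, context_len: int) -> float:
    # factor of 2 for keys and values
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 1e9

for batch in (1, 16, 64):
    print(f"batch {batch:>2}, 8k context: ~{kv_cache_gb(batch, 8192):.1f} GB of KV cache")
```

Single-stream personal inference needs a couple of GB of cache; wide-batch serving needs tens to hundreds, which is where the memory budgets diverge.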
I just tried chatjimmy.ai for a bit and while it is absolutely blazingly fast, it's also not a very strong model. I suppose that with time, stronger models will be able to run on such hardware, too.
Oh wow! So they make dummy hospitals and put dummy meat bags of all sizes in them for camera time and social media posts, just to make Israel look bad when they hit those meat bags. That is some strategy.
Nobody said they are dummy hospitals. They are dual use, some medical, some military HQ. And nobody said they were dummy meat bags. The most powerful weapon the terrorists have is dead civilians. And you get what you reward: punish Israel for dead civilians, you'll get more dead civilians.
How is this one better? I thought this was going to be a visual editor where you click and edit on the diagram itself. I don't seem to be able to do that here.
Thank you very much for sharing this article. I have been having issues with my second monitor, which is connected to my laptop, making it 3 screens. It was very annoying having to replug it into the dock every time it decided to turn off. I have also been feeling less productive for quite a while now.
After reading this, I have let the second one stay off and then unplugged it, and I can already notice a big difference. I didn't switch between apps or procrastinate as much. It's only been a day or two and I have yet to see how I fare in the long term. For now, I am happy.