Hacker News | alexandercheema's comments

lol bro there is already a big OSS project called exo: https://github.com/exo-explore/exo


Isn't Claude Code for Infrastructure just...Claude Code?


Hey, thanks for the comment. I answer this question in more depth on the website https://fluid.sh or in this comment: https://news.ycombinator.com/reply?id=46889704&goto=item%3Fi...

This lets the AI work on cloned production sandboxes instead of running directly on production instances. Yes, you can sandbox Claude Code on a production box, but then it can't actually test potentially production-breaking changes. Cloned sandboxes give the AI the freedom to safely test changes and then reproduce them via IaC like Ansible playbooks.
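To make that loop concrete, here's a rough sketch (not Fluid's actual API): the `clone_production_instance` helper and playbook name are hypothetical, only the `ansible-playbook` CLI is real.

```python
# Hypothetical sketch: test a change against a cloned sandbox before touching
# production. clone_production_instance() and the playbook name are made up.
import subprocess

def test_change_in_sandbox(playbook: str, sandbox_host: str) -> bool:
    """Run an Ansible playbook against a sandbox clone and report success."""
    result = subprocess.run(
        # trailing comma makes ansible treat the host as an inline inventory
        ["ansible-playbook", "-i", f"{sandbox_host},", playbook],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

# sandbox_host = clone_production_instance("web-01")       # hypothetical
# if test_change_in_sandbox("upgrade_nginx.yml", sandbox_host):
#     ... apply the same playbook to production ...
```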


Is there one for Kimi K2.5?


Yes, it's in the repo.


Appreciate you checking back so often. We have some exciting plans. Keep checking and it won't be long before something pops up :)


Yes, these models are mostly compute-bound, so they benefit even more from the compute on the DGX Spark.


Blog author here. Actually, no. The model can be streamed into the DGX Spark, so we can run prefill of models much larger than 128GB (e.g. DeepSeek R1) on the DGX Spark. This feature is coming to EXO 1.0, which will be open-sourced soon™.
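As a rough illustration of the idea (not EXO's actual code), layer-streamed prefill looks something like the sketch below; `load_block` is a stand-in and the toy dimensions are assumptions.

```python
# Illustrative sketch only: run prefill for a model larger than device memory
# by streaming one transformer block into memory at a time.
import torch

NUM_LAYERS = 61        # e.g. DeepSeek R1/V3 has 61 transformer blocks
D_MODEL = 1024         # toy hidden size, just for this example
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def load_block(i: int) -> torch.nn.Module:
    # Stand-in for loading block i's weights from disk; a real implementation
    # would stream the checkpoint shard for just this layer.
    return torch.nn.Linear(D_MODEL, D_MODEL)

def streamed_prefill(hidden: torch.Tensor) -> torch.Tensor:
    # hidden: [batch, prompt_len, d_model] embeddings for the whole prompt
    for i in range(NUM_LAYERS):
        block = load_block(i).to(DEVICE)   # only one block resident at a time
        hidden = block(hidden)             # prefill the whole prompt through this layer
        del block                          # free the weights before loading the next
    return hidden

out = streamed_prefill(torch.randn(1, 2048, D_MODEL, device=DEVICE))
print(out.shape)   # torch.Size([1, 2048, 1024])
```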


Excellent! Good luck!


exo maintainer here. tgtweak is correct.

This looks like some potentially promising research, and I'm looking into reproducing it now. We want to lower the barrier to running large models as much as possible, so if this works it would be a potential addition to the exo offering.


Yeah, combining these two would make a lot of sense; there is a big appetite for running larger models, even slowly, on clustered hardware. That way you can add compute to speed up the token rate rather than adding it just to be able to run the model at all.

Some of these optimizations could also help tune how the model is distributed based on latency and bandwidth between nodes.
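As a toy example of what layer distribution could look like (not exo's actual partitioner, and ignoring latency/bandwidth weighting for brevity), you can assign contiguous layer ranges in proportion to each node's capacity:

```python
# Toy sketch, not exo's actual strategy: split a model's layers across nodes
# in proportion to each node's relative capacity.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity: float   # relative compute/memory score; how it's measured is an assumption

def partition_layers(num_layers: int, nodes: list[Node]) -> dict[str, range]:
    total = sum(n.capacity for n in nodes)
    assignment, start = {}, 0
    for i, node in enumerate(nodes):
        # the last node takes the remainder so every layer is covered exactly once
        count = num_layers - start if i == len(nodes) - 1 else round(num_layers * node.capacity / total)
        assignment[node.name] = range(start, start + count)
        start += count
    return assignment

print(partition_layers(60, [Node("spark", 3.0), Node("mac-studio", 2.0), Node("macbook", 1.0)]))
# {'spark': range(0, 30), 'mac-studio': range(30, 50), 'macbook': range(50, 60)}
```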


Not yet, should I make an issue for it?


It'd be nothing but appropriate.


This is fixed now, with these commits:
- https://github.com/exo-explore/exo/commit/dbbc7be57fb1871d2b...
- https://github.com/exo-explore/exo/commit/ce46f000591d8d59c1...

Please keep the bug reports coming, we're moving fast to get this stable on all platforms.


Do you mean with Apple Intelligence? You can already query models you host yourself from Apple devices, using exo or even just local on-device inference.
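For example, a self-hosted node that exposes an OpenAI-compatible chat completions API can be queried from any device on the network; the host, port, and model name below are assumptions, so check your node's docs.

```python
# Minimal sketch: query a self-hosted model over an OpenAI-compatible API.
# The host, port, and model name are assumptions.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",   # assumed local endpoint
    data=json.dumps({
        "model": "llama-3.2-3b",                     # whatever model the node is serving
        "messages": [{"role": "user", "content": "What's on my calendar today?"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```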


Does this work with Siri? I'm not running the beta so am not familiar with the features and limitations, but I thought that it was either answering based on on-device inference (using a closed model) or Apple's cloud (using a model you can't choose). My understanding is that you can ask OpenAI via an integration they've built, and that in the future you may be able to reach out to other hosted models. But I didn't see anything about being able to seamlessly reach out to your own locally-hosted models, either for Siri backup or anything else. But like I said, I'm not running the beta!

