
alexandercheema · 2026-03-30T23:32:18

lol bro there is already a big OSS project called exo: https://github.com/exo-explore/exo

alexandercheema · 2026-02-04T21:27:20

Isn't Claude Code for Infrastructure just...Claude Code?

aspectrr · 2026-02-04T21:38:04

Hey, thanks for the comment. I answer this question in more depth on the website https://fluid.sh or this comment: https://news.ycombinator.com/reply?id=46889704&goto=item%3Fi...

This lets AI work on cloned production sandboxes instead of running on production instances. Yes, you can sandbox Claude Code on a production box, but there it cannot safely test changes that might break production. Sandboxes give the AI that flexibility, allowing it to safely test changes and reproduce things via IaC such as Ansible playbooks.

alexandercheema · 2026-01-29T12:35:01

Is there one for Kimi K2.5?

Topfi · 2026-01-29T12:49:12

Yes, it's in the repo.

alexandercheema · 2025-10-17T16:40:22

Appreciate you checking back so often. We have some exciting plans. Keep checking and it won't be long before something pops up :)

alexandercheema · 2025-10-17T16:39:49

Yes, these models are mostly compute-bound, so they benefit even more from the compute on the DGX Spark.

alexandercheema · 2025-10-17T16:38:39

Blog author here. Actually, no. The model can be streamed into the DGX Spark, so we can run prefill of models much larger than 128GB (e.g. DeepSeek R1) on the DGX Spark. This feature is coming to EXO 1.0, which will be open-sourced soon™.

storus · 2025-10-17T20:49:50

Excellent! Good luck!

alexandercheema · on Oct 3, 2024

exo maintainer here. tgtweak is correct.

This looks like promising research, and I'm looking into reproducing it now. We want to lower the barrier to running large models as much as possible, so if this works, it would be a potential addition to the exo offering.

tgtweak · on Oct 4, 2024

Yeah, combining these two would make a lot of sense; there is a big appetite to run larger models, even if slower, on clustered hardware. This way you can add compute to speed up the token pace vs adding it just to run the model at all.

It is also possible some of these optimizations could help optimize distribution based on latency and bandwidth between nodes.

alexandercheema · on July 18, 2024

Not yet, should I make an issue for it?

rbanffy · on July 18, 2024

It'd be nothing but appropriate.

alexandercheema · on July 17, 2024

This is fixed now, with these commits:
- https://github.com/exo-explore/exo/commit/dbbc7be57fb1871d2b...
- https://github.com/exo-explore/exo/commit/ce46f000591d8d59c1...

Please keep the bug reports coming, we're moving fast to get this stable on all platforms.

alexandercheema · on July 17, 2024

Do you mean with Apple Intelligence? You can already query models you host from Apple using exo or even just local on-device inference.

gnicholas · on July 17, 2024

Does this work with Siri? I'm not running the beta so am not familiar with the features and limitations, but I thought that it was either answering based on on-device inference (using a closed model) or Apple's cloud (using a model you can't choose). My understanding is that you can ask OpenAI via an integration they've built, and that in the future you may be able to reach out to other hosted models. But I didn't see anything about being able to seamlessly reach out to your own locally-hosted models, either for Siri backup or anything else. But like I said, I'm not running the beta!
