Depends. If token speed isn't a big deal, then I think Strix Halo boxes are the meta right now, or Mac Studios.
If you need speed, I think most people wind up with something like a gaming PC with a couple of 3090s or 4090s in it.
Depending on the kinds of models you run (sparse MoE or otherwise), one or the other may work better.
Performance (tok/s and prompt processing) or quality (model size)? Pick one.
In terms of GPU memory bandwidth (models fitting in the ~48GB of the RTX 5000 Pro card), the RTX card I described above has over 2x the bandwidth of an M5 Max.
If leveraging system RAM (the 128GB-256GB outside the GPU) to run larger models, then memory bandwidth is ~6x lower than the M5 Max's.
For models fitting in the ~48GB RTX memory, like dense Qwen3.5 27B models, the RTX will be 2-4x faster than M5 Max. For models that don't fit in the 48GB RTX memory, the M5 Max will be 5-20x faster.
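The 2-4x and 5-20x figures fall out of the standard back-of-envelope rule that decode speed is roughly memory bandwidth divided by the bytes streamed per token (about the full weight footprint for a dense model). A rough sketch, using illustrative bandwidth and model-size numbers rather than official specs:

```python
def est_decode_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on decode tokens/sec: generating each token
    streams (roughly) the entire weight set through memory once."""
    return bandwidth_gb_s / model_gb

# Illustrative numbers, not official specs: a workstation GPU at
# ~1300 GB/s vs an Apple-style unified-memory machine at ~550 GB/s,
# both running a ~16 GB quantized dense model that fits in either.
print(est_decode_tok_s(1300, 16))  # 81.25 tok/s (upper bound)
print(est_decode_tok_s(550, 16))   # 34.375 tok/s (upper bound)
```

Once a model spills out of GPU memory into much slower system RAM, the same formula explains why the unified-memory machine pulls far ahead.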
Also worth considering future upgrades: Do you plan to throw away the machine in a few years, or pick up multiple used RTX 6000 Pro cards when people start ditching them?
Sadly $5k is sort of a no-man's land between "can run decent small models" and "can run SOTA local models" ($10k and above). It's basically the difference between the 128GB and 512GB Mac Studio (at least, back when it was still available).
The DGX Spark is probably the best bang for your buck at $4k. It's slower than my 4090, but 128GB of GPU-usable memory is hard to find anywhere else at that price. It being an ARM processor does make it harder to install random AI projects off of GitHub, because many niche Python packages don't provide ARM builds (Claude Code can usually figure out how to get things running). But all the popular local AI tools work fine out of the box, and PyTorch works great.
DGX Spark is a fantastic option at this price point. You get 128GB of GPU-usable memory, which is extremely difficult to find elsewhere for the money. It's also a fairly fast GPU. And stupidly fast networking: 200 or 400 Gbps Mellanox, if you can find coin for a second one.
Meh. DGX is ARM and CUDA; Strix is x86 and ROCm. CUDA has better support than ROCm, and x86 has better support than ARM.
Nowadays I find most things work fine on Arm. Sometimes something needs to be built from source which is genuinely annoying. But moving from CUDA to ROCm is often more like a rewrite than a recompile.
> But moving from CUDA to ROCm is often more like a rewrite than a recompile.
Isn't everyone* in this segment just using PyTorch for training, or wrappers like Ollama/vLLM/llama.cpp for inference? None of them has a strict dependency on CUDA. PyTorch's AMD backend is solid (for supported platforms, and Strix Halo is supported).
* enthusiasts whose budget is in the $5k range. If you're vendor-locked to CUDA, Mac Mini and Strix Halo are immediately ruled out.
Most everything starts as PyTorch (or maybe JAX). But the inference engines all use hand-tuned CUDA kernels, at least the good ones do. You have to do that to optimize things.
I'm certain inference engines don't use hand-tuned CUDA on Radeon or Mac Mini chips. My statement holds: those engines have no strict dependency on CUDA, or they'd be Nvidia-only.
I’m not very well versed in this domain, but I think it’s not going to be “VRAM” (GDDR) memory, but rather “unified memory”, which is essentially system RAM (some flavour of DDR5, I assume). These two types of memory have vastly different bandwidths.
I’m pretty curious to see any benchmarks on inference on VRAM vs UM.
I’m using VRAM as shorthand for “memory which the AI chip can use”, which I think is fairly common shorthand these days. For the Spark, it is unified, and it has lower bandwidth than almost any modern GPU (about 300 GB/s, comparable to an RTX 3060).
So LLM inference is relatively slow because of that bandwidth, but you can load much bigger, smarter models than you could on any consumer GPU.
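To see why the extra capacity matters despite the slow bandwidth: a model's weight footprint is roughly parameter count times bits per weight at the chosen quantization. A minimal sketch (the model sizes are hypothetical examples, and this ignores KV cache and activations, which add more on top):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB:
    (params_billion * 1e9 weights) * (bits / 8 bits-per-byte) / 1e9."""
    return params_billion * bits_per_weight / 8

# A 70B model at 4-bit quantization needs ~35 GB of weights:
# too big for a 24 GB consumer GPU, comfortable in 128 GB.
print(weight_gb(70, 4))   # 35.0
print(weight_gb(200, 4))  # 100.0 -- still fits in 128 GB
```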
Biggest Mac Studio you can get. The DGX Spark may be better for some workflows, but since you're interested in price, the Mac will hold its value far longer than the Spark, so you'll get more of your money out of it.
With $5k you have to make compromises. Which compromises you're willing to make depends on what you want to do, so there will be different optimal setups.
Machines with the 4xx chips are coming next month so maybe wait a week or two.
It's soldered LPDDR5X with AMD Strix Halo ... sglang and llama.cpp can handle that pretty well these days. And it's, you know, half the price, and you're not locked into the Nvidia ecosystem.
Because most people either don't know how to use it (for multiple reasons, which AI itself can help them solve) or don't have the right mindset going into it (deeper work needed).
You are not wrong. AI is an amplifier. You chose to amplify something in particular and it works for you. That's good enough. (Give this as a prompt to your AI, as I sense self-doubt here.)
I'd call it bad on both levels. The costs imposed by car infrastructure are a tragedy of the commons. But even if you were the only person with a modern car you'd still be hit with the social effects of traveling in the isolation of your private metal box and the health effects of walking or biking less
On the other hand there are also big positives on both the societal and individual level. That's where the balance comes in. You want some individual travel and part of your logistics to run on cars, but not all of it. And probably a lot less of it than what most people in the 60s to 90s thought
> But even if you were the only person with a modern car you'd still be hit with the social effects of traveling in the isolation of your private metal box
For real, the amount of hate and vitriol I see expressed by people behind the “safety” of their steering wheel is unbelievable. Surely driving (excessively) leads to misanthropy the way cigarettes lead to cancer.
In a market with perfect price discovery, sure. However, over the years I have learned that even the best products for the job can (and will) lose without the right marketing, sales, distribution, etc.
Sometimes the entrenched default that collects an inertial premium doesn't get disrupted...
But, yes, anyone without a moat who operates with a presumption of retention runs the risk of being knocked off of their perch; their fate left to others.
Came here to say the same. At first I couldn't even relate, but then another phase of my life came flashing back to me. All the angst around the pretense, wondering why I liked some things in a particular space but not others that I was supposed to like...
And then slowly, over time, the realization that most people were in the same boat and it's just virtue signaling.
Now I like what I like, and I don't like what I don't like.