I tried to use SR-IOV to virtualize Mellanox NICs with VLANs on Red Hat Linux. Long story short, it did not work. Per Nvidia, the OS also has to run Open vSwitch. This work was on an already complex setup in finance ... so adding Open vSwitch was considered too much additional complexity. This requirement is not something I had run across in the docs.
The situation in networking is a lot different than in graphics. I don't know much beyond that it depends on the specific protocol, card, firmware, and network topology you're using, and there isn't really generic advice. If the question is setting up Ethernet switching inside the card so VFs can talk to the network, then I think the Linux switchdev tools can configure that on their own without Open vSwitch, but you probably need to find someone who understands your specific type of deployment for better advice.
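For what it's worth, the switchdev route (no Open vSwitch) typically looks something like the sketch below on mlx5-class cards. The interface name, PCI address, and VF count are placeholders for your setup, and the exact ordering (create VFs vs. switch modes, unbinding VF drivers first) can vary by driver and kernel version:

```
# Create two VFs on the physical function (interface name is a placeholder)
echo 2 > /sys/class/net/enp3s0f0/device/sriov_numvfs

# Flip the NIC's embedded switch from "legacy" to "switchdev" mode
# (PCI address is a placeholder; find yours with `lspci | grep Mellanox`)
devlink dev eswitch set pci/0000:03:00.0 mode switchdev

# Each VF should now have a representor netdev on the host
devlink port show
```

In switchdev mode the representor netdevs can then be bridged or managed with standard Linux tooling, which is what lets you skip Open vSwitch in some deployments.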
Depending on what you're doing, AMD's support for VirtIO Native Context might be a useful alternative (I think it gives less isolation, which could be good or bad depending on the use case).
838 seems to be the real INT8 TOPS number for the 5090; going from ~800 to 3400 takes a 2x speedup for sparsity (i.e., skipping zeroed ops) and another 2x speedup for FP4 over INT8.
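The marketing-number math works out like this (figures as quoted above; the 2x factors are the usual structured-sparsity and half-width-datatype multipliers):

```python
# Reconstructing the headline TOPS figure from the dense INT8 number.
dense_int8_tops = 838      # RTX 5090 dense INT8 throughput, per this thread
sparsity_speedup = 2       # 2:4 structured sparsity skips half the ops
fp4_speedup = 2            # FP4 packs twice as many ops per cycle as INT8

marketing_tops = dense_int8_tops * sparsity_speedup * fp4_speedup
print(marketing_tops)      # 3352, i.e. roughly the quoted ~3400
```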
So it's closer to half the speed than a tenth. Intel also seems to be positioning this card against the RTX PRO 4000 Blackwell, not the 5090, and that one gets more like 300 INT8 TOPS. It also has less memory, but at slightly higher bandwidth. The 5090 is much faster and IIRC priced similarly to the PRO 4000, but it is also decidedly a consumer product, which, especially for Nvidia, comes with limitations (e.g. no server-friendly form-factor cards available, and there are, or used to be, driver license restrictions that prevented using a consumer card in a data center setup).
Thank you for the correction. That seemed way too lopsided to be believable. This assessment balances the memory-to-TOPS ratio much more evenly, which is what I'd expect! I was quietly hoping someone would help me make sense of the wildly disparate figures, because I wasn't seeing it.
To throw one more card into the mix: the AMD R9700 is 378/766 TOPS INT8 dense/sparse, with 32GB of memory at 644GB/s, for ~$1400. Intel is undercutting that nicely here.
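Putting the dense INT8 figures quoted in this thread side by side (these are the numbers as stated upthread, not independently verified):

```python
# Dense INT8 TOPS as quoted in this thread; marketing numbers vary.
cards = {
    "RTX 5090": 838,
    "RTX PRO 4000 Blackwell": 300,  # "more like 300", per upthread
    "AMD R9700": 378,
}

baseline = cards["RTX 5090"]
for name, tops in cards.items():
    print(f"{name}: {tops} dense INT8 TOPS ({tops / baseline:.2f}x of a 5090)")
```

So the non-5090 cards in this comparison all land in the roughly one-third to one-half-of-a-5090 range on dense INT8, which is consistent with the "closer to half than a tenth" point above.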
You're right that for companies, the pro-grade features matter. For us mere mortals, much less so. Features like SR-IOV, however, are just fantastic to see! Good job, Intel. AMD has been trickling out such capabilities for a decade (cards fused for "MxGPU" capability), and it makes a card a much easier buy to just offer it straight up across the lineup.
Especially for exploratory work, 1/10th the perf is fine. Intel isn't able to compete head to head with Nvidia (yet), but VRAM is capability while speed is capacity. There will be plenty of use cases where the value prop here makes sense.
If you stick with your OS/package manager-distributed version, installation isn't painful anymore (provided that version approximately overlaps with your generation of GPU). It's okay for inference, and okay for training if you don't stray too far beyond plain torch. If you want to run code from a paper or other more esoteric stuff you're still going to have a bad time.
The product would have been excellent in 2024, but now it's landfill filler. You can run some small models at pedestrian speed, the novelty wears off, and that's it.
Intel is not looking to the future. If they released an Arc Pro B70 with 512GB of base RAM, now that could be interesting.
> A social networking system simulates a user using a language model trained using training data generated from user interactions performed by that user
What am I missing here? I thought this model needs 46GB of unified memory for a 4-bit quant, and the Radeon RX 7900 XTX has 24GB of memory, right? Hoping to get some insight, thanks in advance!
MoEs can be efficiently split between dense weights (attention/KV/etc.) and sparse (expert) weights. By keeping the dense weights on the GPU and offloading the sparse expert weights to slower CPU RAM, you can still get surprisingly decent performance out of a lot of MoEs.
Not as good as running the entire thing on the GPU, of course.
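As a concrete sketch, llama.cpp supports this split via tensor overrides. The model path is a placeholder and the regex is the commonly used pattern for expert FFN tensors; check your version's `--override-tensor` documentation, since flag behavior has evolved:

```
# Offload everything to the GPU, then pin the sparse expert FFN tensors
# back to CPU RAM so only the dense weights occupy VRAM.
llama-server -m ./model-q4.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU"
```

This is why a model whose 4-bit quant totals ~46GB can still run well on a 24GB card: the dense portion that fits in VRAM is the part touched on every token, while the experts, of which only a few are active per token, stream from system RAM.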
Thanks to you, I decided to give it a go as well (I didn't think I'd be able to run it on a 7900 XTX), and I must say it's awesome for a local model. More than capable for straightforward stuff. It uses the full VRAM and about 60GB of RAM, but runs at about 10 tok/s and is *very* usable.