Oh also, good talk at PTC yesterday! I had meant to ask you more about the formal memory model, but the other post-talk questions ended up being really interesting too.
Oh really? I can't find anything about the memory model online. I'm not sure of the best way to do this, but if there's a way for us to get in contact, I'd be interested in adjusting the project so it's developed in the most ergonomic way possible. I'm chatting with a couple of universities and I might issue a research grant for this project to be further fleshed out, so I'd be keen to hear your insights prior to kicking this off. My email is neel[at]berkeley.edu.
There was some solid commentary on the PS5 Pro tech talk stating that core rendering is so well optimized that much of the gains in the future will come from hardware process technology improvements, not from radical architecture changes. It seems clear the future of rendering is likely to be a world where gains come from things like DLSS, and less and less from free-lunch savings due to easy optimizations.
Raster is, believe it or not, not quite the bottleneck. Raster speed definitely _matters_, but it's pretty fast even in software, and the bigger bottleneck is just overall complexity. Nanite is a big pipeline with a lot of different passes, which means lots of dispatches and memory accesses. Same with material shading/resolve after the visbuffer is rendered.
EDIT: The _other_ huge issue with Nanite is overdraw with thin/aggregate geo that two-pass occlusion culling fails to handle well. That's why trees and such perform poorly in Nanite (compared to how good Nanite is for solid opaque geo). There's exciting recent research in this area though! https://mangosister.github.io/scene_agn_site.
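For anyone unfamiliar, here's the gist of the two-pass scheme as Python pseudocode (rasterize_depth, build_hzb, and occluded are made-up names standing in for real GPU passes):

    # Two-pass occlusion culling, heavily simplified.
    def cull_two_pass(objects, visible_last_frame, camera):
        # Pass 1: draw what was visible last frame, then build a
        # hierarchical Z buffer (HZB) from the resulting depth.
        pass1 = [o for o in objects if o.id in visible_last_frame]
        hzb = build_hzb(rasterize_depth(pass1, camera))

        # Pass 2: test everything else against the HZB and draw
        # whatever isn't provably hidden behind pass-1 geometry.
        pass2 = [o for o in objects
                 if o.id not in visible_last_frame
                 and not occluded(o.bounds, hzb, camera)]
        return pass1 + pass2

The failure mode with thin/aggregate geo is that pass 1 writes very little solid depth, so the HZB can't reject much in pass 2 and you eat the overdraw anyway.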
Is there no hope for AMD anymore? After George Hotz/Tinygrad gave up on AMD I feel there’s no realistic chance of using their chips to break the CUDA dominance.
Maybe from Modular (the company Chris Lattner is working for). In this recent announcement they said they had achieved competitive ML performance… on NVIDIA GPUs, but with their own custom stack completely replacing CUDA. And they’re targeting AMD next.
IMO the hope shouldn't be that AMD specifically wins, rather it's best for consumers that hardware becomes commoditized and prices come down.
And that's what's happening, slowly anyway. Google, Apple and Amazon all have their own AI chips, Intel has Gaudi, AMD has their Instinct line, and the software is at least working on more than just Nvidia. Which is a win, even if it's not perfect. I'm personally hoping that everyone piles in on a standard like SYCL.
tl;dr there's a not-insubstantial number of people who learn a lot from geohot. I'd say about 3% of people here would be confused if you treated him as anything less than a top technical expert across many comp sci fields.
And he did the geohot thing recently, way tl;dr: acted like there was a scandal being covered up by AMD around drivers that was causing them to "lose" to nVidia.
He then framed AMD not engaging with him on this topic as further covering-up and choosing to lose.
So if you come from a certain set of experiences, you see an anodyne quote from the CEO that would have been utterly unsurprising dating back to when ATI was still a company, and you read it as the CEO breezily admitting in public that geohot was right: there was malfeasance, followed by a cover-up, implying extreme dereliction of duty, because she either helped or didn't realize till now.
I'd argue this is partially due to the stonk-ification of discussions: there was a vague, yet often communicated, sense that something illegal was happening. The idea was that it was a financial dereliction of duty to shareholders.
Like Matt Levine says, “everything is securities fraud”. Company gets hacked? Securities fraud because they failed to disclose the exact probability of this event in their SEC filings. Company’s latest product is a flop? Securities fraud because they failed to disclose the bad decisions leading to the flop. Etc, etc.
Quite frankly, I have difficulty reconciling a lot of the comments here with that, and with my own experience as an AMD GPU user (although not for compute, and not on Windows).
In CPUs, AMD has made many innovations that Intel copied only after many years, and this delay contributed significantly to Intel's downfall.
The most important has been that AMD correctly predicted that big monolithic CPUs would no longer be feasible in future CMOS fabrication technologies, so they designed the Zen family from the beginning with a chiplet-based architecture. Intel attempted to ridicule them, but after losing many billions they were forced to copy this strategy.
Also in the microarchitecture of their CPUs, AMD made the right choices from the beginning and then improved them constantly with each generation. The result is that the latest Intel big core, Lion Cove, has a microarchitecture much more similar to AMD's Zen 5 than to any previous Intel core, because that is what it took to make a competitive core.
In the more distant past, AMD also introduced a lot of innovations long before Intel copied them. It is true that those had not been invented by AMD either, but were themselves copied from more expensive CPUs, like DEC Alpha or Cray or IBM POWER; even so, Intel adopted them only after being forced to by the competition with AMD.
Everything is comparative. AMD isn't perfect. As an ex-shareholder, I have argued they did well partly because of Intel's downfall. In terms of execution they are far from perfect.
But Nvidia is a different beast. It is a bit like Apple in the late 00s: take any part of the business, forecasting, marketing, operations, software, hardware, sales, etc., and it is industry leading. And having industry-leading capability is only part of the game; having it all work together is another thing entirely. And unlike Apple, which lost direction once Steve Jobs passed away and wasn't sure how to deploy capital, Jensen is still here, and they have more resources now, making Nvidia even more competitive.
It is often that people underestimate the magnitude of the task required (I like to retell the story of an Intel GPU engineer in 2016 arguing they could take dGPU market share by 2020, and here we are in 2025), overestimate the capability of an organisation, and underestimate the rival's speed of innovation and execution. These three things combined are why most estimates are off by an order of magnitude.
We are in the middle of a monopoly squeeze by NVidia on the most innovative part of the economy right now. I expect the DOJ to hit them harder than they did MS in the 90s given the bullshit they are pulling and the drag on the economy they are causing.
By comparison if AMD could write a driver that didn't shit itself when it had to multiply more than two matrices in a row they'd be selling cards faster than they can make them. You don't need to sell the best shovels in a gold rush to make mountains of money, but you can't sell teaspoons as premium shovels and expect people to come back.
They... do have a monopoly on foundry capacity, especially if you're looking at the most advanced nodes? Nobody's going to Intel or Samsung to build 3nm processors. Hell, there have been whispers over the past month that even Samsung might start outsourcing Exynos to TSMC; Intel already did that with Lunar Lake.
Having a monopoly doesn't mean that you are engaging in anticompetitive behavior, just that you are the only real option in town.
This gets at the classic problem in defining a monopoly: how you define the market. Every company is a monopoly if you define the market narrowly enough. Ford has a monopoly on F-150s.
I would argue that defining a semiconductor market in terms of node size is too narrow. Just because TSMC is getting the newest nodes first does not mean they have a monopoly in the semiconductor market. We can play semantics, but for any meaningful discussion of monopolistic behaviors, a temporary technical advantage seems a poor way to define the term.
> Just because TSMC is getting the newest nodes first does not mean they have a monopoly in the semiconductor market.
Sure. Market research also places them as having somewhere around 65% of worldwide foundry sales [0], with Samsung coming in second place with about 12% (mostly first-party production). Fact is that nobody else comes close to providing real competition for TSMC, so they can charge whatever prices they want, whether you're talking about the 3nm node or the 10nm node.
Rounding out the top five... SMIC (6%) is out of the question unless you're based in China, due to various sanctions; UMC (5%) mainly sells decade-plus-old processes (22nm and larger); and GlobalFoundries has explicitly abandoned keeping up with the latest technologies.
If you exclude the various Chinese foundries and subtract off Samsung's first-party development, TSMC's share of available foundry capacity for third-party contracts likely grows to 70% or more. At what point do you consider this to be a monopoly? Microsoft Windows has about 72% of desktop OS share.
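The back-of-envelope math, using the rough shares above (these are assumptions layered on market-research estimates, not exact figures):

    # Approximate worldwide foundry revenue shares cited above (percent).
    tsmc = 65.0
    samsung = 12.0  # mostly first-party production of Samsung's own chips
    smic = 6.0      # effectively unavailable outside China due to sanctions

    # Share of the market actually contestable by a non-Chinese third
    # party, treating Samsung's capacity as mostly spoken for:
    contestable = 100.0 - samsung - smic
    print(f"{tsmc / contestable:.0%}")  # ~79%, comfortably above 70%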
What effect did the DOJ have on MS in the 90s? Didn't all of that get rolled back before they had to pay a dime, and all it amounted to was that browser choice screen that was around for a while? Hardly a crippling blow. If anything that showed the weakness of regulators in fights against big tech, just outlast them and you're fine.
>I expect the DOJ to hit them harder than they did MS in the 90s given the bullshit they are pulling and the drag on the economy they are causing.
It sounds like you're expecting extreme competence from the DOJ. Given their history with regulating big tech companies, and even worse, the incoming administration, I think this is a very unrealistic expectation.
Also, I'd take HN as being an amazing platform for the overall consistency and quality of its moderation. Anything beyond that depends more on who you're talking to than where you are.
Oh, there's basically no chance of getting that on the Internet.
The Internet is a machine that highly simplifies the otherwise complex technical challenge of wide-casting ignorance. It wide-casts wisdom too, but it's an exercise for the reader to distinguish them.
Everyone who's dug deep into what AMD is doing has left in disgust if they were lucky, and in bankruptcy if they were not.
If I can save someone else from wasting $100,000 on hardware and six months of their life then my post has done more good than the AMD marketing department ever will.
> If I can save someone else from wasting $100,000 on hardware and six months of their life then my post has done more good than the AMD marketing department ever will.
This seems like unhelpful advice if you've already given up on them.
You tried it and at some point in the past it wasn't ready. But by not being ready they're losing money, so they have a direct incentive to fix it. Which would take a certain amount of time, but once you've given up you no longer know if they've done it yet or not, at which point your advice would be stale.
Meanwhile the people who attempt it apparently seem to get acquired by Nvidia, for some strange reason. Which implies it should be a worthwhile thing to do. If they've fixed it by now (which you wouldn't know, having stopped looking), or they fix it in the near future, you have a competitive advantage because you have access to lower-cost GPUs than your rivals. If not, but you've demonstrated a serious attempt to fix it for everyone yourself, Nvidia comes to you with a sack full of money to make sure you don't finish, and then you get a sack full of money. That's win/win, so rather than nobody doing it, it seems like everybody should be doing it.
I've seen people try it every six months for two decades now.
At some point you just have to accept that AMD is not a serious company, but is a second rate copycat and there is no way to change that without firing everyone from middle management up.
I'm deeply worried about stagnation in the CPU space now that they are top dog and Intel is dead in the water.
Here's hoping China and RISC-V save us.
>Meanwhile the people who attempt it apparently seem to get acquired by Nvidia
Everyone I've seen base jumping has gotten a sponsorship from Red Bull; ergo, everyone should base jump.
> At some point you just have to accept that AMD is not a serious company, but is a second rate copycat and there is no way to change that without firing everyone from middle management up.
AMD has always punched above their weight. Historically their problem was that they were the much smaller company and under heavy resource constraints.
Around the turn of the century the Athlon was faster than the Pentium III and then they made x86 64-bit when Intel was trying to screw everyone with Itanic. But the Pentium 4 was a marketing-optimized design that maximized clock speed at the expense of heat and performance per clock. Intel was outselling them even though the Athlon 64 was at least as good if not better. The Pentium 4 was rubbish for laptops because of the heat problems, so Intel eventually had to design a separate chip for that, but they also had the resources to do it.
That was the point that AMD made their biggest mistake. When they set out to design their next chip the competition was the Pentium 4, so they made a power-hungry monster designed to hit high clock speeds at the expense of performance per clock. But the reason more people didn't buy the Athlon 64 wasn't that they couldn't figure out that a 2.4GHz CPU could be faster than a 2.8GHz CPU, it was all the anti-competitive shenanigans Intel was doing behind closed doors to e.g. keep PC OEMs from featuring systems with AMD CPUs. Meanwhile by then Intel had figured out that the Pentium 4 was, in fact, a bad design, when their own Pentium M laptops started outperforming the Pentium 4 desktops. So the Pentium 4 line got canceled and Bulldozer had to go up against the Pentium M-based Core, which nearly bankrupted AMD and compromised their ability to fund the R&D needed to sustain state of the art fabs.
Since then they've been climbing back out of the hole but it wasn't until Ryzen in 2017 that you could safely conclude they weren't on the verge of bankruptcy, and even then they were saddled with a lot of debt and contracts requiring them to use the uncompetitive Global Foundries fabs for several years. It wasn't until Zen4 in 2022 that they finally got to switch the whole package to TSMC.
So until quite recently the answer to the question "why didn't they do X?" was obvious. They didn't have the money. But now they do.
Seven and a half years ago was the 2017 Ryzen release. Zen 1 took them from being completely hopeless to having something competitive, but only just, because they were still having the whole thing fabbed by GF. Their revenue didn't exceed its 2011 level until 2019 and didn't exceed Intel's until 2022. It's still less than Nvidia's, even though AMD is fielding CPUs competitive with Intel and GPUs competitive with Nvidia at the same time.
They had a pretty good revenue jump in 2021 but much of that was used to pay down debt, because debt taken on when you're almost bankrupt tends to have unfavorable terms. So it wasn't until somewhere in 2022 that they finally got free of GF and the old debt and could start doing something about this. But then it takes some amount of time to actually do it, and you would expect to be seeing the results of that approximately right now. Which seems like a silly time to stop looking.
Also, somewhat counterintuitively, George Hotz et al seem to be employing a strategy in the nature of "say bad things about them in public to shame them into improving", which has the dual result of actually working (they fix a lot of the things he's complaining about) but also making people think things are worse than they are, because there is now a large public archive of rants about things they've already fixed. It's not clear if this is the company failing to provide a good mechanism for people to complain about things like that in private and have them fixed promptly, so that it takes media attention to make it happen, or George Hotz seeking publicity as is his custom, or some combination of both.
It has also been quite a while since Zen+ and Zen 2. Those poured in money, and they absolutely did not need to wait until they had more revenue than some chunk of Intel or until their debt was gone. If you think they got properly started on this in 2022, that's pretty damning.
I'm not basing anything on geohot, just general discussions from people who have tried, and my own experience of trying to get some popular compute code bases to run. The support has been so lacking compared to AMD's own support for games. I'm not going to be "silly" and "stop looking" going forward, but I'm not going to forget how long my card was largely abandoned. It went directly from "not ready yet, working on it" to "obsolete, maybe dregs will be added later".
> It has also been quite a while since Zen+ and Zen 2. Those poured in money
Zen+ and Zen 2 were released in 2019. Their revenue in 2019 was only 2.5% higher than it was in 2011; adjusted for inflation it was still down more than 10%.
> they absolutely did not need to wait until they had more revenue than some chunk of Intel or until their debt was gone.
The premise of the comparison is that it shows the resources they have available. To make the same level of investment as a bigger company you either have to take it out of profit (not possible when your net profit has a minus sign in front of it or is only a single digit) or you have to make more money first.
And carrying high interest debt when you're now at much lower risk of default is pretty foolish. You'd be paying interest that could be going to R&D. Even if you want to borrow money in order to invest it, the thing to do is to pay back the high interest debt and then borrow the money again now that you can get better terms, which seems to be just what they did.
I never said anything about wanting them to invest the same amount as nvidia or Intel. I think a handful of extra people could have made a big difference, in particular if some of them had the sole task of bringing their consumer cards into the support list.
It is so bad that they had major cards that were never on the support list for compute.
> You'd be paying interest that could be going to R&D.
Getting people to actually consider your datacenter cards, because they know how to use your cards, will get you more R&D money.
Have you tried compute shaders instead of that weird HPC-only stuff?
Compute shaders are widely used by millions of gamers every day. GPU vendors have a huge incentive to make them reliable and efficient: modern game engines use them for lots of things, e.g. UE5 can even render triangle meshes with GPU compute instead of the graphics pipeline (the tech is called Nanite virtualized geometry). In practice they work fine on all GPUs, ML included: https://github.com/Const-me/Cgml
I'd be very concerned if somebody makes a $100K decision based on a comment where the author couldn't even differentiate between the words "constitutionally" and "institutionally", while providing as much substance as any other random techbro on any random forum and being overwhelmingly oblivious to it.
I've been working in DL inference for 7+ years now (5 of them at a startup), which makes me comparatively ancient in the AI world at this point. The performance rat race/treadmill is never-ending, and to your point, a large (i.e. 2x+) performance improvement is not enough of a "painkiller" for customers unless there is something that is impossible for them to achieve without your technology.
The second problem is distribution: it is already hard enough to obtain good distribution with software, let alone software + hardware combinations. Even large silicon companies have struggled to get their HW into products across the world. Part of this is due to the actual purchase dynamics and cycles of the people who buy chips: many design products and commit to N-year production runs built on certain hardware SKUs, meaning you have to both land large deals and have the opportune timing to catch them when they are shopping for a new platform. Furthermore, the people with existing distribution, i.e. the Apples, Googles, Nvidias, Intels, AMDs, and Qualcomms of the world, already have their own offerings in this space and will not partner with or buy from you.
My framing (which has remained unchanged since 2018) is that for a silicon platform to win, you have to beat the incumbents (i.e. Nvidia) on the 3Ps: Price (really TCO), Performance, and Programmability.
Most hardware accelerators may win on one, but even then it is often theoretical performance, because it assumes the customer's existing software can/will work on your chip, which it often doesn't (see AMD and friends).
There are many other threats that come in this form. For example, if you have a fixed-function accelerator and some part of the model code has to run on the CPU, the memory traffic/synchronization can completely negate any performance improvement you might offer.
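To make that concrete, here's a toy latency model (the numbers are illustrative, not measurements):

    # A fixed-function accelerator that is 20x faster per layer, but
    # requires shipping activations across the bus and synchronizing.
    cpu_ms_per_layer = 10.0
    accel_speedup = 20.0
    transfer_ms = 4.0  # host<->device copy + sync, each way

    accel_ms = cpu_ms_per_layer / accel_speedup + 2 * transfer_ms
    print(accel_ms)  # 8.5 ms vs 10.0 ms on CPU: the 20x win mostly evaporates

And if even one unsupported op in the middle of the model has to fall back to the CPU, you pay extra round trips and the accelerator can end up net slower.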
Even many of the existing silicon startups have been struggling with this since the middle of the last decade; the only thing that saved them is the consolidation on Transformers, but it is very easy for a new model architecture to come out and require everyone to rework what they have built. This need for flexibility is what has given rise to the design ethos around GPGPU, as flexibility in a changing world is a requirement, not just a nice-to-have.
Best of luck, but these things are worth thinking deeply about: when we started in this market we were already aware of many of them, but their importance and gravity in the AI market have only grown since, not diminished :)
We've spent a lot of time thinking about these things, in particular, the 3Ps.
Part of making the one line of code work is addressing programmability. If you're on a Jetson, we should load the CUDA kernels for Jetson. If you're using a CPU, we should load the CPU kernels; a CPU with AVX-512, the kernels built with AVX-512 instructions; and so on.
The end goal is that when we introduce our custom silicon, one line of code should make it far easier to bring customers over from Jetson/any other platform because we handle loading the correct backend for them.
We know this will border on impossible, but it's critical that we take on that burden rather than shifting it to the ML engineer.
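Roughly, the selection logic looks something like this (a sketch only; the kernel module names are hypothetical and the AVX-512 probe is Linux-only):

    import platform

    def _has_avx512():
        # Crude Linux-only probe; a real implementation would use cpuid.
        try:
            with open("/proc/cpuinfo") as f:
                return "avx512f" in f.read()
        except OSError:
            return False

    def select_backend():
        # Prefer CUDA kernels (Jetson or desktop GPU) when present.
        try:
            import cuda_kernels  # hypothetical module
            return cuda_kernels
        except ImportError:
            pass
        # Otherwise fall back to the best CPU kernels the machine supports.
        if platform.machine() in ("x86_64", "AMD64") and _has_avx512():
            import cpu_kernels_avx512  # hypothetical module
            return cpu_kernels_avx512
        import cpu_kernels_generic  # hypothetical portable fallback
        return cpu_kernels_generic

The point is that all of this probing lives behind the one line of code, instead of in the ML engineer's build scripts.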
Why start a company to make this product? Why not go work at one of the existing chip manufacturers? You'd learn a ton, get to design and work on HW and/or SW, and not have to do the million other things required to start a company.
We were waiting for a BitNet-based software and hardware stack, particularly from Microsoft, but it never materialized. We were essentially nerd-sniped into working on this problem, then we realized it was also monetizable.
On a side note, I looked deeply into every company in the space and was thoroughly unimpressed with how little they cared about the software stack needed to make their hardware work seamlessly. So even if I did go to work at some other hardware company, I doubt a lot of customers would utilize the hardware.
I recommend getting a job at NVIDIA. They care deeply about SW. It is a great place to learn about HW and the supporting SW. There is much to learn. Maybe you will learn why you are unimpressed with their SW offerings. For me, the hard part was the long lead time (8+ years) from design to customers using the product. One of the things that always amazed me about NVIDIA was that so many of the senior architects, who have no financial need to keep working (true for more than a decade), are still working there because they need the company to do what they love.
I think I commented on NVIDIA elsewhere in this thread, but I think NVIDIA is the best hardware company at making good software. We had a very niche software issue for which NVIDIA maintained open-source repos. I don't think NVIDIA's main advantage is its hardware, though; I think it's the software and the flexibility it brings to its hardware.
Suppose that Transformers die tomorrow, and Mamba becomes all the rage. The released Mamba code already has CUDA kernels for inference and training. Any of the CSPs or other NVIDIA GPU users can switch their entire software stack to train and inference Mamba models. Meanwhile, we'll be completely dead in the water with similar companies that made the same bet, like Etched.
You said (implied?) that your reason for starting a company was that you were waiting for somebody (MS) to build your favorite tech, and you realized it was monetizable. Finding a gap is a great start. But, if money is your goal, it is far easier to make money working at a company than starting one. Existing companies are great places to learn about technology, business, and the issues that should really drive your desire to start something yourself.
I don't think I ever implied we started this for money. We started working on the technology because it was exciting and enabled us to run LLMs locally. We wouldn't have started this company if someone else came along and did it, but we waited a month or two and didn't see anyone making progress. It just so happens that hardware is capital intensive, so making hardware means you need access to a lot of capital through grants (which Dartmouth didn't have for chip hardware) or venture capital (which we're going for now). I'm not sure where you got the idea we're doing this solely for money when I explicitly said "We were essentially nerd-sniped into working on this problem"
Glad to hear money isn't your focus. Your comment "...then we realized it was also monetizable" was the reason for my interpretation. It's also a very common rationale. I don't know what "nerd-sniped" means, so...
Good luck with the VCs. I hope you all stay friends through the challenging process.
When doing performance optimization on CPUs, I was impressed with Intel's suite of tools (like VTune). NVIDIA has some unbelievable tools, like Nsys and, of course, its container registry (NGC), which I think surpasses even Intel's software support.
I think this is roughly correct. My 2c is that folks used the initial web data to cold-start and bootstrap the first few models, but much of the performance increase we have seen at smaller sizes comes from a shift towards more conscientious data creation/purchase/curation/preparation and more refined evaluation datasets. I think the role of scraping random text, except maybe for the initial language-understanding pre-training phase, will diminish over time.
This is understood in the academic literature as well: months or years ago people were writing papers showing that a smaller amount of high-quality data is worth more than a large amount of low-quality data (which tracks with what you'd pick up from an ML 101 education/training).
To chime in: Apple already has a lot of great ML talent; they are just far more deliberate and slower to change their products. People forget that FaceID was/is one of the most cutting-edge ML features ever developed/deployed when it was released a few years ago.
Siri is sort of a red herring because it's built by teams and tech that existed before Apple acquired most of its ML talent, and some of its inability to evolve has been due to internal politics, not an inability to build tech. iOS 17 is an example of Apple moving toward more deep-learning speech/text work. I would bet heavily that we will see them catch up with well-integrated pieces, as they have the money, the infra, and the ability to go wide (i.e. to all iOS users; again, think FaceID).
These things are already possible on today's hardware; see https://github.com/mlc-ai/mlc-llm which allows many models to run on M1/M2 Macs, WASM, iOS, and more. The main limiting factor will be models small enough, and high quality enough, that performance holds up; ultimately this is HW-limited, and they will need to improve the neural engine / map more computation onto it to make the mobile experience possible.
This is also just straight-up FUD. ADHD is one of the few psychiatric conditions that has numerous effective medications which work reliably for a large part of the affected population.
Stimulants work for a large number of people diagnosed with ADHD, with very few negative effects, and modulo a few exceptions they are safe for long-term use.
Some individuals have negative experiences with stimulant medications, but I know from personal experience, and from many friends in the ADHD community, that stimulants have literally been life-saving for them.
They don't reach for them because they are out to get you; they reach for them because they are effective for many people.
Furthermore, many people who choose to forgo medication develop lifestyle and substance-use issues whose negative effects far outweigh those of low-dose stimulants.
As other commenters said, they are just a tool; you still have to work on interventions, behavior modification, and so on.
At the end of the day, ADHD is in many ways a disability (even if sometimes a superpower), and you can't just delete it with a prescription.
Even if you forgo meds, there are so many ways to boost your attention and quality of life, and lots of research on what is effective; treatment can be much more than just medication.
It has nothing to do with the blog post's quality or its being different; the guy has multiple key sentences which map to key ADHD experiences/symptoms. For those of us living with ADHD it's just an empathy response, as many of us have suffered through experiences which closely map to what he described in that paragraph. We are often just looking to share, as many of us have improved our lives substantially after someone suggested we get ourselves checked out.
On this part in particular: while it can be great to deeply follow your passion with extreme focus, pursuing things regardless of their importance in your overall life, and at the cost of other interests, relationships, or responsibilities, can make for an empty and unfulfilling existence in the end. Furthermore, life can be markedly better with the correct interventions and treatment.
People seemingly get offended by even a suggestion, because many people hold extreme stigma against conditions like ADHD, along with a lot of misinformation from people who have very little understanding of its actual traits, diagnostic criteria, treatment, and prevalence.