I’m not sure it’s useful for negotiating; the capex to build it was surely orders of magnitude more than it would cost to just use one of the other frontier models.
It’s like someone negotiating by saying, “I’ll waste even MORE money to build something worse if you don’t give me a deal.”
I’m not discounting there may be other advantages to doing it. I just don’t think negotiating is one.
Ha! I normally wouldn’t find it quite so hilarious, but it’s a stylistically pixelated image. There’s just too much irony packed in there to not chuckle.
It's more halftone (might not be the correct term), not pixelated.
There might be more irony in saying it's stylized pixels without realizing that the style of the image can't be replicated with blocks of the same size, but I dunno, I'm not Alanis Morissette.
I have been on the fence about whether composer is useful, but the speed argument is one I hadn’t really considered. I use Cursor with Opus almost exclusively, but the other day I tried using OpenCode locally with a 6-bit quantized version of Qwen 3.5 and holy crap, the speed and latency were mind blowing. Even if not quite as sharp as big boi Opus and the gang.
Now you’ve got me thinking I should give composer another go, because speed can be pretty darn great for more generic, basic tasks.
I totally agree with the sentiment, but from what I can tell, they tend to happen immediately before or after markets open and close. Essentially screwing, to the maximum extent possible, absolutely everyone who isn’t in the clique out of participating in the trade.
FWIW, the only sure-fire way to win the trade is to buy time and assume both gross incompetence and negligence when it comes to action. The only caveat is if the markets tank enough, this administration will signal capitulation beforehand, e.g. Trump mildly capitulating on tariffs last April after the markets proceeded to relentlessly defecate themselves.
0-DTE options are typically, and for good reason, stupid gambles. But, right now they can’t even be considered gambling, because there’s zero chance of winning. Not just bad odds, but no odds. Again just signaling how truly malicious this admin is and its disdain for anyone and everyone not close to them.
> Most people don't want to live in dense urban cores, so #5 and #6 can easily backfire and stunt progress on #1.
80% of the US population would disagree. It really seems like you’re applying what you like to the entire population and then assuming that anything else is rubbish.
Having grown up in a rural community, and small towns, I never really want to go back. Dense urban areas are wonderful, I find huge amounts of joy in multiculturalism. The plethora of ideas, language, food, and art is inspiring. I will never get that anywhere except dense urban areas.
Demand vs supply is the crux of the affordability crisis, and the points outlined in the post you’re replying to are all valid and great ways to help increase supply.
And FWIW, you’re absolutely welcome to enjoy and appreciate sparsely populated areas, but I really think you need to understand that the vast majority of people disagree with you. Not because they’re “stuck” in some dense urban area, but because they want to be there.
I don't know where you're coming up with that 80% number, because the actual percentage of people living in dense urban cores is much lower. Many people live in neighborhoods that the Census classifies as "urban", but that includes a lot of neighborhoods that most regular people would classify as suburban. It turns out that given a choice, most people prefer to have some space and privacy rather than being squeezed together in high-rise apartments.
> 80% of the US population would disagree. It really seems like you’re applying what you like to the entire population and then assuming that anything else is rubbish.
I live by choice in what would be considered an urban area by the US Census, but is far from a dense urban core (by the character of the neighborhood, it's only a few miles away by distance). Either you don't understand what the Census data is saying or you're misrepresenting what myself and others are saying here.
> Having grown up in a rural community, and small towns, I never really want to go back. Dense urban areas are wonderful, I find huge amounts of joy in multiculturalism. The plethora of ideas, language, food, and art is inspiring. I will never get that anywhere except dense urban areas.
Good for you. My point, which seems to be lost on most urbanists, is that not everyone feels that way, or wants to live in that environment (consider me part of the second group, as I enjoy having access to quality food, art, entertainment, etc. but also enjoy having a yard for my kids to play in and enough distance between myself and my neighbors to have privacy and peace at home).
If someone has no interest in being inspired by multicultural food and would rather eat at a familiar restaurant in a small town, I feel no need to compel them to experience it.
> Demand vs supply is the crux of the affordability crisis, and the points outlined in the post you’re replying to are all valid and great ways to help increase supply.
Some are more valid than others. Building is good, compelling communities to increase density against their will is not.
> And FWIW, you’re absolutely welcome to enjoy and appreciate sparsely populated areas, but I really think you need to understand that the vast majority of people disagree with you. Not because they’re “stuck” in some dense urban area, but because they want to be there.
There's a large gulf of housing stock and communities between "sparsely populated areas" and "dense urban areas" commonly called "the suburbs", where most people in the US live.
And I don't think the people who live in dense urban areas are stuck there. I just don't think the echo chamber of city planners, YIMBY advocates, and leftist politicians, all of whom believe that more density across every metropolitan area is the "correct" path forward, should have the final say on what communities are allowed to build or not build.
While I agree with the sentiment, and even had the same fears, I think about it differently now…
The existing megacorps have huge swaths of infrastructure, expenses, and requirements that require massive amounts of capex to maintain. Even if performative, Meta, Google, OpenAI, Anthropic, et al. cannot simply lay off their entire engineering, accounting, HR, sales, and support infrastructure. Those orgs are large for “good” (historically necessary) reasons.
Now fast forward to today, and this is where I differ in opinion: it is our megacorps who are the civilizations that should be scared of being discovered. Infrastructure providers aside, they are the large, advanced entities that can be annihilated by someone with a decent budget and a good local model.
For ~$30k-$50k (primarily buying RTX 6000 Pro GPUs and a CPU with enough PCIe lanes), “anyone” can build a system using open weight models that can, and let me truly emphasize this, autonomously create functionality to compete. Previously it would take me months, or years, of immense dedication to show up after work and produce something of value. Now I can do it using excess compute on my existing workstation. No existing corporation can afford to undercut every possible idea. If I only gain 1,000, 10,000, or 100,000 users, they cannot compete. That may, and I believe it will, provide more than enough capital to attack megacorp X or Y. If I’m making $100k a month, I can afford multiple autonomous systems per month. After that initial capex, I can then hire other people to help manage them. At no point will a company with billions upon billions of dollars in quarterly capex be able to compete.
Maybe they can compete with one, two, ten, or a hundred but they cannot compete with the absolute onslaught on thousands of possible frontlines. They can cut costs, by reducing their workforce, but they’ll only be increasing their competition to save their earnings report.
And yes, I realize that the open weight models are created via obscene amounts of capital, but we’re lucky that competing nation-states and cultures, like China, have immense incentive to do so. Good enough is still good enough.
The forest may be dark, but it won’t be for much longer.
tldr; call an ambulance, but not for me. It’s going to be for the existing power structure.
I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is that OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end users simply do not care. There are no network effects. Anthropic, at this point, has completely consumed the developer market as far as I can tell. That's the one market actually passionate about AI, and it comes with a huge advantage: end users cannot tell whether an "AI" coded something or a human did. That's not true for almost every other application of AI at this point.
If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near-catastrophic drop in RAM prices and in the hyperscalers' demand to, well... scale. That massive drop will be completely and utterly OpenAI's fault for attempting to bite off more than it can chew. To shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers, and we drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.
Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is that Apple needs unified memory to keep its money maker (laptops) profitable and performant; and while it helps their profitability, it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k, absolutely no one would be looking at a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open weight models I wouldn't even bother checking Apple's pricing page.
tldr; competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.
> A huge reason for this is Apple needs unified memory to keep their money maker (laptops) profitable and performant
None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
The "weird" thing Apple is doing is using normal DDR5 with a wider-than-normal memory bus to feed their GPUs instead of using GDDR or HBM. The disadvantage of this is that it has less memory bandwidth than GDDR for the same width of the memory bus. The advantage is that normal RAM costs less than GDDR. Combined with the discrete GPU market using "amount of VRAM" as the big feature for market segmentation, a Mac with >32GB of "VRAM" ended up being interesting even if it only had half as much memory bandwidth, because it still had more than a typical PC iGPU.
The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
> None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
the bottleneck in lots of database workloads is memory bandwidth. for example, hash join performance with a build side table that doesn't fit in L2 cache. if you analyze this workload with perf, assuming you have a well written hash join implementation, you will see something like 0.1 instructions per cycle, and the memory bandwidth will be completely maxed out.
similarly, while there have been some attempts at GPU accelerated databases, they have mostly failed exactly because the cost of moving data from the CPU to the GPU is too high to be worth it.
i wish aws and the other cloud providers would offer arm servers with apple m-series levels of memory bandwidth per core, it would be a game changer for analytical databases. i also wish they would offer local NVMe drives with reasonable bandwidth - the current offerings are terrible (https://databasearchitects.blogspot.com/2024/02/ssds-have-be...)
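To put some illustrative numbers on the bandwidth-bound claim above (the figures below are assumptions for the sketch, not measurements from any real system), every probe into a build table that misses cache pulls at least one random cache line, so DRAM bandwidth caps probe throughput regardless of how few instructions each probe executes:

```python
# Back-of-envelope: hash-probe throughput capped by DRAM bandwidth.
# All numbers are illustrative assumptions, not measurements.
CACHE_LINE = 64       # bytes fetched per random probe
dram_bw = 50e9        # ~50 GB/s, a plausible desktop-class CPU

# Each probe that misses cache pulls at least one line:
max_probes_per_sec = dram_bw / CACHE_LINE

# At ~4 GHz, that's several cycles of stall per probe, which is
# consistent with the very low IPC perf reports for this workload.
cycles_per_probe = 4e9 / max_probes_per_sec
print(f"{max_probes_per_sec / 1e6:.0f}M probes/s, ~{cycles_per_probe:.1f} cycles each")
```

Even this optimistic sketch (one line per probe, no TLB misses) shows the core spending most of its cycles waiting on memory, matching the ~0.1 IPC observation.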
> the bottleneck in lots of database workloads is memory bandwidth.
It can be, depending on the operation and the system, but database workloads also tend to run on servers that have significantly more memory bandwidth.
> i wish aws and the other cloud providers would offer arm servers with apple m-series levels of memory bandwidth per core, it would be a game changer for analytical databases.
There are x64 systems with that. Socket SP5 (Epyc) has ~600GB/s per socket and allows two-socket systems, Intel has systems with up to 8 sockets. Apple Silicon maxes out at ~800GB/s (M3 Ultra) with 28-32 cores (20-24 P-cores) and one "socket". If you drop a pair of 8-core CPUs in a dual socket x64 system you would have ~1200GB/s and 16 cores (if you're trying to maximize memory bandwidth per core).
The "problem" is that system would take up the same amount of rack space as the same system configured with 128-core CPUs or similar, so most of the cloud providers will use the higher core count systems for virtual servers, and then they have the same memory bandwidth per socket and correspondingly less per core. You could probably find one that offers the thing you want if you look around (maybe Hetzner dedicated servers?) but you can expect it to be more expensive per core for the same reason.
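A quick sketch of the per-core arithmetic above, using the thread's approximate bandwidth and core-count figures (these are rough public specs, not benchmarked numbers):

```python
# GB/s per core for the configurations discussed above.
# Bandwidth and core counts are the approximate figures from the thread.
configs = {
    "2x Epyc SP5, 8-core CPUs": (2 * 600, 16),  # ~600 GB/s per socket
    "Apple M3 Ultra":           (800, 32),       # ~800 GB/s, 28-32 cores
}
for name, (gbps, cores) in configs.items():
    print(f"{name}: {gbps / cores:.0f} GB/s per core")
```

The low-core-count dual-socket x64 box wins on bandwidth per core precisely because the sockets carry the bandwidth while the core count stays small.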
>The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
Apple needs to solder it because they are attaching it directly to the SoC to minimize lead length, and that is part of how they are able to get that bandwidth.
The premise of the connector is that it attaches to the board in a similar way as soldering the chips (the LPCAMM connection interface is directly on the back of the chips) but uses compression instead of solder to make the electrical connection, so the traces are basically the same length but the modules can be replaced without soldering. There is no reason you couldn't use two modules to get a 256-bit memory bus. It sounds like AMD designed Strix Halo to assume soldered memory and then when Framework asked if it could use CAMM2 with no modifications to the chip, the answer was yes but not at the full speed of the CAMM2 spec.
CAMM2 supports LPDDR5X-9600, which is the same speed Apple uses in the newest machines.
The Max chips have a 512-bit memory bus. That's the one where the comment you linked suggests putting one module on each side of the chip as being fine, and there is no M4 Ultra or M5 Ultra so they could be using LPCAMM2 for their entire current lineup. The M3 Ultra had a 1024-bit memory bus, which is a little nuts, but it's also a desktop-only chip and then you don't have to be fighting with the trace lengths for LPDDR5 because you could just use ordinary DDR5 RDIMMs.
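Peak bandwidth follows directly from bus width times transfer rate. The sketch below uses the bus widths above; the 6400 MT/s rate for the M3 Ultra's 1024-bit bus is my assumption for that generation of LPDDR5:

```python
# Peak theoretical bandwidth = bus width (in bytes) x transfer rate.
def peak_gb_s(bus_bits: int, mt_s: int) -> float:
    return bus_bits / 8 * mt_s / 1000  # MT/s * bytes -> GB/s

# Max chips: 512-bit bus at LPDDR5X-9600
print(peak_gb_s(512, 9600))   # 614.4
# M3 Ultra: 1024-bit bus, assuming LPDDR5-6400 for that generation
print(peak_gb_s(1024, 6400))  # 819.2
```

The second figure lines up with the ~800 GB/s cited for the M3 Ultra earlier in the thread.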
> but uses compression instead of solder to make the electrical connection
This is still going to have higher parasitic resistance and capacitance than a soldered connection. That's why it's not just a drop-in replacement for soldered RAM. You'd either have to use more power or run the RAM slower.
> one module on each side of the chip as being fine.
It's fine if you've got space to spare. It's not very practical for a laptop form factor.
>but it's also a desktop-only chip and then you don't have to be fighting with the trace lengths for LPDDR5 because you could just use ordinary DDR5 DIMMs.
Given how few desktops Apple sells compared to laptops, I seriously doubt that they'd want to use a completely different memory configuration just for their desktop systems.
> This is still going to have higher parasitic resistance and capacitance than a soldered connection. That's why it's not just a drop-in replacement for soldered RAM. You'd either have to use more power or run the RAM slower.
This isn't accurate. A compression interface can have the same resistance as a soldered connection.
There is a small infelicity with DDR5 because the DDR5 spec was finalized before the CAMM2 spec and the routing on the chips isn't optimal for it, so for DDR5 CAMM2 requires slightly tighter tolerances to hit 9600 MT/s, which is presumably the trouble they ran into with Strix Halo, but even then it can do it if you design for it from the beginning, and they've fixed it for DDR6.
> It's fine if you've got space to spare. It's not very practical for a laptop form factor.
The modules take up approximately the same amount of space on the board as the chips themselves. It's just a different way of attaching them to it.
> Given how few desktops Apple sells compared to laptops, I seriously doubt that they'd want to use a completely different memory configuration just for their desktop systems.
DDR5 and LPDDR5 are nearly identical, the primary difference is that LPDDR5 has tighter tolerances to allow it to run at the same speeds at a lower voltage/power consumption. When you already have the design that meets the tighter tolerances, relaxing them in the system where you're not worried about 2 watts of battery consumption is making your life easier instead of harder.
>This isn't accurate. A compression interface can have the same resistance as a soldered connection.
All the information I can find suggests that CAMM2 will have higher parasitic resistance and capacitance than a soldered connection. Do you have a source for this claim?
The issue isn't just reaching a certain speed, but doing it at the same power consumption.
>The modules take up approximately the same amount of space on the board as the chips themselves.
They do take up more space, as anyone can easily check. Modern laptop motherboards can be very small, so this is significant.
>DDR5 and LPDDR5 are nearly identical [...]
What I mean is that Apple isn't going to want to invest any resources in adding the option of external RAM just for the relatively tiny desktop market. It's not that it's technically difficult; it just doesn't make sense from a logistical point of view.
> Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
Isn't that also because that's world we have optimized workloads for?
If the common hardware had unified memory, software would have exploited that I imagine. Hardware and software is in a co-evolutionary loop.
Part of the problem is that there is actually a reason for the distinction, because GPUs need faster memory but faster memory is more expensive, so then it makes sense to have e.g. 8GB of GDDR for the GPU and 32GB of DDR for the CPU, because that costs way less than 40GB of GDDR. So there is an incentive for many systems to exist that do it that way, and therefore a disincentive to write anything that assumes copying between them is free because it would run like trash on too large a proportion of systems even if some large plurality of them had unified memory.
A sensible way of doing this is to use a cache hierarchy. You put e.g. 8GB of expensive GDDR/HBM on the APU package (which can still be upgraded by replacing the APU) and then 32GB of less expensive DDR in slots on the system board. Then you have "unified memory" without needing to buy 40GB of GDDR. The first 8GB is faster and the CPU and GPU both have access to both. It's kind of surprising that this configuration isn't more common. Probably the main thing you'd need is for the APU to have a direct power connector like a GPU so you're not trying to deliver most of a kilowatt through the socket in high end configurations, but that doesn't explain why e.g. there is no 65W CPU + 100W GPU with a bit of GDDR to be put in the existing 170W AM5 socket.
However, even if that were everywhere, it still doesn't necessarily imply there are a lot of things that could do much with it. You would need something that simultaneously requires more single-thread performance than you can get from a GPU, more parallel computation than you can get from a high-end CPU, and a large amount of data to be repeatedly shared between those subsets of the computation. Such things probably exist, but it's not obvious that they're very common.
Except they don't use DDR5. LPDDR5 is always soldered. LPDDR5 requires short point-to-point connections to give you good SI at high speeds and low voltages. To get the same with DDR5 DIMMs, you'd have something physically much bigger, with way worse SI, with higher power, and with higher latency. That would be a much worse solution. GDDR is much higher power, the solution would end up bigger. Plus it's useless for system memory so now you need two memory types. LPDDR5 is the only sensible choice.
CAMM2 is new and most of the PC companies aren't using it yet but it's exactly the sort of thing Apple used to be an early adopter of when they wanted to be.
And it's called "CAMM2" because it's not even the first version. Apple could have been working with the other OEMs on this since 2022 and been among the first to adopt it instead of the last.
> tldr; competition is as stiff as it is vicious-- Apple's "lead" in inference is only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the the rug out from Apple.
These companies always try to preserve price segmentation, so I don’t have high hopes they’d actually do that. Consumer machines still get artificially held back on basic things like ECC memory, after all...
Can we also stop giving Apple some prize for unified memory?
It was the way of doing graphics programming on home computers, consoles and arcades, before dedicated 3D cards became a thing on PC and UNIX workstations.
Can we please stop treating this like some 2000s Mac vs PC flame war where you feel the need go full whataboutism whenever anyone acknowledges any positive attribute of any Apple product? If you actually read back over the comments you’re replying to, you’ll see that you’re not actually correcting anything that anyone actually said. This shit is so tiring.
It depends on what you mean, do you mean both gross and net? Just one of the two?
Gross margin of zero would mean you sell at exactly the cost to produce. Net margin of zero means you cover all your expenses, including COGS. The only really difficult, practically impossible, thing would be doing both at the same time. Though I could also see a case where you drive down net margins once sunk costs are paid and achieve both.
Doing so practically, or sustainably, in most circumstances would be, uhh, crazy… but it’s not impossible. Even then, I think aiming for zero margin is a pretty credible tactic for eliminating competition if you can out-sustain them.
TLDR; Weird? Sure. But not impossible. And even sort of likely if you’re trying to atrophy your competition out of existence.
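A toy numeric sketch of the gross-vs-net distinction above, and of why hitting both zeros at once is practically impossible (all numbers made up for illustration):

```python
# Toy illustration: zero gross margin forces negative net margin
# whenever there is any overhead at all. Numbers are invented.
revenue = 100.0
cogs = 100.0   # selling at exactly production cost -> zero gross margin
opex = 20.0    # overhead: salaries, rent, etc.

gross_margin = (revenue - cogs) / revenue
net_margin = (revenue - cogs - opex) / revenue

print(gross_margin)  # 0.0
print(net_margin)    # -0.2 -- both margins can only be zero if opex is zero
```

Once the overhead is a sunk cost already covered elsewhere, net margin can be driven toward zero independently, which is the caveat noted above.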
I am not surprised by this, and am glad to see a test like this. One thing that keeps popping up for me when using LLMs is the lack of actual understanding. I write Elixir primarily, and I can say without a doubt that none of the frontier models understand concurrency in OTP/BEAM. They look like they do, but they’ll often resort to weird code that doesn’t understand how “actors” work. It’s an imitation of understanding that averages all the concurrency code seen in training. The end result is a huge amount of noise when those averages aren’t enough: guarding against things that won’t happen, because they can’t… or actively introducing race conditions because they don’t understand how message passing works.
Current frontier models are really good at generating boilerplate and really good at summarizing, but they really lack the ability to actually comprehend and reason about what’s going on. I think this sort of test really highlights that, and is a nice reminder that LLMs are only as good as their training data.
When an LLM or some other kind of model does start to score well on tests like this, I’d expect to see them discovering new results, solutions, and approaches to questions and problems, compared to how they work now, where they generally only seem to uncover answers that are present but obfuscated.
I’ve been thinking about this a lot lately… For years this has been said, and for most of us isn’t something we’ve been able to experience until recently. Yet, now we can see how chatbots have made sane folks lose their minds, by simply being too agreeable. I think it’s a grim look at what it’s like to be hyper wealthy. The odds that they’ve completely disassociated from reality, IMHO, have increased exponentially after seeing the effects on “normal” people. The only difference is us plebs, don’t have the resources to then bring our distorted view of reality to life.