Tools like powermetrics or mactop consistently underreport GPU power usage on Apple M-series silicon. Worse, many reputable websites and YouTube channels use these tools to report and compare Apple chip power usage with the competition.
For example, under a heavy GPU workload on a Mac Studio M4 Max, powermetrics reports a 65W idle-to-load delta on the GPU, while system DC power rises by 179W, leaving 114W, nearly two thirds of the rise, unexplained.
Using undocumented low-level Apple APIs (SMC and IOReport), we were able to reverse engineer an energy model that explains almost all of the energy flow in an Apple SoC, with less than 2% error on the workloads I studied.
The result is a simple two-term energy roofline model:
P_GPU ≈ a * bytes + b * FLOPs
with:
~5 pJ/byte for SRAM movement
~2.7 pJ/FLOP for compute.
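As a minimal sketch (function and variable names are mine, not from the video), the model predicts steady-state dynamic GPU power from a kernel's traffic and arithmetic rates using the two fitted coefficients above:

```python
# Two-term energy roofline: P_GPU ≈ a * bytes/s + b * FLOP/s.
# Coefficients are the fitted M4 Max values quoted above; treat them as
# assumptions for any other chip.
PJ = 1e-12  # picojoule in joules

def gpu_power_watts(bytes_per_s: float, flops_per_s: float,
                    a: float = 5.0 * PJ,    # ~5 pJ per byte of SRAM movement
                    b: float = 2.7 * PJ):   # ~2.7 pJ per FLOP of compute
    """Predicted dynamic GPU power in watts for a steady-state kernel."""
    return a * bytes_per_s + b * flops_per_s

# Hypothetical compute-bound kernel: 40 TFLOP/s with 2 TB/s of SRAM traffic.
p = gpu_power_watts(bytes_per_s=2e12, flops_per_s=40e12)
print(f"{p:.0f} W")  # 10 W for movement + 108 W for compute = 118 W
```

Because energy per byte is much larger than energy per FLOP, the same model also shows why bandwidth-bound kernels burn disproportionate power per unit of useful compute.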
Not only that, but we were able to attribute energy flow to each of the principal functional blocks on the M4 Max SoC: CPU, GPU compute, GPU SRAM, chip fabric components, and DRAM.
For this one example:
179W System DC Power measured via SMC. Of which:
133W GPU (my inference)
18W DRAM
28W SoC Fabric (sum of 3 fabric related components)
<1W CPU
Think of these values as how much of the system DC power rise was due to GPU activity, DRAM activity, and so on. They are not exact electrical power figures: VRM losses are not broken out, so the per-block values slightly overestimate the actual electrical power flowing into each block.
Now, if you want to compare against a discrete GPU whose DC power is measured at the board interface, you would definitely want to include DRAM and possibly the fabric power too (if the CPU power is minimal, as in this example).
The video walks through the experiments and validation in detail. Happy to answer questions about the measurement setup or the kernels used.
Right now, the AMX units are significantly more efficient in GFLOPS/Watt than the GPU, but I wonder whether that has changed with the M5's new Neural Accelerators (aka tensor cores).
Many reputable websites and channels "measure" the power consumption of Mac computers using apps like Max Power Gadget or asitop. These apps are built on top of the powermetrics macOS facility.
However, the facility's man page clearly states that the readings are estimates, not measured values, and that they should not be used for power consumption comparisons.
In my performance investigations I found that the reported CPU and GPU power consumption is vastly underestimated: in the case of the GPU, the reported value is just over a third of the actual consumption, as explained in the video.
It is unfortunate that said reviewers do not bother to sanity-check the numbers with a power meter, and thereby perpetuate misleading information on Apple silicon performance/Watt.
In a prior investigation, where I did extensive linear algebra performance studies on the M4 Max, I found that its CPU performance/Watt in dense matrix multiplication is well below that of the Ryzen 9950X when the computations are carried out on its Neon vector units. The same holds for Apple's M4 Max GPU versus Nvidia's last-generation RTX 4090.
Compared to the M1, they kept the number of memory channels the same but increased the width of each channel. What are the performance implications of this from a pure CPU workload standpoint?