Tools like powermetrics or mactop consistently underreport GPU power usage on Apple M-series silicon. Worse, many reputable websites and YouTube channels use these tools to report and compare Apple chip power usage with the competition.
For example, under a heavy GPU workload on a Mac Studio M4 Max, powermetrics reports a 65W idle-to-load delta on the GPU, while system DC power rises by 179W, leaving 114W, nearly two thirds of the rise, unexplained.
Using undocumented low-level Apple APIs (SMC and IOReport), we were able to reverse engineer an energy model that explains almost all of the energy flow in an Apple SoC, with less than 2% error on the workloads I studied.
The result is a simple two-term energy roofline model:
P_GPU ≈ a * bytes + b * FLOPs
with:
~5 pJ/byte for SRAM movement
~2.7 pJ/FLOP for compute.
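As a minimal sketch (function and variable names are mine, not from the video), the model predicts steady-state dynamic GPU power from a kernel's traffic and arithmetic rates using the two fitted coefficients above:

```python
# Two-term energy roofline: P_GPU ≈ a * bytes/s + b * FLOP/s.
# Coefficients are the fitted M4 Max values quoted above; treat them as
# assumptions for any other chip.
PJ = 1e-12  # picojoule in joules

def gpu_power_watts(bytes_per_s: float, flops_per_s: float,
                    a: float = 5.0 * PJ,    # ~5 pJ per byte of SRAM movement
                    b: float = 2.7 * PJ):   # ~2.7 pJ per FLOP of compute
    """Predicted dynamic GPU power in watts for a steady-state kernel."""
    return a * bytes_per_s + b * flops_per_s

# Hypothetical compute-bound kernel: 40 TFLOP/s with 2 TB/s of SRAM traffic.
p = gpu_power_watts(bytes_per_s=2e12, flops_per_s=40e12)
print(f"{p:.0f} W")  # 10 W for movement + 108 W for compute = 118 W
```

Because energy per byte is much larger than energy per FLOP, the same model also shows why bandwidth-bound kernels burn disproportionate power per unit of useful compute.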
Not only that, but we were able to attribute energy flow to each of the principal functional blocks on the M4 Max SoC: CPU, GPU compute, GPU SRAM, chip fabric components, and DRAM.
For this one example:
179W System DC Power measured via SMC. Of which:
133W GPU (my inference)
18W DRAM
28W SoC Fabric (sum of 3 fabric related components)
<1W CPU
Think of these values as how much of the system DC power rise was due to GPU activity, DRAM activity, and so on. They are not exact electrical power figures: VRM losses are not broken out, so the per-block values slightly overestimate the actual electrical power flowing into each block.
Now, if you want to compare against a discrete GPU whose DC power is measured at the board interface, you would definitely want to include DRAM and possibly the fabric power too (if the CPU power is minimal, as in this example).
The video walks through the experiments and validation in detail. Happy to answer questions about the measurement setup or the kernels used.
Right now, the AMX units are significantly more efficient in GFLOPS/Watt than the GPU, but I wonder whether that has changed with the M5's new Neural Accelerators (aka tensor cores).
Many reputable websites and channels "measure" the power consumption of Mac computers using apps like Max Power Gadget or asitop. These apps are built on top of the powermetrics macOS facility.
However, the facility's man page clearly states that the readings are estimates, not measured values, and that they should not be used for power consumption comparisons.
In my performance investigations I found that the reported CPU and GPU power consumption is vastly underestimated: in the case of the GPU, the reported value is just over a third of the actual consumption, as explained in the video.
It is unfortunate that said reviewers do not bother to sanity-check the numbers with a power meter, and thereby perpetuate misleading information on Apple silicon performance/Watt.
In a prior investigation, where I did extensive linear algebra performance studies on the M4 Max, I found that its CPU performance/Watt in dense matrix multiplication is well below that of the Ryzen 9950X when the computations are carried out on its Neon vector units. The same holds for Apple's M4 Max GPU versus Nvidia's last-generation RTX 4090.
Compared to the M1, they kept the number of memory channels the same but increased the width of each channel. What are the performance implications of this from a pure CPU workload standpoint?