After years of struggling with VTune installs in cloud environments just to get a few specific hardware metrics, only to fall back on manually building perf stat event lists from Perfmon, I finally built the functionality I always wanted. It automatically generates the set of hardware counters required for your requested metrics, formats it, and then parses the output to evaluate the metric formulas on the raw events.
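To make the flow concrete, here is a minimal sketch of the generate, format, parse, evaluate loop. It is not the tool's actual code. The `METRICS` table, the helper names, and the two formulas are hypothetical, and it uses generic perf event names rather than real Perfmon definitions. It assumes the CSV layout that `perf stat -x,` writes, where the first field is the count and the third is the event name.

```python
import re

# Hypothetical metric table: each formula is an expression over raw event names.
# Real Perfmon metrics are far more involved; these two are only illustrative.
METRICS = {
    "ipc": "instructions / cycles",
    "branch_miss_rate": "branch_misses / branches",
}

def events_for(metric_names):
    """Collect the unique event names referenced by the requested formulas."""
    events = []
    for name in metric_names:
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", METRICS[name]):
            if token not in events:
                events.append(token)
    return events

def perf_command(events, workload):
    """Format the events into a perf stat invocation with CSV output."""
    # perf spells generic events with hyphens (e.g. branch-misses).
    event_arg = ",".join(e.replace("_", "-") for e in events)
    return ["perf", "stat", "-x", ",", "-e", event_arg, "--"] + list(workload)

def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x,` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        # Skip blank lines, comments, and "<not counted>" / "<not supported>" rows.
        # Non-integer counts (e.g. task-clock) are skipped too in this sketch.
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue
        counts[fields[2].replace("-", "_")] = int(fields[0])
    return counts

def evaluate(metric_names, counts):
    """Evaluate each requested formula against the parsed raw counts."""
    # eval is acceptable here only because the formulas come from our own table.
    return {n: eval(METRICS[n], {"__builtins__": {}}, counts) for n in metric_names}
```

In practice you would run the list from `perf_command` with `subprocess.run(..., capture_output=True, text=True)` and feed its `stderr` to `parse_perf_csv`, since perf stat writes its counters to stderr by default.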
Flamegraphs are very intuitive, but people still ask me, "What do I look at?" Instead of explaining my usual Cmd+F searches over and over, I made a tool that can annotate flamegraphs.
Another downside of flamegraphs is that you cannot easily aggregate across multiple stacks without multiple Cmd+F searches.
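As a rough illustration of what aggregating across stacks means, the sketch below sums samples per function over every stack it appears in. It is an assumption-laden example, not the tool's implementation: it assumes input in the folded format (`frame1;frame2;frame3 count`) produced by the stackcollapse scripts in the FlameGraph repository, and `aggregate_by_frame` is a hypothetical name.

```python
from collections import Counter

def aggregate_by_frame(folded_lines):
    """Sum sample counts per frame across all stacks that contain it."""
    totals = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frame names may contain spaces.
        stack, _, count = line.rpartition(" ")
        # set() so a recursive frame is counted once per stack, not once per level.
        for frame in set(stack.split(";")):
            totals[frame] += int(count)
    return totals

folded = [
    "main;parse;malloc 10",
    "main;render;malloc 5",
    "main;render;draw 20",
]
totals = aggregate_by_frame(folded)
print(totals["malloc"])  # samples with malloc anywhere on the stack
```

Here `malloc` shows up under two different parents, which is exactly the case that takes two separate searches in the flamegraph viewer.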