Will this strategy work every time? Maybe for AI it will (the market is competitive, and Apple just purchases the best model for its consumers).
But this approach may not work in other areas: e.g. building electric batteries, wireless modems, electric cars, solar cell technology, quantum computing etc.
Essentially, Apple got lucky with AI, but it needs to keep investing in cutting-edge technology across the broad areas it operates in and not let others get too far ahead!
Their focus is investing in areas where they see something being a competitive differentiator, or where the market has failed to create a competitive environment.
They do not make their own screens because they can source screens from multiple sources and work with those manufacturers to create screens with the properties they want. Same thing with them relying on others for electric batteries - there are plenty of manufacturers to provide batteries to Apple's spec.
They created their own wireless modems because there's only one company they were able to purchase modems from, and those modems did not necessarily have the features Apple wanted.
Apple hasn't announced any interest in selling electric cars, solar cell technology, or quantum computing platforms. I wouldn't expect them to do so until they had a consumer product ready for sale. I doubt they are planning to come out with products in any of these categories soon.
It works often enough for the company to be wildly successful. They can simply cut their losses and withdraw from industries where it hasn't, such as EVs.
I think their M chips are a good example. They ran on intel for so long, then did the impossible of changing architecture on Mac, even without much transition pain.
Obviously that was built upon years of iPhone experience, but it shows they can lag behind, buy from other vendors, and still win when it becomes worth it to them.
How is changing the architecture of a platform that only you make hardware for doing the impossible?
They could change the architecture again tonight, and start releasing new machines with it. The users will adopt because there is literally no other choice.
Every machine they release will be the fastest and most capable on the platform, because there is no other option.
Exactly this! Rosetta + the whole app developer community who really quickly released builds for M chips (voluntary or forced, but it did happen).
I had the initial M1 Air, and it was remarkable how usable it was. You'd expect all sorts of friction and issues, but mostly things just worked (very fast). Even with some Rosetta overhead it was still fast compared to Intel Macs.
Rosetta 1 delivered 50-80% of the performance of native, during the PPC->Intel transition. It turns out, you can deliver not particularly impressive performance and still not ruin your app ecosystem, because developers have to either update to target your new platform, or leave your platform entirely.
You can also voluntarily cut off huge chunks of your own app ecosystem intentionally, by giving up 32bit support and requiring everything to be 64bit capable.
...because users have no other choice when only one vendor controls both the hardware and the software. They can either use the apps still available to them, or they can leave. And the cost of leaving is a lot higher for users.
Yes. Apple put custom hardware support in the M series chips based on the needs of Rosetta 2. The x86_64 performance on Rosetta 2 was often higher at launch than the prior generation of Intel chips running those same binaries natively.
Microsoft and Qualcomm already knew the performance of x86 app emulation on windows was killing the ARM machine lineup, so Qualcomm was working on extensions to their chips and Microsoft on having Windows support them already, but ARM64EC and Prism didn't launch for two years after the M1 shipped.
It's also notably not the first time they've switched. They used the Motorola 68k architecture, then IBM PowerPC, then Intel x86 (for a single generation, then x86_64), and now the Apple M series.
They do the things they think they can do very well.
Why would they try to build electric batteries, wireless modems, electric cars, solar cells, or quantum computers, if their R&D hadn't already determined that they would likely be able to do so Very Well?
It's not like any of those are really in their primary lines of business anyway.
They (Apple) bought Intel's wireless modem business and are using those modems instead of Qualcomm's chips. IIRC they aren't best in class when it comes to raw throughput, but they are quite good in terms of throughput vs. power consumption.
This article seems relevant to me for the following scenario:
- You have faulty software (e.g. games) that happens to have split locks
AND
- You have DISABLED split lock detection and "mitigation" which would have hugely penalised the thread in question (so the lock becomes painfully evident to that program and forced to be fixed).
AND
- You want to see which CPU does best in this scenario
In other words, you just assume the CPU will take the bus-lock penalty and continue WITHOUT the culprit thread being actively throttled by the OS.
In the normal case, IIUC, Linux should helpfully throttle the thread so the rest of the system is not affected by the bus lock. In this benchmark the assumption is that the thread will NOT be throttled by Linux, via the appropriate setting.
So to be honest I don't see the merit of this study. This study is essentially about how fast your interconnect is, so that it can survive bad software that is allowed to run untrammelled.
On aarch64 the thread would simply be killed. It's possible to do the same on modern AMD / Intel also OR simply throttle the thread so that it does not cause problems via bus locks that affect other threads -- none of these are done in this benchmark.
> So to be honest I don't see the merit of this study. This study is essentially about how fast your interconnect is, so that it can survive bad software that is allowed to run untrammelled.
It seems like a worthwhile study if you want to know what CPU to buy to play specific old games that use bus locks. Games that will never be fixed.
It seemed to me that the issue with the games was that they did split locks at all, and when Linux detected that and descheduled the process, performance was trash. I didn't think they were doing frequent split locks that resulted in bad performance by itself.
You don't need to be a careful shopper for this; just turn off detection while you're playing these games, or tune the punishment algorithm, or patch the game. Just because the developer won't doesn't mean you can't; there's plenty of 3rd party binary patches for games.
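For anyone wanting to try this, a sketch of the knobs involved (assuming a recent x86 Linux kernel; names are from the kernel's admin guide, so check what your kernel version supports):

```shell
# Boot-time: disable split-lock detection entirely (x86 kernel parameter)
split_lock_detect=off

# Or keep detection but disable the throttling "punishment"
# (sysctl available on kernels that have kernel.split_lock_mitigate, ~6.2+)
sysctl kernel.split_lock_mitigate=0
```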
I'd like to know more about what it takes to turn on PCI pass through for laptop hardware. On desktops and servers it's typically the IOMMU setting in the BIOS. Is that also commonly available on laptops?
> I would get a new laptop because a laptop without WiFi is useless.
You can run Linux in a VM and PCI passthrough your WiFi Adapter. Linux drivers will be able to connect to your wifi card and you can then supply internet to FreeBSD.
Doing this manually is complicated, but the whole process has been automated on FreeBSD by "Wifibox".
Is there a similar thing for GPUs? I want to build a workstation and have it run FreeBSD, but I would prefer to use an Intel Arc card, which has no information about FreeBSD compatibility online.
I am partial to your sentiment, but I don't think writing all the terminal-handling code in elisp would give us code that is especially interesting to read (to me at least).
Understanding the VT state machine and all its quirks and inconsistencies is not high on my list of code I'd like to learn. It is good that it is packaged up in a library and emacs is just a consumer of it.
libghostty will have excellent compatibility and features, rather than an elisp implementation that may be half-baked.
I stopped living in the world of turtles all the way down. Now I'm more like: hey, is this a good library? Is it integrated well? It does not matter if it is in zig, rust, c++, lisp, scheme, ...
Jitted Elisp by itself has much more power, because of function composability, than badly reusing libraries without even a common API like OLE/COM under Windows. You are just creating silos that interoperate badly.
Even 9front has 9p, namespaces, and "everything truly is a file". Even GNU/Emacs under Hurd won't have its full power developed until the GNU people ditch Gnuplot for their own GNU-born, capable 3D plotutils and the like.
And today, given the speed of jitted Emacs, if I were the Calc maintainer I'd try to write a PNG/farbfeld plotting tool in pure Elisp, with both TTY and graphical outputs.
Depending on non-GNU, external tools is holding GNU and Elisp back.
> Status: Early prototype. Fully vibe coded. [...]
Cool project... However, the terminal is where you enter passwords, ssh, set API keys etc. Something so sensitive should not be "Fully vibe coded".
For a project like this, I would expect to see a clarification which might read something like this: "Fully vibe coded, but I audited each and every line of generated code and I am already a domain expert in vt sequences and emacs so I know this program should be OK." But given that I did NOT see a clarification or statement like this, it becomes very difficult to trust a project like this.
Looking at the sophistication of modern security exploits, I'd say that just a few minor gaps, strategically positioned, can lead to surprisingly drastic results. Of course, Emacs is a niche editor/IDE/OS/whatnot, so an unlikely target, but still.
It's a great proof of concept though. In the meantime, I'll stick with vterm.
I love FreeBSD, but Linux just provides every feature under the sun when it comes to virtualization. Do you find any features missing in bhyve? Is bhyve reliable? I can't imagine it's been tested as thoroughly as KVM...
Bhyve is quite cool, but it has no nested virtualization, which means you cannot nest VM enter/exit with EPT paging, so you cannot run hypervisors inside its guests. I found this crucial: for instance, Qubes OS won't run in bhyve by any means.
Anecdotally, Bhyve has worked in FreeBSD for a decade now. Eventually it got ported to Illumos because it was better than their implementation of QEMU.
If you are unsure of bhyve's abilities then why not test yourself? Speculation and guessing about stability or testing is useless without seeing if it works in your application.
> If you are unsure of bhyve's abilities then why not test yourself?
It is not possible to come to a conclusion about everything in the world yourself "from scratch". No one has the time to try out everything themselves. Some filtering process needs to be applied to prevent wasting your finite time.
That is why you ask for recommendations of hotels, restaurants, travel destinations, good computer brands, software and so on from friends, relatives or other trusted parties/groups. This does not mean you don't form your own opinions. You use the opinions of others as a sort of bootstrap, or prior, which you can always refine.
HN is actually the perfect place to ask for opinions. Someone just said bhyve does not support nested virtualization (useful input !). Someone else might chime in and say they have run bhyve for a long time and they trust it (and so on...)
I agree with you and do not understand the “I read every manual” and “I test all software” crowd. I play around with A LOT of software but I cannot test it all.
Speculation is not useless if you are saying “the answer I got makes it 99% likely that this solution will not work for me”. Curation has immense value in the world today. I investigate only the options most likely to be useful. And that still takes all my time.
The phrasing of your questions is the problem. They are uninformed, too general, and assuming. The last sentence reads as if you outright dismiss bhyve because YOU can't imagine it was tested thoroughly.
> It is not possible to come to a conclusion about everything in the world yourself "from scratch". No one has the time to try out everything themselves. Some filtering process needs to be applied to prevent wasting your finite time.
It's totally possible when you know what your application requires but you didn't state anything.
> Someone just said bhyve does not support nested virtualization (useful input !).
Ok you have a problem with the way I framed my questions and my (unintentional) tonality. Fair enough. Let's move from critique of the way I asked my questions to what your experience with bhyve has been, if you're willing to share that.
Have you used bhyve? What has your experience been with it? Have you used KVM+QEMU -- can you compare your experience between the two?
One of the biggest knocks against Rust as a systems programming language is that it has weak compile-time and metaprogramming capabilities compared to Zig and C++.
In the space of language design, everything "more powerful" is not necessarily good. Sometimes less power is better because it leads to more optimisable code, less implementation complexity, less abstraction, and better LSP support. TL;DR: more flexibility and complexity is not always good.
Though I would also challenge the fact that Rust's metaprogramming model is "not powerful enough". I think it can be.
> And not only for performance but also for thread safety
This is already built-in to the language as a facet of the affine type system. I'm curious as to how familiar you actually are with Rust?
> Rust is just less powerful.
On the contrary. Zig and C++ have nothing even remotely close to proc macros. And both languages have to defer things like thread safety into haphazard metaprogramming instead of baking them into the language as a basic semantic guarantee. That's not a good thing.
Writing general generic code without repetition is one thing Rust fails at without specialization. It does not have variadics or comparably powerful compile-time metaprogramming. It does not come even remotely close.
Proc macros are basically plugins. I do not think those are even part of the "language" as such. It is just plugging new stuff into the compiler.
> For example you cannot design something that comes even close to expression templates libraries.
You keep saying this and it's still wrong. Rust is quite capable of expression templates, as its iterator adapters prove. What it isn't capable of (yet) is specialization, which is an orthogonal feature.
Rust cannot take a const function and evaluate that into the argument of a const generic or a proc macro. As far as I can tell, the reasons are deeply fundamental to the architecture of rustc. It's difficult to express HOW FUNDAMENTAL this is to strongly typed zero overhead abstractions, and we see where Rust is lacking here in cases like `Option` and bitset implementations.
> Rust is quite capable of expression templates, as its iterator adapters prove.
AFAIU iterator adapters are not quite what expression templates are, because they rely on compiler optimizations rather than a built-in feature of the language that enables you to do this without relying on the compiler pipeline.
I had always thought expression templates at the very least needed the optimizer to inline/flatten the tree of function calls that are built up. For instance, for something like x + y * z I'd expect an expression template type like sum<vector, product<vector, vector>>, where sum would effectively have an operator[] returning l[i] + r[i] and product one returning l[i] * r[i].
That would require the optimizer to inline the latter into the former to end up with a single expression, though. Is there a different way to express this that doesn't rely on the optimizer for inlining?
Expression templates do not rely on the optimizer, since you're not dealing with the computations directly but rather with expressions (nodes), through which you defer the computation until the very last moment (when you have fully built an expression of expressions - basically, almost an AST). This guarantees that you get zero cost when you really need it. What you're describing is something akin to copy elision and function folding through inlining, which is pretty much basic in any C++ compiler and happens automatically without special care.
> since you're not dealing with the computations directly but rather expressions (nodes) through which you are deferring the computation until the very last moment (when you have fully built an expression of expressions, basically almost an AST).
Right, I understand that. What is not exactly clear to me is how you get from the tree of deferred expressions to the "flat" optimized expression without involving the optimizer.
Take something like the above example for instance - w = x + y * z for vectors w/x/y/z. How do you get from that to effectively
for (size_t i = 0; i < w.size(); ++i) {
w[i] = x[i] + y[i] * z[i];
}
The example is false because that's not how you would write an expression template for the given computation, so the question of how the optimizer is not involved is also not set in the correct context, and I can't give you an answer for that. Of course the optimizer is generally going to be involved, as it is for all code and not just expression templates, but expression templates do not require the optimizer in the way you're suggesting. Expression templates do not rely on the O1, O2 or O3 levels being set - they work the same way at O0 too, and that may be the hint you were looking for.
> The example is false because that's not how you would write an expression template for given computation
OK, so how would you write an expression template for the given computation, then?
> Expression templates do not rely on O1, O2 or O3 levels being set - they work the same way in O0 too and that may be the hint you were looking for.
This claim confuses me given how expression templates seem to work in practice?
For example, consider Todd Veldhuizen's 1994 paper introducing expression templates [0]. If you take the examples linked at the top of the page and plug them into Godbolt (with slight modifications to isolate the actual work of interest) you can see that with -O0 you get calls to overloaded operators instead of the nice flattened/unrolled/optimized operations you get with -O1.
You see something similar with Eigen [2] - you get function calls to "raw" expression template internals with -O0, and you need to enable the optimizer to get unrolled/flattened/etc. operations.
Similar thing yet again with Blaze [3].
At least to me, it looks like expression templates produce quite different outputs when the optimizer is enabled vs. disabled, and the -O0 outputs very much don't resemble the manually-unrolled/flattened-like output one might expect (and arguably gets with optimizations enabled). Did all of these get expression templates wrong as well?
Look, I have just completed work on a high-performance serialization library which avoids computing heavy expressions and temporary allocations, all by using expression templates, and no, optimization levels are not needed. The code works as advertised at O0 - that's the whole deal around it.

If you have a genuine question you should ask one, but please do not disguise it so that it only goes to prove your point. I am not that naive. All I can say is that your understanding of expression templates is not complete, and therefore you draw incorrect conclusions. The silly example you provided shows that you don't understand what expression template code looks like, and yet you're trying to prove your point over and over again.

Also, most of the time I am writing my comments on my mobile, so I understand that my responses sometimes appear too blunt, but in any case I am obviously not going to write, run or check code as if I were at work. My comments here are not work, and I am not here to win arguments, but most of the time to learn from other people's experiences, and sometimes to dispute conclusions based on those experiences too. If you don't believe me, or you believe expression templates work differently, then so be it.
> If you have a genuine question you should ask one but please do not disguise so that it only goes to prove your point.
I think my question is pretty simple: "How does an optimizer-independent expression template implementation work?" Evidently the resources I've found so far describe "optimizer-dependent expression templates", and apparently none of the "expression template" implementations I've had reason to look at disabused me of that notion.
> My comments here are not work, and I am not here to win arguments, but most of the time to learn from other people's experiences, and sometimes to dispute conclusions based on those experiences too.
Sure, and I like to learn as well from the more knowledgeable/experienced folk here, but as much as I want to do so here I'm finding it difficult since there's precious little for me to go off of beyond basically just being told I'm wrong.
> If you don't believe me, or you believe expression templates work differently, then so be it.
I want to understand how you understand expression templates, but between the above and not being able to find useful examples of your description of expression templates I'm at a bit of a loss.
Expression templates do AST manipulation of expressions at compile time. Let's say you have a complex matrix expression that naively maps to multiple BLAS operations but can be reduced to a single BLAS call. With expression templates you can translate one to the other, this is a static manipulation that does not depend on compiler level. What does depend on the compiler is whether the incidental trivial function calls to operators gets optimized away or not. But, especially with large matrices, the BLAS call will dominate anyway, so the optimization level shouldn't matter.
Of course in many cases the optimization level does matter: if you are optimizing small vector operators to simd inlining will still be important.
> With expression templates you can translate one to the other, this is a static manipulation that does not depend on compiler level.
How does that work on an implementation level? First thing that comes to mind is specialization, but I wouldn't be surprised if it were something else.
> What does depend on the compiler is whether the incidental trivial function calls to operators gets optimized away or not.
> Of course in many cases the optimization level does matter: if you are optimizing small vector operators to simd inlining will still be important.
Perhaps this is the source of my confusion; my uses of expression templates so far have generally been "simpler" ones which rely on the optimizer to unravel things. I haven't been exposed much to the kind of matrix/BLAS-related scenarios you describe.
Partial specialization specifically. Match some patterns and convert them to something else. For example:

    #include <cmath>  // for fma

    struct F { double x; };
    enum Op { Add, Mul };

    auto eval(F x) { return x.x; }

    template<class L, class R, Op op> struct Expr;

    template<class L, class R> struct Expr<L, R, Add> { L l; R r;
        friend auto eval(Expr self) { return eval(self.l) + eval(self.r); } };

    template<class L, class R> struct Expr<L, R, Mul> { L l; R r;
        friend auto eval(Expr self) { return eval(self.l) * eval(self.r); } };

    // (x * y) + z  ->  fma(x, y, z)
    template<class L, class R, class R2> struct Expr<Expr<L, R, Mul>, R2, Add> { Expr<L, R, Mul> l; R2 r;
        friend auto eval(Expr self) { return fma(eval(self.l.l), eval(self.l.r), eval(self.r)); } };

    template<class L, class R> auto operator+(L l, R r) { return Expr<L, R, Add>{l, r}; }

    template<class L, class R> auto operator*(L l, R r) { return Expr<L, R, Mul>{l, r}; }

    double optimized(F x, F y, F z) { return eval(x * y + z); }
    double non_optimized(F x, F y, F z) { return eval(x + y * z); }
Optimized always generates a call to fma, non-optimized does not. Use -O1 to see the difference (will inline trivial functions, but will not do other optimizations). -O0 also generates the fma, but it is lost in the noise.
The magic happens by specifically matching the pattern Expr<Expr<L, R, Mul>, R2, Add>; try to add a rule to optimize x+y*z as well.
Hrm, OK, that makes sense. Thanks for taking the time to explain! Guessing optimizing x+y*z would entail something similar to the third eval() definition but with Expr<L, Expr<L2, R2, Mul>, Add> instead.
I think at this point I can see how my initial assertion was wrong - specialization isn't fully orthogonal to expression templates, as the former is needed for some of the latter's use cases.
Does make me wonder how far one could get with rustc's internal specialization attributes...
> it could be possible that llms can make great use of them
This is actually a good point. Yes, LLMs have saturated the conversation everywhere, but contracts help clarify the pre- and post-conditions of methods well. I don't know how good the implementation in C++ will be, but LLMs should be able to really exploit them.
The problem with that is that C++26 Contracts are just glorified asserts. They trigger at runtime, not compile time. So if your LLM-generated code would have worked 99% of the time and then crashed in the field... well, now it will work 99% of the time and (if you're lucky) call the contract-violation handler in the field.
Arguably that's better (more predictable misbehavior) than the status quo. But it's not remotely going to fix the problem with LLM-generated code, which is that you can't trust it to behave correctly in the corner cases. Contracts can't make the code magically behave better; all they can do is make it misbehave better.
In my experience, llms don't reason well about expected states, contracts, invariants, etc.
Partly because they don't have long-term memory and are often forced to reason about code in isolation.
Maybe this means all invariants should go into AGENTS.md/CLAUDE.md files, or into doc strings so a new human reader will quickly understand assumptions.
Regardless, I think a habit of putting contracts to make pre- and post-conditions clear could help an AI reason about code.
Maybe instead of suggesting a patch to cover up a symptom, an AI may reason that a post-condition somewhere was violated, and will dig towards the root cause.
This applies just as well to asserts, too.
Contracts/asserts actually need to be added to tell a reader something.