Part of the delay is really just commercial. Fabs are optimized for utilization - throughput, not latency. A fab operator will prefer to queue up a load of work with as few gaps as possible, and your shuttle service run has to fit in one of the gaps. If you're NVIDIA and you've already booked the fab, there might not be so much delay. But not zero.
Just to buttress and embroider around your point that a fab is not a small business:
If there were even a realistic way to go from bare wafers to non-trivial custom chips in a small-batch fashion, you can bet there would be a cottage industry around it. I would love to live in a world where I could manufacture custom silicon as easily as I can manufacture a custom PCB or custom mechanical part.
But as it stands, quick-turn, rapid-proto "micro" fabs are obscenely expensive, to the extent that if you aren't absolutely certain you need the performance gains from custom silicon, justified by years of R&D that confirms the inadequacy of a multi-chip solution, then the idea is killed before any layout engineer is contacted.
Microfabs are either operated by research institutes, or they're booked solid for years, and basically printing money.
IMEC is a lab, not a fab. They have partnerships with all major fabs for driving research forwards and making prototypes and concepts, but they don't manufacture anything there, it's still up to Samsung, Intel or TSMC to try out whatever IMEC comes up with.
They may have lasers, electron microscopes, probes, etc. on-site for testing what Intel or TSMC ship them and verifying research results, but that's pretty far away from a "cottage industry".
Intel and Samsung are the true "cottage industries" as they do full vertical integration of IP, R&D and manufacturing under the same roof.
IMEC is more like the UN of semi companies, a place for them to come together, talk, share knowledge and results and decide industry standardisation based on that.
Accelerating the process is an incredibly obvious desire for literally everyone in the industry and there are already gobs of money being put into R&D.
The fact of the matter is that we're dealing with physical and chemical processes. It simply takes time for atoms to move across space. In many steps of the semiconductor fab process we are literally building up the chip by single-atom thick layers.
There are hard limits to how fast you can throw atoms at a substrate. There are limits to how briefly a photoresist can be exposed. There are limits to how fast chemicals can etch the surface. You can only saw a wafer so fast, and you can only physically transport dice through space so fast.
These are problems that the entire industry wants to solve. These are problems at the bleeding edge of physics. This is not something a startup is going to solve, purely because you need to already have an entire semiconductor fab to iterate in.
There are startups trying to reduce the time/cost of chip fab. Atomic Semi is one. Most of the big players have shuttle services which one can use to put a small chip on for fabrication for not a lot of money (~$10k for a few chips). Tiny Tapeout acts as an OSH Park or JLCPCB for chip design using 2 different fabs: SkyWater's 130nm and IHP's 130nm nodes. Wafer.space uses GlobalFoundries 180nm (~$7 per 20mm^2 chip).
At the end of the day the industry has evolved and optimized to stamp out millions of chips a year with very few defects. However, that requires a fairly hefty upfront commitment in design/verification time along with a capital commitment to make enough chips to get economies of scale. This breaks down when trying to build small volumes of chips quickly. It's like trying to turn an oil tanker quickly...
When you say that the 125 added memory management, what does that mean, a little more specifically? My guess is either PDP-11/Z280 style paging without page tables (the 16-bit address space makes just having enough IO registers to cover the address space tractable) or some simple segmentation hardware, but it'd be neat if there was another hardware object capability system I didn't know about.
The Mitra 125 had memory segmentation. It seems similar to the 8086, with descriptors that specified base address and length for a segment. Accessing memory outside a segment caused a trap.
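To make the base-and-length idea concrete, here is a minimal sketch of that kind of bounds-checked translation. It is not the Mitra 125's actual descriptor format: the names `SegmentDescriptor`, `SegmentTrap` and `translate` are hypothetical, and measuring the length in bytes is an assumption made just for illustration.

```python
from dataclasses import dataclass


class SegmentTrap(Exception):
    """Stands in for the hardware trap on an out-of-segment access."""


@dataclass(frozen=True)
class SegmentDescriptor:
    base: int    # physical address where the segment starts
    length: int  # segment size (bytes here; the real unit is an assumption)


def translate(desc: SegmentDescriptor, offset: int) -> int:
    """Map an offset within a segment to a physical address, or trap."""
    # Anything outside [0, length) is outside the segment.
    if not 0 <= offset < desc.length:
        raise SegmentTrap(
            f"offset {offset:#x} outside segment of length {desc.length:#x}"
        )
    return desc.base + offset
```

With a descriptor of base 0x4000 and length 0x100, offset 0x10 translates to 0x4010, while offset 0x100 is one past the end and raises `SegmentTrap`.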
Not just the TLB, but the L1 D$ will be very unhappy as well. On most microarchs, page-aligning every heap object makes every object start at cache set 0, because the set index is taken from the offset within the page so that the TLB lookup can happen in parallel with the set load.
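A quick way to see the set-0 pile-up is to compute the set index for a few addresses. This is only a sketch under assumed parameters (64-byte lines, 64 sets, 4 KiB pages, roughly a 32 KiB 8-way L1 D$); real parts differ, and `cache_set` is a hypothetical helper, not any vendor's indexing function.

```python
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_SETS = 64     # 32 KiB / (8 ways * 64 B) = 64 sets (assumed geometry)
PAGE_SIZE = 4096  # 4 KiB pages (assumed)

# The index bits sit entirely inside the page offset, which is what lets
# the set lookup start before the TLB has produced the physical page.
assert LINE_SIZE * NUM_SETS <= PAGE_SIZE


def cache_set(addr: int) -> int:
    """Set index: drop the line-offset bits, keep the next log2(NUM_SETS) bits."""
    return (addr // LINE_SIZE) % NUM_SETS


HEAP_BASE = 0x10000  # arbitrary page-aligned starting address

# Eight objects, each starting on its own page boundary.
page_aligned = [HEAP_BASE + i * PAGE_SIZE for i in range(8)]
# Eight 96-byte objects packed back to back instead.
packed = [HEAP_BASE + i * 96 for i in range(8)]

sets_aligned = {cache_set(a) for a in page_aligned}
sets_packed = {cache_set(a) for a in packed}

print("page-aligned objects start in sets:", sorted(sets_aligned))
print("packed objects start in sets:      ", sorted(sets_packed))
```

Under these assumptions every page-aligned object starts in set 0, so the first line of each object competes for the same handful of ways, while the packed objects spread their starts across eight different sets.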
It seems like it took engineering work, but TLS isn't their bottleneck when the data flow is structured correctly for the hardware (which is kind of the thesis of a lot of the Netflix CDN node optimization stuff).
> I don't get why RockChip doesn't budget the money in the business plan to fund full driver support for at least some of their more capable chips. I guess maybe too many of these chips are used in non-OS contexts to be worth it?
They have drivers in most of these cases; at a bare minimum the silicon was tested by the DV teams, and that generally includes running drivers.[0]
The issue is getting drivers upstreamed rather than just languishing in the vendor BSP.
And the answer for why they don't get upstreamed by the vendor is multifaceted. First off, the drivers in the vendor BSP are simply not at a quality level that would be accepted upstream. On top of that, even if they were at the quality needed, coordinating with upstream is in practice a decent amount of work. Additionally, their customers don't really even care about upstream in the vast majority of cases, but instead prefer some outdated vendor fork billed to them as "stable".
[0] Apple, for instance, is rumored to have an internal Linux distro (or at least kernel fork) for DV of their Apple silicon chips to allow the hardware teams and macOS teams to work with fewer cross-department dependencies.
> First off, the drivers in the vendor BSP are simply not at a quality level that would be accepted upstream.
You're quite right, morally and practically. I can't help but wonder, though, whether if the likes of Rockchip or other big faceless chipmakers released whatever inadequate source they had, it wouldn't somehow end up as a nice upstream high-quality driver.
Isn't DriverKit essentially a separate user space stack compared to regular code? I remember seeing the DriverKit-specific dyld caches in macOS root partition images that included their own copies of everything down to libsystem. Getting DriverKit code to run in the same process as normal user code seems like it'd be quite an uphill battle.
Presumably with the right entitlements you can just hit the same (presumably IOKit) syscalls that DriverKit does. But that's an extra layer of reverse engineering, and you're not really using DriverKit anymore.
it is a separate stack, but that probably doesn't matter much. a user process (in my case, qemu) can communicate with a driverkit driver. the user process can also map memory through the driver, which is how this pci passthrough system works.
i don't think the issues with the project really are specific to driverkit.