The reason why virtualization approaches with true Linux kernels is still important is what you do allow via syscalls ultimately does result in a syscall on the host system, even if through layers of indirection. Ultimately, if you fork() in gVisor, that calls fork() on the host (btw fork() execve() is expensive on gVisor still).
The middle ground we've built is that a real Linux kernel interfaces with your application in the VM (we call it a zone), but that kernel then can make specialized and specific interface calls to the host system.
For example with NVIDIA on gVisor, the ioctl()'s are passed through directly, with NVIDIA driver vulnerabilities that can cause memory corruption, it leads directly into corruption in the host kernel. With our platform at Edera (https://edera.dev), the NVIDIA driver runs in the VM itself, so a memory corruption bug doesn't percolate to other systems.
> Ultimately, if you fork() in gVisor, that calls fork() on the host
This isn't true. You can look at the code right here[1], there is no code path in gVisor that calls fork() on the host. In fact, the only syscalls gVisor is allowed to make to the host are listed right here in their seccomp filters[2].
I was more specifically referring to the fact that to implement threads in gVisor, it calls to the go runtime, which does make calls to clone() (not fork()), but I see the pushback :)
I think it's a small distinction. fork() itself isn't all that useful anyways.
However, consider reading a file in gVisor. This passes through the IO layers, which ultimately will end up a read in the kernel, through one of the many interfaces to do so.
This is a big reason for our strategy at Edera (https://edera.dev) of building hypervisor technology that eliminates the standard x86/ARM kernel overhead in favor of deep para-virtualization.
The performance of gVisor is often a big limiting factor in deployment.
I read the thesis on arxiv. Do you see any limitations from using Xen instead of KVM? I think that was the biggest surprise for me as I have very rarely seen teams build on Xen.
I'd say the limitation has been that sometimes we have to implement things by hand. But it has enabled us to do things that others can't achieve since KVM is a singular stack in many ways. For example, VFIO-PCI is largely the same across all VMMs, but we have true full control over the PCI passthrough on our platform which has allowed us to do things KVM VMMs can't.
Hi! I’m the CTO of Edera and discovered this bug with my colleague Steven!
The story of this bug is interesting. We were both up late at night working on GPU support on the Edera platform, and we had just pulled an NVIDIA container image. What should have resulted in a temporary directory of tar files for OCI layers was filled with NVIDIA library files! We were both super confused until I had an “oh god no” moment and realized this happened.
We kicked right into action on responsible disclosure.
I can answer any questions, but I want to send a huge thank you to our team for working together on this and to Astral for being wonderful to work with!