So you want us to just stop fixing bugs and pushing out those fixes to users? That feels risky. If you do not want to upgrade to solve known problems, that's fine; feel free to skip upgrades. But why would you want to deny those who want to run secure systems that ability?
The answer doesn't have to be as tetchy as your ire puts it; it's as simple as reducing releases to once a week or once every other week. I respect your work greatly, but you've taken the "throw the baby out with the bathwater" approach in your answer here; two production LTS kernel releases a week seems excessive to me.
Why is it "excessive"? We are running 30+ fixes a day in these kernel releases, who would benefit if we delayed in getting those known-bug/security fixes out to the world quickly and properly tested (as we are currently doing)?
Why wait? What is a slower cadence going to accomplish?
If you only want to upgrade your kernel once a week, then you are free to do so regardless of how many releases they make during the week. Unless they've been introducing regressions by releasing too aggressively, there's no upside to releasing less often.
Where is the current CI that we have today lacking, and what needs to be improved? We always want more testing and testers; what is preventing everyone from helping with this?
I'm a software engineer who's not involved in Linux Kernel Dev... but I've got a stack of old laptops that I'd be happy to set up to run automated CI if that'd be helpful.
Is there a webpage or doc somewhere I can look at?
(I'm not trying to snark - the fact that you're you and you're here asking for help is making me want to dip my toe in).
Simplest thing to do: just run Linus's latest releases (the -rc releases), or build from his git tree, on your machine and report any problems.
Second-simplest thing to do is to run the linux-next branch/tree on your machines and report any build warnings and runtime issues you find. That's what will be the "next" kernel releases and is where all of the developer/maintainer trees are merged together before they are sent to Linus.
Both of those should be very easy to do, and any problems found there should be easy to fix and resolve before they get to a "real" release.
I haven't been following kernel dev for years; what does the CI setup look like? Did the Phoronix Test Suite ever find its way into widespread use?
Back when I was building kernels for embedded hardware (Sheevaplug) in the 2.6.33 timeframe, I found a USB audio regression between 2.6.33.7 and later versions. If there were a semi-turnkey way to set up a testbench that could automatically reboot hardware into every new kernel, run through some basic tests, and report any deviation, I probably would have been more likely to do so. At the time I was working solo, trying to release a polished consumer product (sadly, though the product was released, the business didn't work out), and didn't have time to dig into and report bugs.
We have so many different CI systems running on the kernel on an hourly basis.
We have the 0-day bot from Intel that runs so many things on all developer trees. We have kernelci running on many, many different hardware platforms, and we have Linaro test systems also running on many different branches and hardware platforms.
If you want to tie your own hardware into the system, kernelci is the best place to start; I recommend looking into that.
These are some attitude goals for me. It's so easy to take things personally. Being able to take things constructively even when they might be personal is a great skill.
Everyone gets older, the alternative isn't as attractive :)
Seriously, the kernel averages about 200-250 new contributors every release (i.e. every 2 1/2 months). We are not starved for new contributors at the moment at all. Do you think we are somehow not attracting new developers compared to other open source projects?
I was mainly referring to something I read many years ago (regarding new kernel devs), e.g., this from 2013 [0]. However, from your response, looks like that's not a problem.
I’m not sure that _this_ is the (a?) problem, but if someone were purely sourcing CNCF / Linux foundation / press releases they might think the project is heading for a day when the old guard keels over and we’re left high and dry.
That "partially-ABI stable" is the same exact thing that Red Hat and SUSE and Debian have been doing for 20+ years now. Nothing major and exciting there, but see the presentations at the Linux Plumbers conferences for details on the tools being used if people are curious (hint this time everyone is working together on the same set of tools...)
With the current rate of change that the kernel community develops at, including the patches backported to the stable/longterm kernels, it's impossible to try to evaluate each and every patch for "is this something that could be exploited or not?"
Companies have tried, and it was fun watching them, but they quickly gave up, declared it impossible, and decided it was much safer to just take all stable patch updates instead.
I've also talked to MITRE about just applying for a CVE for every stable kernel patch (20+ a day), and while they appreciated me not doing that, they agreed that the current model of CVEs just does not work at all for the Linux kernel and that what we are doing is fine.
See my Kernel Recipes talk last year for details about all of that if you are curious.
I understand that completely, and I already know your thoughts on that; to some extent I do agree.
Still, I think we have in hand a very characteristic issue here: even without knowing the details, simply searching commit messages for "crypto", "key", "buffer", etc. should alert somebody to give it a second and third look.
If there is a commit that refers to a "memory leak", why shouldn't it be, at least superficially, checked and identified, and distros informed? (e.g. 2ca068be09bf8e285036603823696140026dcbe7)
If the crypto fix had been flagged early as a vulnerability, would it have stayed unpatched for that long?
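To illustrate the kind of first-pass triage I mean, here is a minimal sketch (not an existing tool; the keyword list is just my guess at the obvious terms). Pipe `git log --oneline v4.14..v4.15` or similar into it and it flags commit subjects worth a second look:

    #include <stdio.h>
    #include <string.h>

    /* Crude, case-sensitive keyword triage for commit subjects read
     * from stdin. A cheap "look twice" heuristic, not an audit. */
    int main(void)
    {
        static const char *keywords[] = {
            "crypto", "key", "buffer", "memory leak",
            "overflow", "use-after-free",
        };
        char line[1024];

        while (fgets(line, sizeof(line), stdin)) {
            for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
                if (strstr(line, keywords[i])) {
                    printf("review: %s", line); /* line keeps its newline */
                    break;
                }
            }
        }
        return 0;
    }

Of course something this crude would drown reviewers in false positives ("key" alone matches a lot); the point is only that the first filter is nearly free compared to re-discovering a vulnerability later.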
With no marking it is clear what it means: commits have not been audited to identify security-relevant ones.
With partial, incomplete marking, unmarked commits can be one of two things: commits that have not been looked at, and commits that have been looked at and are believed to contain no security relevant changes.
The majority of commits will be in the "not looked at" category. And there's enough people around to have a significant subset of them be lazy, ignorant, unskilled or stupid and take that as "contains no security relevant changes."
P.S.: also, patches are already marked. By being included in the LTS series. Because that means they were important enough to get a backport — though not necessarily due to security impact.
I do agree with the premises; I don't agree with your conclusion.
Yes, only a portion of patches would be marked as such.
That portion, major or minor, would simply mean that people wouldn't have to reinvent the particular wheel, as happened in this case.
People wouldn't be missing critical _discovered_ changes; the vulnerability would be discussed, recognized in its totality (PoC, documentation, etc.), and proper patches would be offered. There have been cases where LTS backports were old revisions of bad patches.
I think that the way LTS kernels are baked is, unnecessarily, closer to an art than to a process.
I'm certain that Red Hat and our kernel developers have no animosity towards you, in fact it's completely the opposite. You're well known in the community not just for this but for maintaining and writing countless drivers and loads of other great work in the kernel.
> They feel like they know better and do not want all of the fixes that the LTS kernels provide for some crazy reason.
It's even crazier; they sometimes backport changes into their kernel that the LTS kernels don't get. We use a custom kernel module that contains a bunch of #if/#endif blocks checking the kernel version for things that changed. That doesn't work on Red Hat, since in some places you actually need the code path meant for more recent kernels.
There would be cleaner ways to achieve this, maybe not specifically autoconf since I think that's more tailored towards "normal" (user space) stuff.
Macros are convenient for quickly checking the version in your code without adding another layer of tooling... until you end up with the aforementioned macro soup, of course.
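For anyone who hasn't seen this pattern, a minimal sketch of what those guards look like (do_thing_new/do_thing_old and the 7.5 cutoff are made up for illustration; LINUX_VERSION_CODE and KERNEL_VERSION are the real macros from <linux/version.h>, and RHEL kernels define their own RHEL_RELEASE_CODE/RHEL_RELEASE_VERSION, which is one way around the mismatch):

    #include <linux/version.h>

    /*
     * Hypothetical example: an API grew an extra argument in mainline
     * 4.14. A plain LINUX_VERSION_CODE check misleads on RHEL, whose
     * kernels report an old base version while carrying heavy
     * backports, so their own release macros are consulted first.
     */
    #if defined(RHEL_RELEASE_CODE) && \
        RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5)
    #define do_thing(dev) do_thing_new(dev, 0)  /* backported API */
    #elif LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
    #define do_thing(dev) do_thing_new(dev, 0)
    #else
    #define do_thing(dev) do_thing_old(dev)
    #endif

The RHEL branch has to come first, since on a RHEL kernel LINUX_VERSION_CODE still reports the old base version even where the newer code has been backported.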
It's actually a legacy module we're about to phase out for 5.x so don't worry too much. The new and shiny replacement will probably use git branches for whenever something changes.
It could also be motivated by the fact that it's not entirely wise to base an entire multibillion dollar business around whatever text a non-employee happens to push to a repo you don't control.
Android doesn't seem to mind; they require the LTS updates to be taken for their devices (well, "require" is a strong word; they are pushing harder now than they were in the past, and "required" will be happening in the future, hopefully...)
As the number of systems running RHEL is really just a rounding error compared to the number of Android systems out there, maybe it doesn't really matter :)
Android and RHEL are completely different scenarios.
Android's target audience, the one Google has to deal with, is multiple manufacturers creating kernels for custom hardware (often without upstream drivers), with very short product lifetimes, relatively little experience with upstream contribution, and little need for new features within a given major release of Android.
RHEL is developed by a single company with a 10-year life cycle and only 3-4 kernel versions to juggle, with almost non-overlapping lifecycles (as far as the initial development-heavy phase is concerned). Development occurs upstream first, and quite a few engineers are upstream developers or maintainers, so the number of non-upstream features is very small and going down over time; see for example stuff like the secure boot lockdown patches that Matthew Garrett started when he was at Red Hat. And even though the product is not the kernel, we need to backport more features than what goes into LTS, because userspace needs them (user namespaces, driver updates, networking or virtualization optimizations, enablement for new processors, etc.)
So it's only natural that there are completely different trade-offs to make.
LTS's security record is definitely not proven. The example we're commenting on in this thread is only one occurrence, and this sort of thing is highly recurrent.
RHEL fixes only CVEs. Linus Torvalds considers that there is no such thing as a security bug; rather, every bug is a security bug. So RHEL's kernel can't be secure.
Debian also uses LTS-series kernels for their stable release, with their own patches on top. They don't actively backport features like RHEL does, however.