Before LLMs, it was cheaper in the long run: by upstreaming your patches you don't have to rebase them continually, and sometimes the community will maintain the code for you. OTOH, you might still need to work on the code as other parts of the project evolve, particularly if the project is likely to throw out unmaintained code; this is especially true in the Linux kernel, where internal APIs change constantly. But upstream maintenance is probably cheaper than continually backporting security fixes to your stable/LTS/SLTS or completely dead versions.
With LLMs the costs might be different, but they will still exist.
I've been working on a WebExtension that calls out to zygolophodon and returns plain HTML to the browser. I'm in the process of rebasing it over recent changes, but here is the working webext-old branch:
The SFC (the only GPL enforcers at the moment) disagree; they say that both GPLv2 and GPLv3 require the ability for users to install modified versions on their devices.
The point of the talk is that it is non-trivial to detect those dependencies.
It looks like most of the time was spent discussing Python. I suspect that is because it is possible to create software without an explicit build stage, so you would not receive warnings about a dependency until the code is called. If the software treats it as an optional dependency, you may not receive any warnings at all. This sort of situation is by no means unique to interpreted languages: you can write a program in C, then load a library at run time. (I've never tried this sort of thing, so I don't know how the compiler handles unknown identifiers/symbols.) Heck, even the Linux kernel is expected to run "hidden packages" (i.e. the kernel has no means of tracking the origin of software you ask it to run).
Yes, you can write software to detect when an inspected application loads external binaries. No, it is not trivial (especially if the software developer was trying to hide a dependency).
And just a quibble: even bootstrapping requires the use of a binary (unless you go to unbelievably extraordinary measures).
Dependency detection is usually done during source code review for software packaging (like in Debian), where it is relatively trivial: look at the declared dependencies, then search for language functionality that loads libraries or calls executables, like dlopen for C or import for Python, and mostly you will be done.
The Linux kernel has the IMA subsystem, which is intended to prevent executing untrusted binaries; enroll all the hashes from your package manager, and then you will know where every executed binary came from. Or just verify the whole block device with dm-verity. Or both. I believe similar functionality exists on Windows, and some interpreters have support for asking the kernel to check whether files can be executed before loading them.
The Bootstrappable Builds toolchain requires the use of machine code, of course, since CPUs only accept machine code, but that machine code is hex numbers in a text file with comments, and that form is considered the "source code", not "a binary"; i.e. it is "the preferred form for modification" (the phrase used by the GPL). The human starting the bootstrap process has to review that it is correct, enter it into the computer in some trustworthy way, and start it. Yes, the bootstrap process does go to unbelievably extraordinary measures :)
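To give a flavour of that form, here is an illustrative fragment in the style of the stage0 hex0 format, where a trivial loader strips comments and turns the hex pairs directly into the bytes of the output binary (the bytes shown are the standard start of an ELF header; this is a sketch for illustration, not an excerpt from the actual bootstrap sources):

```
# hex0-style source: hex bytes plus human-readable comments.
7F 45 4C 46        # e_ident magic: 0x7F "ELF"
01                 # EI_CLASS: 32-bit
01                 # EI_DATA: little-endian
01                 # EI_VERSION: current
```

Since the comments explain what every byte does, a human can audit the file line by line, which is what makes it reviewable "source" even though it encodes machine code directly.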
Does Gentoo use the Bootstrappable Builds process yet? ISTR someone was working on it.
Windows/macOS/etc are pretty irrelevant if you don't want to trust binaries, because most of them don't come with full source code. People who care about this stuff aren't even going to consider proprietary platforms.
In any scenario where you would do a full-source bootstrap, you would be reviewing the code for each step of the process, or deciding which reviews published using crev or similar are trustworthy enough for you.