> Can't we just have a peaceful life without wasting time on constantly following and analyzing every single move from these companies?
Not if you're using Microsoft products, no.
People continue to get irritated when "we" do this, but here I go: you should be running Linux exclusively on your personal computers. You should also stop buying "smart" shit.
> I bet that WhatsApp is one of the rare services you use which actually deployed servers to Australia. To me, 200ms is a telltale sign of intercontinental traffic.
So, I used to work at WhatsApp. And we got this kind of praise when we only had servers in Reston, Virginia (not at aws us-east1, but in the same neighborhood). Nowadays, Facebook is most likely terminating connections in Australia, but messaging most likely goes through another continent. Calling within Australia should stay local though (either p2p or through a nearby relay).
There's lots of things WhatsApp does to improve experience on low quality networks that other services don't (even when we worked in the same buildings and told them they should consider things!)
In no particular order:
0) offline first, phone is the source of truth, although there's multi-device now. You don't need to be online to read messages you have, or to write messages to be sent whenever you're online. Email used to work like this for everyone; and it was no big deal to grab mail once in a while, read it and reply, and then send in a batch. Online messaging is great, if you can, but for things like being on a commuter train where connectivity ebbs and flows, it's nice to pick up messages when you can.
a) hardcode fallback IPs for when DNS doesn't work (not if)
b) set up "0-RTT" fast resume, so you can start getting messages on the second round trip. This is part of Noise Pipes (or whatever they're called) and TLS 1.3
c) do reasonable-ish things to work with MTU. In the old days, FreeBSD reflected the client MSS back to it, which helps when there's a tunnel like PPPoE that only modifies outgoing SYNs and not incoming SYN+ACKs. Linux never did that, and afaik FreeBSD took it out. Behind Facebook infrastructure, they just hardcode the MSS for, I think, a 1480-byte MTU (you can/should check with tcpdump). I did some limited testing, and really the best results come from monitoring for /24s with bad behavior (it's pretty easy, if you look for it --- you never got any large packets, and packet gaps are a multiple of the MSS minus the space for TCP timestamps) and then sending back (client MSS - 20) to those; you could also just always send back client - 20. I think Android finally started doing PMTUD blackhole detection a couple years back; Apple has been doing it really well for longer. Path MTU Discovery is still an issue, and anything you can do to make it happier is good.
d) connect in the background to exchange messages when possible. Don't post notifications unless the message content is on the device. Don't be one of those apps that can only load messages from the network when the app is in the foreground, because the user might not have connectivity then.
e) prioritize messages over telemetry. Don't measure everything, only measure things when you know what you'll do with the numbers. Everybody hates telemetry, but it can be super useful as a developer. But if you've got giant telemetry packs to upload, that's bad by itself, and if you do them before you get messages in and out, you're failing the user.
f) pay attention to how big things are on the wire. Not everything needs to get shrunk as much as possible, but login needs to be very tight, and message sending should be too. IMHO, http and json and xml are too bulky for those, but are ok for multimedia because the payload is big so framing doesn't matter as much, and they're ok for low volume services because they're low volume.
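Point (a) can be sketched in a few lines of Python; the hostname handling is generic, and the fallback addresses are made-up documentation-range IPs, not anyone's real servers:

```python
import socket

# Hypothetical fallback list; these are documentation-range addresses
# standing in for IPs you would bake into a release build.
FALLBACK_IPS = ["203.0.113.10", "203.0.113.11"]

def candidate_addresses(host, port=443):
    """Resolve via DNS when it works; fall back to baked-in IPs when it doesn't."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        resolved = [info[4][0] for info in infos]
        # Keep the fallbacks at the end, so a stale baked-in list still
        # gets tried if every freshly resolved address fails to connect.
        return resolved + [ip for ip in FALLBACK_IPS if ip not in resolved]
    except socket.gaierror:
        # DNS is down ("when", not "if"): try the baked-in addresses.
        return list(FALLBACK_IPS)
```

A connect loop would then walk this list until one address accepts the connection.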
Reminds me of a UX course I took many years ago at uni.
As an exercise, we were asked to come up with a solution to help people navigate campus. There were so many suggestions for apps or interactive touch screens. Someone suggested installing terminals where you type where you want to go, and then the floor lights up with directions. Someone else did the same, only it would launch a drone for you to follow.
I suggested hanging printed paper maps on the walls with "you are here" stickers.
This is one of my tricks when evaluating products going through some sort of transition from a legacy to a modern standard:
I mentally swap the labels[1] when reading their marketing or release notes. If it says something like "No longer crashes when using IPv6", then I flip that and read it as "No longer crashes when using IPv4". That latter statement is absolutely insane and would have you abandoning that vendor as fast as you can tear up the contract. Yet nobody bats an eye at the first statement! Why not!?
Azure literally had this scenario, where merely enabling IPv6 on one network would crash their managed PostgreSQL service on another peered network, with no recourse other than rebuilding everything from scratch without IPv6 to roll the change back!
[1] Or a generic placeholder that encompasses both. E.g.: "Fix: Enabling the internet protocol no longer causes irreversible outages with your database service."
Windows 95 email client "Exchange" / email server platform "Exchange"
"Outlook" / "Outlook Web Access" / "Outlook Web App" / "Outlook.com" / "new Outlook for Windows"
"Microsoft Teams" / "New Microsoft Teams"
"Office Communicator" / "Microsoft Lync" / "Skype for Business" / "Skype" / "Skype for Business Online" / "Skype for Business for Microsoft 365"
The most guffaw-inducing branding, to me, was the recently-announced remote desktop client called "Windows App". That's going to be an easy one for users to search for.
(For guffaw-inducing I suppose there's also the Windows 98-era "Critical Update Notification Tool"[0])
Google own vast network infrastructure. The day Google acquired YouTube (I was there) they discovered that YT was on the verge of collapse and would run out of bandwidth entirely within months. There was an emergency programme put in place with hundreds of engineers reallocated to an effort to avoid site collapse, with the clever/amusing name of BandAid.
BandAid was a success. YouTube's history might look very different if it wasn't. It consisted of a massive crash buildout of a global CDN, something that Google historically hadn't had (CDNs aren't useful if everything you serve is dynamically generated).
One reason BandAid could happen was that Google had a large and sophisticated NetOps operation already in place, which was already acquiring long haul unused fibre wavelengths at scale for the purposes of moving the web search index about. So CDN nodes didn't need to be far from a Google backbone POP, and at that point moving bits around on that backbone was to some extent "free" because the bulk of the cost was in the fibre rental and the rackspace+equipment on either end. Also it was being paid for already by the needs of search+ads.
Over time Google steadily moved more stuff over to their infrastructure and off YouTube's own, but it was all driven by whatever would break next.
Then you have all the costs that YouTube has that basic video sites don't. ContentID alone had costs that would break the back of nearly any other company but was made effectively mandatory by the copyright lawsuits against YouTube early on, which Google won but (according to internal legal analysis at least) that was largely due to the enormous good-faith and successful effort demonstrated by ContentID. And then of course you need a global ad sales team, a ton of work on ad targeting, etc.
This story explains why Google can afford to do YouTube and you can't. The reality is that whilst they certainly have some sort of internal number for what YouTube costs, all such figures are inevitably kind of fantastical because so much infrastructure is shared and cross-subsidised by other businesses. You can't just magic up a global team of network engineers and fibre contracts in a few months, which is what YouTube needed in order to survive (and one of the main reasons they were forced to sell). No matter what price you come up with in internal bookkeeping, it will always be kinda dubious, because such things aren't sold on the market.
> obviously, you'll always be able to get things "through" a filter like this. But the value of raising the bar of the exploit is still quite substantial
I just want to stress this part.
So many people I talk to will just dismiss things because something isn't bulletproof, as if it were a binary option. But in reality there's a continuum. I'm the annoying person that tries to get my friends to use Signal, but then says, if you won't install it, that WhatsApp is my next preference. People on Signal forums will say that you shouldn't have the ability to delete or nuke conversations (now you can delete some, but only if <3hrs old) BECAUSE you can't guarantee the message content wasn't copied. Which is just fucking insane. It's not incorrect, but you have to think of things probabilistically: security is about creating speedbumps, not bulletproof vests. It is standard practice in many industry settings to remotely wipe a device (and then operate under the assumption that the data was leaked), because if you don't, adversaries have infinite time to copy that data rather than finite.
In most things, there are no perfect solutions. We have to think probabilistically and weigh the tradeoffs for different environments (which are dynamic). Perfect solutions are not only unachievable; even if they were achievable, they wouldn't stay perfect for long.
As far as I am aware, Java is the only major programming language (other than JavaScript, which is also the only worse major language) that was not organically adopted by programmers.
JavaScript was forced on the world by being the language of the browser, but Java was foisted on the world by Sun Microsystems in a massive marketing campaign.
And then Java was bought into by the kind of person who isn't a programmer but needs to make some kind of sensible choice at some company, along with the kind of person who needs to teach freshman programming according to the latest fad.
Don't forget the millions Sun spent literally ADVERTISING Java.
For 2-3 years (2003-4?) every other tech-related book that was published had something to do with Java. I remember going into a Barnes and Noble once, back in that era, and walking down an aisle that felt like it was 30 feet long and four shelves high of just Java books. It was all marketing.
After a decade people suddenly woke up and realized "oh, Java sucks".
Then the smart kids moved on, but the rest of the world is now stuck with Java, and there will always be those kinds of people around who aren't programmers but who need to make what they think is some kind of sensible choice.
Of course, the Java community has also realized what a pile of ** Java was, so now they've added all sorts of lambdas and better syntax and whatnot, but it's band-aids on top of a fundamental misconception, which is that object-oriented programming is the best way to model software engineering problems.
I sold your city a bridge that has an engineering flaw, such that it collapses when a purple van followed by an orange scooter drives over it.
It also has thousands of other serious engineering flaws, for other combinations of vehicle and foot traffic.
I keep the bridge blueprints secret, so that only I can make patches for the engineering flaws.
But I'll slowly trickle patches to some of the thousands of engineering flaws. (Mainly when they result in collapses to bridges I've built in other cities.)
Each time, I'll act like I'm doing you a favor, patching some of the bridge engineering flaws that I made negligently.
And I'll start bundling concessions and invasiveness with these patches.
The patch for the design flaw that makes the bridge collapse when a green polka-dot truck drives over it-- comes with a garish billboard that I control.
The patch for the design flaw that makes the bridge collapse when a convertible with the windshield wipers on drives over it-- gratuitously requires that everyone driving over it lets me copy all their documents and photos.
It doesn't help that your city has some really jerky enemies, who like to make bridges collapse, and have lots of time on their hands. (Maybe you should've thought of that when selecting an engineering firm for your bridge.)
Somehow, I don't get jailed for shoddy bridge engineering, nor lose my bridge engineering license, nor even have to pay compensation for any of the collapses of bridges that I sold to other cities.
Windows also had FAT16 -> FAT32 and FAT32 -> NTFS converters. In the 90s. But nobody produced blog posts praising Microsoft for them; it was just a Windows feature.
Little of column a, little of column b. Yes, nostalgia is a factor, BUT:
- The lack of overhead required by security and 'features' like querying an internet search when you click the start menu or showing advertising in the calculator or better memory management in general, meant that overall UI response in that era used to be much _MUCH_ faster than it is today. Once upon a time you could operate your O/S with the speed of a Starcraft tournament winner; this simply is not possible any more.
- You could 'queue' commands - clicking the close button on a program and then (before it had finished closing) clicking the minimise button would minimise the program behind it immediately after the closing process finished. In this way you could chain/queue commands rather than being forced to wait for the OS to update between each step.
- Enter and Spacebar did different things. If a prompt had two buttons, one would be outlined with a thick black line that would respond to [Enter], and one would be outlined in a dotted line that would respond to [spacebar]. This is still the case sometimes but is far from ubiquitous.
- The top-left corner of the program was reserved for a 'system' menu used to move/resize the window, or quickly exit with a double-click. Though still used by some MS programs today like Explorer, its usefulness is lessened if not all programs utilise it.
- Don't even get me started on keyboard shortcuts.
These kinds of universally-accepted and _useful_ power-user-oriented design principles are almost absent from UX as it is implemented today.
The UIs of the 80s and 90s were designed to be learned, and carefully refined through focus testing. These UIs used consistent visual affordances, and contained contextual help. Constraints on memory, resolution, and color depth discouraged the inclusion of visual elements that did not contribute directly to usability and functionality.
The UIs of today are largely designed by people who have experienced GUIs for their entire lives, and assume that everyone is already familiar with conventions. Focus testing is seen as slow and expensive, so designers lean on A/B testing and telemetry, randomly breaking live user experiences in small batches to creep toward local maxima. Needing a help system is viewed as old-fashioned; users should paw at UIs like a puzzle box to discover features. Computers and displays are powerful enough for every application to be a unique "branded experience".
> Would it be fair to say that the perpetrators could have covered their tracks better? Could they for example, have fixed the valgrind errors? And if so, would this backdoor have remained hidden for much longer?
Yes. Mostly they should have reduced the cost of starting up sshd with the backdoor. A lot of that seems to be due to all the symbol lookups they needed to do, while staying obfuscated. It feels like they started with a reasonable set of features and then just piled on more and more, leading to the noticeable cpu usage.
I think the valgrind warnings were only triggered when using -fno-omit-frame-pointer. Which, at the time they wrote this stuff, wasn't the default anywhere. They got unlucky in that Fedora changed to default to that, and that I happened to have it set in my valgrind tests.
> What was the moment like, when you realized you have stumbled upon a backdoor? I mean, it is riveting just to read the various reports of this backdoor!
It was many hours of slowly figuring that out, room for different emotions. Lots of nervous cackling. Thinking I must just be hallucinating. Worry about how to deal with this. And more...
In decades past, if you were developing software for Unix you would inevitably run across compatibility problems. Even with the POSIX standard, every Unix was trying new things, inventing new things, or just implementing the same thing in slightly different ways. So your programs had to have a bunch of macros to switch between different variations. Some API would have two arguments on Unix A but three on Unix B, so you would have to turn that api call into a macro that would insert the missing third argument or whatever. But now your users needed to know which configuration of your software to use. You could try documenting all the little knobs and switches and make them choose the correct values, but someone hit on a clever idea.
Just write a little program that tries calling the API with two arguments, and see if it compiles. If it does, you could automatically define THAT_API_HAS_TWO_ARGS and your code will do the right thing. If it fails to compile, then define something else.
So you start writing a shell script that will run through all of these tests and dump out a header containing all of these configuration choices. Call the script `configure` and the header `config.h` and you are good.
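The output half of that idea is easy to sketch. Here is a toy version in Python, with the compile-check results stubbed out as a dict (a real configure script would fill them in by actually trying to compile small test programs):

```python
def emit_config_h(checks):
    """Render feature-test results as a config.h, autoconf-style.

    `checks` maps macro names to booleans; in a real configure script each
    boolean would come from attempting to compile a tiny test program.
    """
    lines = []
    for macro, present in sorted(checks.items()):
        if present:
            lines.append(f"#define {macro} 1")
        else:
            # Autoconf leaves a commented-out #undef as a breadcrumb.
            lines.append(f"/* #undef {macro} */")
    return "\n".join(lines) + "\n"

print(emit_config_h({"THAT_API_HAS_TWO_ARGS": True, "HAVE_STRLCPY": False}))
```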
Except that writing a shell script that is actually POSIX-correct is pretty hard. And the code is pretty repetitive. Mistakes start to creep in, bashisms break things when it's actually run under `sh`, and so on. And it turns out that all the little tools we rely on, like find and grep and sed and awk, are all different on different platforms too! Turns out that nice option you were passing to grep is just an extension, only present on your specific flavor of Unix.
So the next clever idea was to write a program that generates the configure script. The repetitive bits can mostly be solved by text substitution, so someone had the bright idea to use M4. Thus autoconf was born.
Autoconf solved a lot of problems. It ensured that the configure script was actually POSIX correct. It did this even though the generated script was even more complex than ever, with extra logging and tracing and debug output that most people never look at, but which is actually super useful for debugging. The configure script you had written by hand didn’t have any of that!
And autoconf came with baked–in knowledge about all the things you might need to test for. Any time someone discovered some new difference between Unices, new tests were added to autoconf for it. Autoconf users hardly had to do any more work than to look up the name of the macro that it would define for you.
Of course, M4 causes actual brain damage in regular practitioners, and nobody cares to write software for any Unix except Linux these days. Even the BSDs are an afterthought. Technically OpenSolaris is still out there, with superior features like ZFS and Zones and whatnot, but that’s a lot of extra work. So autoconf is a dinosaur, solving problems that nobody really has any more.
Rachel is not wrong that it’s crazy to rerun the same configure script over and over, but really only developers have to do that. Most people just run the thing once when they install your software, and never see it again. And most people don't even do that any more, because they download binaries from their package manager most of the time. It’s just the distro packagers that actually run your configure script.
And she’s right that the answers configure gets should be cached system–wide. Autoconf does support that, but it’s harder to do correctly than she remembers. And by the time it was possible usage was already waning in practice.
The level of complexity involved in making sure that electrical plants work, that water gets to your home, that planes don't crash into each other, that food gets from the ground to a supermarket shelf, etc, is unfathomable, and no single person knows how all of it works. Code is not some unique part of human infrastructure in this aspect. We specialize and rely on the fact that by and large, people want things to work, and as long as the incentives align people won't do destructive things. There are millions of people acting in concert to keep the modern world working, every second of every day, and it is amazing that more crap isn't constantly going disastrously wrong, and that when it does, we are surprised.
I think one of the good things to come out of this may be an increased sense of conservatism around upgrading. Far too many people, including developers, seem to just accept upgrades as always-good instead of carefully considering the risks and benefits. Raising the bar for accepting changes can also reduce the churn that makes so much software unstable.
Thanks for responding. My experience with tracing GC at scale is exclusively in the realm of .Net, and RC exclusively with C++ smart pointers. That matches your “sophisticated vs crude” contrast.
The experience with .Net is that GC impact was difficult to profile and correct, and also “lumpy”, although that may have been before GC tuning. GC would dominate performance profiling in heavy async code, but these days can be corrected by value tasks and other zero alloc methods.
For C++ style ref counting, the impact was a continuous % load and simple to profile (and therefore improve). Although here, the ref counting needed to be stripped from the hot paths.
The biggest issue I’ve hit between the two modes though, is how they behave when hitting memory limits. Tracing GC appears to have an order of magnitude perf hit when memory becomes scarce, while ref counting does not suffer in this way. This is enough for me to personally dislike tracing GC, as that failure state is particularly problematic.
The problem is not only how long the pause takes but also the fact it pauses all the things. In manual memory management even if you have to spend some time in allocation / deallocation, it affects only the allocating thread. A thread that doesn’t allocate doesn’t pause.
The RCU use case is convincing, but my experience with GCs in other situations has been poor.
To me, this reads more like an argument for bespoke memory management solutions being able to yield the best performance (I agree!), which is a totally different claim from the more general one that static lifetimes outperform dynamic lifetimes (especially when a tracing step is needed to determine liveness).
> Lies people believe... Calling free() gives the memory back to the OS.
I believe calling `free()` gives the memory back to the allocator, which is much better than giving it to the OS; syscalls are slow.
Perhaps not immediately; mimalloc only makes frees available to future `malloc`s periodically.
Trying a simple benchmark where I allocate and then immediately `free` 800 bytes, 1 million times, and counting the number of unique pointers I get:
glibc's malloc: 1
jemalloc: 1
mimalloc: 4
Julia's garbage collector: 62767
62767, at about 48 MiB, isn't that bad, but it still blows out my computer's L3 cache.
Using a GC basically guarantees every new allocation is from RAM, rather than cache. This kills performance of any heavily allocating code; we don't care only about how fast memory management can work, but how quickly we can work with what it gives us.
I gave a benchmark in Julia showcasing this: https://discourse.julialang.org/t/blog-post-rust-vs-julia-in...
Malloc/free gives you a chance at staying hot, if your actual working memory is small enough.
Allocators like mimalloc are also designed (like the compacting GC) to have successive allocations be close together. The 4 unique pointers I got from mimalloc were 896 bytes apart.
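If you want to reproduce the unique-pointer count above against your own system allocator, a few lines of Python and ctypes will do it. This is a rough probe, not a benchmark; results vary by platform and allocator, and it assumes a POSIX-ish libc:

```python
import ctypes

libc = ctypes.CDLL(None)  # the C library already loaded into this process
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

def unique_malloc_addresses(size, iterations):
    """Allocate and immediately free, counting distinct addresses returned."""
    seen = set()
    for _ in range(iterations):
        p = libc.malloc(size)
        libc.free(p)
        seen.add(p)
    return len(seen)

# On glibc this usually prints a very small number, since the freed
# chunk goes straight back into the size-class bin for reuse.
print(unique_malloc_addresses(800, 100_000))
```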
My opinions might be less sour if I had more experience with compacting GCs, but I think GCs are just a vastly more complicated solution to the problem of safe memory management than something like Rust's borrow checker.
Given that the complexity is foisted on the compiler and runtime developers, that's normally not so bad for users, and an acceptable tradeoff when writing code that isn't performance sensitive.
Similarly, RAII with static lifetimes is also a reasonable tradeoff for code not important enough for more bespoke approaches.
The article's example is evidently one of those deserving a more bespoke solution.
And eons ago, people made fun of me for using MS-DOS and loadlin as a bootloader for Linux. (This always worked fine for me, and MS-DOS was simple to make boot and install tools on with hardware of the time. A functional-enough MS-DOS installation took up a trivial amount of space on a garden-variety PC of the day, and it was easy to pare down.)
Fast forward 20 or 30 years: We're back to using a FAT[32] partition for booting, just with a very different mini-OS to do the job.
I worked on Apple’s ZFS port, and it was indeed licensing. Apple’s lawyers took a closer look at the CDDL when we were close to shipping and found some provisions that were unacceptable to them.
I assume the intended question is: why use async/await over thread-per-client designs?
Easy: because async/await allows you to more easily compress the memory footprint of client/request/work state because there is no large stack to allocate to each client/whatever.
One can still use threads (or processes) with async/await so as to use as many CPUs as possible while still benefiting from the state compression mentioned above.
State compression (or, rather, not exploding the state) is critical for performance, considering how slow memory is nowadays.
Async/await, manual continuation passing style (CPS) -- these are great techniques for keeping per-client memory footprint way down. Green threads with small stacks that don't require guard pages to grow are less efficient, but still more efficient than threads.
Threads are inefficient because of the need to allocate them a large stack, and to play MMU games to set up those stacks and their guard pages.
Thread-per-client simply does not scale like async/await and CPS.
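A quick way to see the footprint difference in Python: ten thousand concurrent coroutines are just small heap objects holding their local variables, whereas ten thousand OS threads would each reserve a full stack:

```python
import asyncio
import threading

async def fake_client(results, i):
    await asyncio.sleep(0)  # stand-in for awaiting network I/O
    results.append(i)

async def serve(n):
    results = []
    # n suspended coroutines exist concurrently; each is just a small
    # heap object, not a thread with its own guarded stack mapping.
    await asyncio.gather(*(fake_client(results, i) for i in range(n)))
    return len(results)

print(asyncio.run(serve(10_000)))  # prints 10000

# For contrast, the stack reserved per OS thread:
print(threading.stack_size())  # 0 means "platform default", often several MiB
```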
People have been making this argument to me about Linux for more than 25 years. The most cutting version that I ran across was:
> Linux is only free if your time is worthless!
Something never quite sat right with me about this argument, and your comment finally made me understand what it is: the understanding you gain from tinkering is priceless, and it's exactly the experience that you use to help everyone around you: it turns you into an expert.
So yes, I may just want to turn the key and have my car work. But when it doesn't, I often wish I was that guy that had tinkered with my car, so I can better understand what was wrong, and whether I can fix it myself or if I needed a professional.
I run Linux on all my machines, and my family generally uses Mac (both sides), but all those years tinkering with Linux, they still come to me for help with their Mac machines that they insisted would Just Work.
All that out of the way, I agree with your fundamental premise: hackintosh is likely in the rear view mirror for the next generation of tinkerers.
I spent part of my childhood in France/Switzerland, and one thing I'll never forget is a scene of street construction in a French town. It's about 11:30am, and three guys all get done with the excavator and jackhammers. One guy goes and grabs something long from the bed of a truck, another heads to the cab and reaches for a brightly colored bundle, and the third guy grabs some chairs from the other side.
Right in the middle of the cordoned-off construction zone, the first guy sets up a folding table, the second guy neatly places a tablecloth on top along with a baguette and bottle of wine and some cheese etc., and the third guy brings up the chairs. These guys sat down for a nice meal for 90 minutes, at least, before getting back to work.
I think my American parents thought, "Wow, how in the hell does anything ever get built in this country?" while my thought was mostly, "that seems really nice, they look so relaxed!"
co-author of the ODF spec (2006 ISO) here, I also got involved via NLNet to work with my local standards office to improve OOXML.
The OOXML spec was offered to ISO along an unusual path, a fast standardization lane, where there was no option for any of the people reviewing it to actually change anything. It was rubber-stamped, in other words.
That fact alone doesn't mean it is bad, but it is a bit of a red flag. The fact is, it is a really bad specification. It is a full 7000 pages long with lots of conflicting details. On top of that it is full of references like "this works like wordperfect version-n". While references are useful in specifications, they need to be to existing open standards to be meaningful. WordPerfect has never standardized its format, so referring to it is meaningless.
You would not be able to implement a competing application that can use this format from this specification alone. On top of that, it is so massive that doing so is an undertaking that makes no sense. Compare it to ODF, which is a tenth of the size, reuses concepts heavily, and was written under OASIS, a standards organization, unlike the OOXML spec, which was written by Microsoft, with the full 7000 pages dropped on the world at once.
I stopped looking at the OOXML stuff for some years, so the next part may be outdated. I noticed that after MS got this ratified by ISO, thus dodging the threat of laws requiring governments to switch to ODF, they never updated the spec even though the applications have seen plenty of new features.
Root access is meaningless on a user build of Android. Android is designed such that the root account is not needed and it is locked down using SELinux.
> to notify me about these silent installs, but it depended on a notification which is no longer broadcast on more recent Android versions
I built a test app on Android 14 and I was able to get the broadcast you are referring to. The changes which the app you linked has not adapted to are that, since Android 8, you need to use Context.registerReceiver() to register an implicit broadcast receiver for those actions [1], and that, since Android 11, you need to use a <queries> tag (or the QUERY_ALL_PACKAGES permission) in order to get visibility of other apps on the system [2].
We don't need a global definition of what a unit is, we just need to know what unit is being tested by a given unit test. This is important because it's how we understand coverage:
1. We need to understand what's within the unit (covered by the test) so we don't write redundant tests, and...
2. ...we need to understand what's not within the unit (not covered by the test) so we can write separate tests to cover that unit.
The unit can be anything, as long as we know what unit we're testing when we read/write/run the test.
An integration test also needs to know what it's testing, but now you're looking at it from an outside perspective. You no longer care which specific areas of code (units) are within the integration, you just care what features are within the integration. Sometimes a refactor will mean that feature A no longer causes class B to call class C: that probably means changes to B's unit tests, because the code unit changed, but it doesn't mean changes to the integration test because the feature didn't change.
The important thing is to understand what's being tested by a given test.
For example, a project I'm working on has integration tests which are written with Selenium and run in a headless browser. Most of these tests involve users, but most of these tests aren't testing user creation, so I don't care how the users get created. Instead of going through the signup interface using Selenium, I'm just injecting the users into the database directly because that's faster and easier. The one exception is I'm not injecting the users into the database in my signup/login integration tests, because whether the user gets created is part of the feature under test.
Of course there are significant overlaps between these two categories, that is to say, there are some unit tests that are also integration tests.
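The user-injection trick looks roughly like this; every name here is hypothetical, sketching the idea rather than any real framework or schema:

```python
# A fake in-memory "database" standing in for the real one.
def inject_user(db, username):
    """Create a user directly, bypassing the signup UI entirely.

    Fine for tests where user *creation* is not the feature under test;
    a signup/login integration test would go through the UI instead.
    """
    db[username] = {"username": username, "messages": []}
    return db[username]

def test_user_can_store_message():
    db = {}
    user = inject_user(db, "alice")    # setup, not the thing being tested
    user["messages"].append("hello")   # the feature under test
    assert db["alice"]["messages"] == ["hello"]

test_user_can_store_message()
```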
> It's amazing to me that Microsoft messed up Windows.
It is. At Windows 7, they pretty much had desktops right. Everything mostly worked. They'd finally fixed the crashing problems. (How? The Static Driver Verifier validated that third-party kernel drivers would not crash the rest of the system, and a classifier applied to crash dumps routed similar crash dumps to the same maintainer.) The UI was reasonable for a desktop, and wasn't a clone of the mobile UI. No ads. Didn't phone home too much, and you could turn off auto-update. No issues with installing your own software.
Then Microsoft tried to make desktop look like mobile and tablet. Desktop began to look like a big phone, optimized for content consumption and fat finger selection. The result was something that was neither a good desktop nor a good content consumption device.
There's a misconception in the question that is important to address first: when an LLM is running inference it isn't querying its training data at all, it's just using a function that we created previously (the "model") to predict the next word in a block of text. That's it. When considering plain inference (no web search or document lookup), the decisions that determine a model's speed and capabilities come before the inference step, during the creation of the model.
Building an LLM model consists of defining its "architecture" (an enormous mathematical function that defines the model's shape) and then using a lot of trial and error to guess which "parameters" (constants that we plug in to the function, like 'm' and 'b' in y=mx+b) will be most likely to produce text that resembles the training data.
So, to your question: LLMs tend to perform better the more parameters they have, so larger models will tend to beat smaller models. Larger models also require a lot of processing power and/or time per inferred token, so we do tend to see that better models take more processing power. But this is because larger models tend to be better, not because throwing more compute at an existing model helps it produce better results.
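Sticking with the y=mx+b analogy: the architecture is the fixed shape of the function, the parameters are the constants training settles on, and inference is nothing more than evaluating it:

```python
def tiny_model(x, params):
    m, b = params          # "parameters": constants found during training
    return m * x + b       # "architecture": the fixed shape y = m*x + b

trained_params = (2.0, 1.0)             # pretend training produced these
print(tiny_model(3.0, trained_params))  # inference: plain evaluation, 7.0
```

A real LLM is the same picture scaled up: a fixed function shape with billions of constants plugged in, evaluated once per token.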