MertsA's comments | Hacker News

iMessage is not on the same playing field as WhatsApp and Signal. Apple has full control over key distribution, and virtually no one verifies that Apple isn't acting as a MitM. WhatsApp and other e2e encrypted messengers force you to handle securely linking multiple devices to your account and give you the option to verify that Meta isn't providing bogus public keys to break the e2e encryption.

https://engineering.fb.com/2023/04/13/security/whatsapp-key-...

For iMessage, Apple can just add a fake iDevice to your account and now iMessage will happily encrypt everything to that new key as well and there's zero practical visibility to the user. If it was a targeted attack and not blanket surveillance then there's no way the target is going to notice. You can open up the keychain app and check for yourself but unless you regularly do this and compare the keys between all your Apple products you can't be sure. I don't even know how to do that on iPhone.
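To make the "fake iDevice" risk concrete, here's a toy trust-on-first-use check (all key material and names here are hypothetical, not Apple's or Meta's actual formats): a client that pins fingerprints of the devices its user has verified out-of-band can at least flag when the key directory starts serving an extra key, which is exactly the visibility the comment says iMessage lacks in practice.

```python
import hashlib

def fingerprint(pubkey: bytes) -> str:
    """Short fingerprint a user could compare out-of-band."""
    return hashlib.sha256(pubkey).hexdigest()[:16]

def detect_new_devices(server_keys, pinned_fingerprints):
    """Trust-on-first-use check: flag any served key the user hasn't verified."""
    return [fingerprint(k) for k in server_keys
            if fingerprint(k) not in pinned_fingerprints]

# The user verified two of their own devices out-of-band...
phone, laptop = b"phone-pubkey", b"laptop-pubkey"
pinned = {fingerprint(phone), fingerprint(laptop)}

# ...then the key directory quietly adds a third key to the account.
served = [phone, laptop, b"surveillance-pubkey"]
suspicious = detect_new_devices(served, pinned)
print(suspicious)  # one unrecognized fingerprint -> warn the user, don't encrypt
```

A client that silently encrypts to every key the directory returns, with no pinning step like this, gives the user no chance to notice the injected device.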


Are you talking about Brian Steel? He was held in contempt because he refused to name his source that informed him of some misconduct by the judge (ex parte communication with a witness). That's hardly relevant here, the client wasn't involved at all as far as anyone knows.

https://www.reddit.com/r/Lawyertalk/comments/1dd32ji/brian_s...


It was records and communication that the lawyer made on behalf of his client.

More interesting is why the judge and prosecutors have not been referred to the bar for their illegal actions.


That in particular would fall under the crime-fraud exception so there's no attorney client privilege.


I'm literally going from a house in Seattle to an office in SF right now. I left 48 minutes ago, I'll update when I arrive and I'll personally add a data point here lol.


Made it door to door in 4h20m. Clearly the original person was taking the scenic route.


Imagine the productivity gains if someone implemented a SEA - SFO Concorde flight


Or trains? With current (40-year-old) high speed train technology, this could be under 4 hours.


Disclaimer: I work at Meta and I know a couple of the authors of the paper but my work is completely unrelated to the subject of the paper.

That's not a technical error; they mean SRAM in the CPU itself. You're right about gcj, but that's kind of a moot point when investigating a reproducible CPU bug like this. The paper mentions all the acrobatics they went through trying to find the root cause, but if gcj had been practical then it would also have been immediately clear whether the gcj output reproduced the error or not. If it didn't reproduce, no big deal, try another approach. You might be right about it being easier to root-cause with gdb directly, but I'm not so sure. Starting out, you have no idea which instructions under what state are triggering the issue, so you'd be looking for a needle in a haystack. A crash dump or gdb doesn't let you bisect that haystack, so good luck finding your needle.


GCJ's implementation could be so vastly different from HotSpot that you might as well rewrite it in C and check whether it fails. ChatGPT would generate a test case within a minute.

It all depends on how good you are with x64 assembly. If you are good enough, you can easily deduce what the instructions at the location do, and can potentially just copy-paste them into an asm file, compile it, and check the result. That would be much faster for me.

Bluntly speaking, people who are not familiar with low-level debugging made an honest and successful attempt to investigate a low-level issue. A seasoned kernel developer or reverse engineer would have just used gdb straight away.


>A seasoned kernel developer

I think you should take another look at the author list; Chris Mason counts as a seasoned kernel developer in my book. Either way, I think you're missing the point. Yes, gcj would be different, but there's a decent chance it could hand you a binary that reproduces the issue, and you could bisect to the root cause from there. It's one thing to run it through gcj and see if it reproduces; rewriting it in C is a ton of work, compared to gcj, for something that might not pan out.


I am not missing the point, as I do not believe in authorities or someone else's evaluation of yet another person's skill level. Rewriting a simple exponentiation in C would not be "lots of work", and pinpointing the culprit, the exponentiation, does not require any gdb debugging or disassembling. In fact, just knowing that the exponentiation caused it suggests faulty hardware, and no further investigation is required.
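For what it's worth, once the suspect operation is narrowed down, pinning a reproducer to each core in turn is a cheap way to confirm faulty hardware. A minimal sketch (Linux-only `os.sched_setaffinity`; the constants being exponentiated are placeholders, not the paper's actual values):

```python
import math
import os

# Reference result, ideally taken from a known-good machine.
EXPECTED = math.pow(1.1, 3)

def core_is_healthy(core: int, iters: int = 1000) -> bool:
    """Pin this process to one core and repeat the suspect computation there."""
    os.sched_setaffinity(0, {core})
    return all(math.pow(1.1, 3) == EXPECTED for _ in range(iters))

# Walk every core we're allowed to run on and flag any that disagree.
bad = [c for c in sorted(os.sched_getaffinity(0)) if not core_is_healthy(c)]
print("suspect cores:", bad)  # empty list on healthy hardware
```

This only catches faults that reproduce under this exact instruction sequence and data, which is part of why the real investigation was harder than it sounds.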

You should probably invite these people themselves to the discussion instead of speaking on their behalf. Not productive.


But if it's a consistent fault, like the silent data corruption covered in the linked paper, redoing the computation is still going to end up with no way to identify which core is faulty. If it's an intermittent fault, then even for hard realtime you can accomplish that with one core, just compute 3x and go with majority result.
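The "compute 3x and go with the majority result" idea for intermittent faults can be sketched in a few lines; everything here is illustrative, not from the paper:

```python
from collections import Counter

def vote3(compute):
    """Run the same computation three times on one core and take a 2-of-3 vote.
    Masks a single transient glitch; a consistent fault makes all three runs
    agree on the same wrong answer, which this cannot detect."""
    results = [compute() for _ in range(3)]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no two runs agree; fault is not a one-off glitch")
    return value

# One flaky run out of three is masked:
outputs = iter([42, 7, 42])          # second run hits a transient glitch
print(vote3(lambda: next(outputs)))  # 42
```

Which is exactly the point above: for the silent, repeatable corruption in the paper, redoing the work on the same core buys you nothing.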


Yup, exactly. The only way independent hardware can help is if the fault is state-dependent on the hardware (e.g. differences in behavior due to thermal load, or internal state corruption, or something), in which case repeated computations may not help if they are not sufficiently decoupled in time to get rid of that state. The other thing with independent hardware is that you don't pay a 3x performance penalty (instead a 3x cost penalty). That being said, none of these fault modes are really what's being discussed in the paper.

The other one that freaks me out is miscompilation by the compiler and JITs in the data path of an application. Like we’re using these machines to process hundreds of millions of transactions and trillions of dollars - how much are these silent mistakes costing us?


I think that, strictly looking at it in terms of money-related operations, things can still be managed/double-checked externally, i.e. by the real world. Whatever mistakes/inconsistencies might show up, there's still a "hard reality" out there that will start screaming "hey! this money figure is not correct!", because people tend to notice big money discrepancies, and the mistakes are, generally speaking, reversible when it comes to money.

What's worrying is when systems like these get used in real-time life-and-death situations, where there's basically no reversibility, because that would imply dead people returning to life. Take, for example, the code used for outer-space exploration: sure, right now we can add lots and lots of redundancies and check-ups to the software used in that domain, because the money is there to be spent and we still don't have that many people out in space. But what will happen when we think about hosting hundreds, even thousands of people inside a big orbital station? How will we be able to make sure that all the safety-related code for that very big structure (certainly much bigger than anything we have in space now) doesn't cause the whole thing to go kaboom because of an unknown-unknown software error?

And leaving aside scenarios that aren't here yet: right now we've started using software more and more when it comes to warfare (for example, for battle simulations on which real-life decisions are based). What will happen to the lives of soldiers whose conduct in war has been led by faulty software?


The financial impact was just to frame the scope in a single, calculable, easy-to-understand number. Also, most transactions are automated and rarely validated manually, so I'm not sure how many inconsistencies we're actually catching. Look at the UK Post Office scandal: that was basic distributed-systems bugs in the auditing software, where the system was granted privilege over manual review (sure, there's lots wrong with that scandal, but it is illustrative of how much deference we give to automated systems, since that tends to be the right tradeoff to make).


The recent Ukraine war shows that soldiers' lives are cheap, according to commanders.

So many soldiers on both sides have died because of really dumb commander decisions, missing kit, and political needs that worrying about CPU errors is truly way, way down the list.


At the tactical level, of course what you're saying is true, but the big Ukrainian counter-offensive from last year was preceded by lots and lots of allusions to "war-game simulations" set up by Ukraine's allies in the West (mostly the US and the UK), and it is my understanding that those war games were heavily taken into consideration as a basis for the decision to launch that counter-offensive. I'm not saying the code behind those simulations was faulty; I'm just saying that software is already used at the operational level (at least) when it comes to war.

As for sources, here's one in The Economist [1] from September 2023, just as it had become obvious that the counter-offensive had fizzled out:

> American and British officials worked closely with Ukraine in the months before it launched its counter-offensive in June. They gave intelligence and advice, conducted detailed war games to simulate how different attacks might play out

And another one from earlier [2], in July 2023, when things were still undecided:

> Ukraine’s allies had spent months conducting wargames and simulations to predict how an assault might unfold.

[1] https://archive.is/1u7OK

[2] https://archive.is/NyGJI


It was mostly an old-school kind of wargame.

There was an article about a giant real 3D map in a room, with all the commanders there discussing what each one had to do, contingencies, and so on.

The reasons for the counter-offensive's failure were multiple and complex; good or bad software would not have changed the result significantly.


If it's consistent and persistent, wouldn't that classify as broken hardware requiring device change?

Even with 3 chips, if one is permanently wrong you're left with only 2 working ones, so there's no redundancy left for further degradation.

> just compute 3x

That might be difficult if the CPU is broken. How can you be sure you actually computed 3 times if you can't trust the logic?


>wouldn't that classify as broken hardware requiring device change?

Yes but you need to catch it first to know what to take out of production.

>That might be difficult if CPU is broken. How are you sure you actually computed 3 times if you can't trust the logic.

That's kind of my point. Either it's a heisen-bug and you never see those results again when you repeat the original program or it's permanently broken and you need to swap out the sketchy CPU. If you only care about the first case then you only need one core. If you care about the second case then you need 3 if you want to come up with an accurate result instead of just determining that one of them is faulty. It's like that old adage about clocks on ships. Either take one clock or take three, never two.


You don't need to know which of the two was bad; it's not worth the extra overhead to avoid scrapping two in the rare case you catch a persistent glitch. Sudden hardware death (a blown VRM or such, for example) will dominate either way, so you might as well build your "servers" to have two parts that check each other and force-reset when they don't agree. If it reboot-loops, you take it out of the fleet.


Right, but the comment I was replying to was in response to this:

> 2 will tell you if they diverge, but you lose both if they do. 3 let's you retain 2 in operation if one does diverge.

If you care about resilience then you either need to settle for one and accept that you can't catch the class of errors that are persistent, or go with three if you actually need resilience to those failures as well. If you don't need that kind of resilience the way an aerospace application would, then you're probably better off catching this at a higher layer in the overall distributed-systems design. Rather than trying to make a resilient and perfectly accurate server, design your service to be resilient to hardware faults and stack checksums on checksums so you can catch errors (whether hardware or software) where some invariant is violated. Meta also has a paper on their "Tectonic" filesystem, where there's a checksum of every 4K chunk fragment, a checksum of the whole chunk, and a checksum of the erasure-encoded block constructed out of the chunks. Once you add yet another layer of replication above this, then even when some machine is computing corrupt checksums, or inconsistent checksums where both the checksum and the data are corrupt, you can still catch it and you have a separate copy to avoid data loss.
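A toy version of that checksum stacking (using CRC32 as a stand-in checksum; the 4K chunk size follows the comment, everything else here is made up for illustration):

```python
import zlib

CHUNK = 4096  # 4 KiB fragments, as in the comment

def seal(data: bytes) -> dict:
    """Record a checksum per chunk plus one over the whole block."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return {
        "chunk_crcs": [zlib.crc32(c) for c in chunks],
        "block_crc": zlib.crc32(data),
    }

def verify(data: bytes, seals: dict) -> bool:
    """Corruption of the data or of an individual checksum trips some layer."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return ([zlib.crc32(c) for c in chunks] == seals["chunk_crcs"]
            and zlib.crc32(data) == seals["block_crc"])

blob = bytes(range(256)) * 64          # 16 KiB of sample data
seals = seal(blob)
print(verify(blob, seals))             # True
corrupted = blob[:5000] + b"\x00" + blob[5001:]
print(verify(corrupted, seals))        # False: the bad chunk's CRC mismatches
```

The replication layer above this then supplies a clean copy once a layer flags the mismatch.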


Actually, it's not uncommon for ECC to be used within components as a method to guard against stuff like this. I don't think it's practical to ever get complete coverage without going full-blown dual/triple-redundant CPU, but components like SSD controllers have ECC coverage internally on the data path.
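As a toy illustration of the idea, here's a classic Hamming(7,4) code, the same principle as the ECC such components use internally (real controllers use much wider codes over the data path; this just shows how a single flipped bit gets located and corrected):

```python
def encode(nibble: int) -> int:
    """Pack 4 data bits plus 3 parity bits into a 7-bit codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]      # d1..d4
    p1 = d[0] ^ d[1] ^ d[3]                        # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                        # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                        # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]    # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def decode(code: int) -> int:
    """Recompute parity; a nonzero syndrome is the 1-indexed error position."""
    bits = [(code >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1                    # flip the bad bit back
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)

word = encode(0b1011)
assert decode(word ^ (1 << 3)) == 0b1011  # any single bit flip is corrected
```

Hardware does the same thing in combinational logic on every access, which is why the correction is transparent to the rest of the data path.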


Not this article, there's been plenty of reporting on this case as the seizure happened back in March 2021. He's not exaggerating, the FBI has "lost" the contents of some of the boxes when some owners fought back on the civil asset theft.

https://youtu.be/O436x2zbRwI?si=dB7GFWxpguEOObr8&t=235


It's reasonable for displaying with nothing more than the email on haveibeenpwned.com, but for everyone subscribed to notifications it would have been very helpful to include the source in the notification email; that would have avoided the biggest part of the privacy implications. Right now, for a lot of people, the latest breach notification email is unactionable because there's no way to figure out which account may have been breached. I personally received the notification, but when I checked the actual list directly, not only was it immediately clear that it wasn't an account I care about, it was also a password I've used but never with the account listed. Had the email from HIBP included just a tiny bit of additional information, I wouldn't have needed to waste my time on it, especially since this breach seems to contain some unknown amount of bogus data.


Backblaze B2 is their generic object storage platform, similar to S3. You pay for what you use, and it scales into petabytes. There are no minimums, so if you're only using small amounts, you get a small bill at the end of the month. It's not backup software, just the underlying cloud storage platform.

