Might be true, but I don't see any aspect of that which is relevant to this event:
* Data single obviously means losing a single drive will cause data loss, but no drive was actually lost, right?
* Metadata DUP (not sure if it's across 2 disks or all 3) should be robust, I'd expect?
* I certainly eye DM-SMR disks with suspicion in general, but it doesn't sound like they were responsible for the damage: "Both DUP copies of several metadata blocks were written with inconsistent parent and child generations."
> Metadata DUP (not sure if it's across 2 disks or all 3) should be robust, I'd expect?
No. DUP will happily put both copies on the same disk. You would need to use RAID1 (or RAID1c3 for a copy on all disks) if you wanted a guarantee of the metadata being on multiple disks.
The DUP profile is meant for use with a single disk. The RAID* profiles are meant for use with multiple disks. Both are necessary to cover the full gamut of BTRFS use cases, but it would probably be good if mkfs.btrfs spat out a big warning if you use DUP on a multi-disk filesystem, as this is /usually/ a mistake.
ZFS has similar configurations possible (e.g. copies).
You can end up in this state with btrfs if you start with a single device (defaults to data=single,metadata=dup), and then add additional devices without changing the data/metadata profiles. Or you can choose this config explicitly.
I really wish the btrfs-progs had a --this-config-is-bad-but-continue-anyway flag since there are so many bad configurations possible (raid5/raid6, raid0/single/dup). The rescue tools are also bad and are about as likely to make the problem worse as fix it.
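For reference, escaping the accidental single/DUP-on-multiple-disks state doesn't require recreating the filesystem; a balance with convert filters rewrites the chunks in place. A sketch (the mount point /mnt is illustrative; scrub and back up first):

```shell
# Inspect current profiles: look for "Data,single" and "Metadata,DUP"
btrfs filesystem usage /mnt

# Convert metadata and data chunks to RAID1 so each block has copies
# on two distinct devices. This rewrites every chunk and can take hours.
btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt
```

On a 3-device array, -mconvert=raid1c3 would put a metadata copy on every disk, as mentioned above.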
I doubt they explicitly said "I'll run without huge pages, which is an important AWS configuration". They probably just forgot a step. And "someone at Amazon" describes a lot of people; multiply your mental probability tables accordingly.
The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.
Surely they would be testing the configuration(s) that they use in production? They’re not running RDS without hugepages turned on, right?
> The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.
I'd guess they have dozens of people across, say, a Linux kernel team, a Graviton hardware integration team, an EC2 team, and an Amazon RDS for PostgreSQL team who might at one point or another run a benchmark like this. They probably coordinate to an extent, but not so much that only one person would ever run this test. So yes, it is duplicative. And they're likely intending to test the configurations they use in production, yes, but people just make mistakes.
True; to err is human. But it is weird that they didn’t just fire up a standard RDS instance of one or more sizes and test those. After all, it’s already automated; two clicks on the website get you a standard configuration, and a couple more get you a 96-core Graviton CPU. I just wonder how the mistake happened.
No… I’m assuming that they didn’t use the same automation that creates RDS clusters for actual customers. No doubt that automation configures the EC2 nodes sanely, with hugepages turned on. Leaving them turned off in this benchmark could have been accidental, but some accident of that kind was bound to happen as soon as the tests use any kind of setup that is different from what customers actually get.
You're again assuming that having huge pages turned on always brings a net benefit, which it doesn't. I have at least one example where it didn't bring any observable benefit, while at the same time it incurred extra code complexity and server administration overhead, and necessitated extra documentation.
It is a system-wide toggle in a sense that it requires you to first enable huge-pages, and then set them up, even if you just want to use explicit huge pages from within your code only (madvise, mmap). I wasn't talking about the THP.
When you deploy software all around the globe and not only on your servers that you fully control this becomes problematic. Even in the latter case it is frowned upon by admins/teams if you can't prove the benefit.
Yes, there are workloads where huge-pages do not bring any measurable benefit; I don't understand why that would be questionable. Even if they don't bring runtime performance down, which they could, the extra work and complexity they incur are in a sense suboptimal compared to the baseline of not using huge-pages.
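To illustrate the administration overhead being described: explicit huge pages are a system-wide resource that has to be provisioned before any process can map one, even via madvise/mmap. A sketch with hypothetical sizing:

```shell
# Reserve a pool of 2 MiB huge pages (the value is a page count, not bytes).
# 51200 pages = 100 GiB -- a hypothetical figure for a large buffer cache.
sysctl -w vm.nr_hugepages=51200

# Verify the pool; allocation can silently fall short if memory is fragmented.
grep -i hugepages /proc/meminfo

# Only now can applications obtain them, e.g. PostgreSQL with
# huge_pages = on (or try) in postgresql.conf.
```

None of this survives a reboot unless it is also put in sysctl.conf or kernel boot parameters, which is exactly the kind of per-host setup that is hard to justify when the benefit isn't measurable.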
> While using huge pages whenever possible is the right solution and this should be enough for PostgreSQL, perhaps there are applications that cannot use huge pages and which are affected by the regression.
It will be more interesting to talk about those applications if and when they are found. And I wouldn't assume the solutions are limited to reverting this change, starting to use the new spinlock time-slice extension mechanism, and enabling huge pages.
It sounds like using 4K pages with 100G of buffer cache was just the thing that made this spinlock's critical section become longer than PostgreSQL's developers had seen before. So when trying to apply the solution to some hypothetical other software that is suddenly benchmarking poorly, I'd generalize from "enable huge pages" to "look for other differences between your benchmark configuration and what the software's authors tested on".
By "those applications" I'm talking about other applications affected by this regression. There are several apps in addition to Redis that recommend limiting the transparent huge page configuration. (Some of them recommend using explicit huge pages instead.) But it's quite possible none of them are affected by this regression, as it may be particular to apps using spinlocks. (Certainly the new rseq API mentioned in the thread is targeted at spinlock users.) It seems equally possible to me that some spinlock-using app has a regression irrespective of huge pages.
Or semi-fiction? The author is actually blind and tagged it nonfiction, but I suspect some embellishment.
> but in reality, Karen is likely just as annoyed by this as the author.
When I'm frustrated talking with an agent of a big organization, I try to remember they probably didn't set the policy. But I also expect them to express some empathy for how I'm negatively affected by that policy. The author/protagonist, accurately or not, felt the opposite from "Karen from compliance". In their shoes, I wouldn't feel much empathy for Karen in return.
> The spam should go to the person in charge
I also expect the agent to have a closer relationship with "the person in charge" than I do (none whatsoever). If I mention the policy is absurd, they could at least make some effort to pass that along to their manager.
Also, sending the information to the agent is necessary compliance, even if the volume is malicious.
> not the person who is forced to deal with this every day
Maybe feeling a bit of the pain themselves might make them more likely to speak up. If this becomes a miserable job that no one will stay in, that might provoke a change.
> Maybe feeling a bit of the pain themselves might make them more likely to speak up. If this becomes a miserable job that no one will stay in, that might provoke a change.
Unfortunately, it might also just cause anyone who wants to do good to leave, leaving people who just need a job and don't care about doing good.
> Unfortunately, it might also just cause anyone who wants to do good to leave, leaving people who just need a job and don't care about doing good.
I don't think the author would have acted this way toward someone who said "sorry, I know it's a burden, I know it's stressful to be at risk of losing these benefits, and I've told that to everyone I can repeatedly." So how much danger is there really that the inconvenience of reloading the fax machine is pushing out someone who is trying to do good?
(For the sake of argument, I'm going with all the details of the story, including that this caused Karen any distress at all. I think it's more likely a real office like this has a setup for which getting a 500-page fax is no big deal at all. And if it really is a DoS on their processing, the consequence I'd be more worried about is causing acceptance to slow down enough that other disability claims are not processed before their deadline.)
> I don't think the author would have acted this way toward someone who said "sorry, I know it's a burden, I know it's stressful to be at risk of losing these benefits, and I've told that to everyone I can repeatedly." So how much danger is there really that the inconvenience of reloading the fax machine is pushing out someone who is trying to do good?
It's not just the faxing that causes people to act the way Karen (supposedly) acted - it's the anger and maliciousness being directed at them by numerous people, all day, every day, even when they do try to be sympathetic to the fact that the system fucks everyone. But there's only so much empathy one can muster.
(Not to mention the various other factors that push good people out of government, such as working for decades to make the systems better only for them to get worse.)
To be clear, I agree with you to an extent; if instead of being malicious and directing anger at the people doing their best to help, people like the author more calmly expressed their frustration with the system, maybe they can bring it up with their superiors, as you said.
All of it's a mess, and not a single facet of this issue is without blame - not the recipients, not the bureaucrats, not the politicians, and certainly not the voters.
I hear you...to an extent. I just got off the phone with Comcast Business Class, asking for a refund after I had 26 hours of downtime in the past week. Not a company with a great reputation for customer service, and the agent I spoke with was probably not exactly earning a six-figure salary. He was empathetic. The outcome was unsatisfactory [1], but he was polite, he said he understood how important availability is to my business, he put me on hold for a while, said he tried for more with his manager, and I believed him. That's all it takes, not a master class in empathizing with your bitter enemy and de-escalating conflict. I'm mad at Comcast, but I'm not mad at him.
[1] A discount that was less than the delta between consumer-class and business-class prices, when the latter doesn't seem to actually be providing better availability lately.
Yes, some people thrive on talking to a lot of people. For everyone else, it can be exhausting. It's hard to navigate social differences talking to 15+ strangers every hour for 8 hours. Each person has a unique expectation about how to relate to them. It's hard knowing, for example, who wants to be interrupted and who doesn't [0]. Some people talk in vagaries with lots of exposition, making it hard to understand what they want, but feel they have communicated clearly, so they get upset at being asked questions. I could go on and on about this. The end result is an absolutely JUICED frontal lobe, though. "Why don't you find another job" is a common question put to these people, and I don't think people with a juiced frontal lobe have the capability to reason their way into getting their resume together and applying to new jobs. To remember that comment would be to remember, 25 calls ago, that someone told you to find a new job.
> He was empathetic.
I don't understand what this means when people say it. Empathetic means having empathy for someone, which means imagining being in their situation and feeling the feelings associated with it. That takes a long time for me, like a few minutes, uninterrupted, at least. So either I would have to lie and say "wow, that must be so frustrating", which is not empathy, that's just saying words that sound like empathy. And that brings me to the next thing I don't understand... either that person was also lying, or somehow people have the ability to just contemporaneously download the feelings of other people, feel them, but also not act like they're feeling them (because how are you supposed to feel frustrated without being frustrated?) so as not to make the customer upset.
Customers hate to hear (in a sort of "stop being upset, that's annoying" way) sadness or anxiety or the braced statements of a person (often perceived as rude) who is used to having to repeat, for the 50th time, something people don't want to hear. I do have the empathy to recognize this when a customer service agent does it and cut them some slack, because they probably had to spend all their empathy on someone else.
Then I read about things like surface acting vs deep acting and see that the surface acting part is bad for your emotional health but that deep acting takes a lot of extra energy [1]!
Finally I ask the question of am I evolved to even be able to socially interact with 120 strangers in a given day?
"that's all it takes" might be underselling the dynamic here.
>Yes, some people thrive on talking to a lot of people. For everyone else, it can be exhausting. It's hard to navigate social differences talking to 15+ strangers every hour for 8 hours a day.
Okay. It's a job. I know choices are slim, but "it's hard for my mental state" has never been a satisfactory excuse to further displease customers.
>So either I would have to lie and say "wow, that must be so frustrating", which is not empathy,
Sometimes a little white lie is easier than a cold hard truth. Just ask any salesman.
>And that brings me to the next thing I don't understand... either that person was also lying, or somehow people have the ability to just contemporaneously download the feelings of other people, feel them, but also not act like they're feeling them
Given the author is blind, I imagine he's better than average at reading the tone of voice. He could have interpreted it wrong, but I'm sure this dismissive tone isn't new to him.
>Finally I ask the question of am I evolved to even be able to socially interact with 120 strangers in a given day?
Probably not. But I'm not sure what you want me to say. I don't want to be the same as Karen and say "suck it up, it's a job." But this is such a common feeling in modern society. If we aren't going to collectively rise against it, we're bearing the flood alone.
Given how we're still actively drowning people, I don't see us coming together soon.
This is missing the forest for the trees. You are ignoring the wider corpus of the individual's experiences in favor of a single negative interaction, and then using that single interaction, isolated from all their other experiences, to judge the entirety of their character.
> Okay. It's a job. I know choices are slim, but "it's hard for my mental state" has never been a satisfactory excuse to further displease customers.
The chemical reality of the frontal lobe getting exhausted is not an "excuse". It still misses the forest for the trees: if your frontal lobe (the part of the brain responsible for social understanding, reasoning, executive function, and information recall [0]) is taxed, you are way less likely to even understand that you're displeasing the customer! The ultimate irony here is that the tool needed to understand how to stop doing that thing is also the frontal lobe.
> Sometimes a little white lie is easier than a cold hard truth. Just ask any salesman.
That's a nice way to soften it, but pretending to empathize with someone who you're not actually empathizing with sounds psychopathic. I don't want to model my behavior nor do I want anyone else to model their behavior after an industry that is known for dark triad personalities [1]. A lie is still a lie and lying about something so intimate as feeling their experiences doesn't sit right with me at all. You should read the link I posted in my earlier comment which discusses surface acting and how it is very taxing on the individual.
> Given the author is blind, I imagine he's better than average at reading the tone of voice. He could have interpreted it wrong, but I'm sure this dismissive tone isn't new to him.
Reading a stranger's tone is a guess and negativity bias affects our perception of a stranger's intent [2]. The sum of their total negative experiences absolutely can make them interpret someone else's tone as having "dismissive" intent even though it's just as likely to be what I already described: braced speech in anticipation for a person responding to something they don't want to hear.
And there you can see negativity bias on both sides! The difference is that the representative gets no post-call time to consider what happened before they have to take the next call and they have the issue of not really having the foresight to actively introspect and keep a strong sense of understanding the situation the customer is going through. (As a reminder, both foresight and introspection require some level of functioning frontal lobe, which is already getting juiced for the next social interaction that's about to happen).
> Probably not. But I'm not sure what you want me to say. I don't want to be the same as Karen and say "suck it up, it's a job." But this is such a common feeling in modern society. If we aren't going to collectively rise against it, we're bearing the flood alone.
I'm not sure what you mean; you effectively said "suck it up, it's a job" at the beginning of your comment when you said "Okay. It's a job". Of course no one wants to be the same as Karen. Karen doesn't want to be the same as Karen but, as I've already explained, she is incapable of extricating herself from the dysfunction! Her frontal lobe is shot!
But the author? He does have that capability after the interaction. He is an author, with time to introspect. He chose to be an asshole instead. Of course, his growth over the years has been stunted by the way he has been treated. I am not in the business of dredging up someone's life experiences and putting them on display, but he has painful experiences beyond being blind in a society not built for blind people.
But I have the privilege of being able to see all that and take it into consideration. Karen does not. She doesn't have the hint about his upbringing that I do. She probably doesn't have the time or mental capacity to introspect, and consider, if what she's doing makes people feel bad.
I can fault neither of these people for being assholes, because that would amount to faulting them for their upbringing, faulting them for the situation they're in.
> But I'm not sure what you want me to say.
I don't want you to say anything, I want you to think about what empathy really means beyond the surface level. This isn't a situation where anyone should be trying to decide "who has experienced the most hardship" so we can pick who wins empathy and who gets labelled an asshole in perpetuity.
I want people to stop doing the thing where they only empathize with the person most like them and instead try to feel what it's like to be like the person who is least like them. Sometimes that's not intuitive. Just because the dude is blind doesn't mean he isn't more like you than the person who isn't.
>I want you to think about what empathy really means beyond the surface level.
Empathy is caring for your fellow person and internalizing that to advance causes that benefit us all.
But empathy can fall into the paradox of tolerance as well. You can't empathize with the orange man who wants to see you out of the country and your kid on an island. Anyone who is a drag on the cause can't be carried, because their mindset is to drag you backwards, if not outright eliminate you.
Those are the two aspects I balance in my mind. I try to give basic respect to anyone I meet and run into, but there are some individuals you need to cut out if you want to achieve your goals.
>I want people to stop doing the thing where they only empathize with the person most like them and instead try to feel what it's like to be like the person who is least like them.
Sure. Already doing it. If anything, I probably relate a lot more to Karen: a miserable office job I hate, not making enough money, and stuck in a horrible system with little advancement and increasingly little control over my life. Outside of "we like tech" and "we've met annoying people", I probably don't relate much to the author.
The only difference between me and Karen is that I've learned to hold my tongue and not take my frustrations out on others. It kind of helps when the clients are children; there's no optics win in yelling at a kid. The kids will simply double down or break down, my boss will reprimand me, the parents will reprimand me. If those are the consequences for such actions, what is there to empathize with in Karen's case?
I'm not mad "at" Karen. I'm mad at the wider system that creates Karens as they are put in, chewed, and spit out. I do want better for all of us, but I also don't have the professional capacity to help people like Karen along the way. Odds are they will also actively drag the cause for all of us down. I can't save everyone.
>I don't think the author would have acted this way toward someone who said "sorry, I know it's a burden, I know it's stressful to be at risk of losing these benefits, and I've told that to everyone I can repeatedly."
Have you seen how much public sector employees taking calls get paid to be abused all day?
If you want people with limitless wells of compassion, pay better. Public sector jobs generally get to scrape the bottom of the barrel and compete with the local grocery store.
> Google had their own versions of things. IIRC bugs had both a priority and severity for some reason (they were the same 99% of the time) between 0 and 4. So a standard bug was p2/s2. p0/s0 was the most severe and meant a serious user-facing outage. People would often change a p2/s2 to p3/s3, which basically meant "I'm never going to do this and I will never look at it again".
Yeah, I've done that. I find it much more honest than automatically closing it as stale or asking the reporter to repeatedly verify it even if I'm not going to work on it. The record still exists that the bug is there. Maybe some day the world will change and I'll have time to work on it.
I'm sure the leadership who set SLAs on medium-priority bugs anticipated a lot of bugs would become low-priority. They forced triage; that's the point.
> People even wrote automated rules to see if their bugs filed got downgraded to alert them.
This part though is a sign people are using the "don't notify" box inappropriately, denying reporters/watchers the opportunity to speak up if they disagree about the downgrade.
> Many consumer SSDs, especially DRAMless ones (e.g., Apacer AS350 1TB, but also seen on Crucial SSDs), under synchronous writes, will regularly produce latency spikes of 10 seconds or more, due to the way they need to manage their cells.
Is there an experiment you'd recommend to reliably show this behavior on such an SSD (or ideally to become confident a given SSD is unaffected)? Is it as simple as writing flat-out for, say, 10 minutes, with O_DIRECT so you can easily measure the latency of individual writes? Do you need a certain level of concurrency, or a mixed read/write load? Does it matter whether you repeatedly write a small region vs. a large one (or, given remapping, maybe that doesn't matter)? Is this like a one-liner with `fio`? Does it depend on longer-term state, such as how much of the SSD's capacity has been written and not TRIMed?
Also, what could one do in advance to know if they're about to purchase such an SSD? You mentioned one affected model. You mentioned DRAMless too, but do consumer SSD spec sheets generally say how much DRAM (if any) the devices have? Are there known unaffected consumer models? It'd be a shame to jump to enterprise prices to avoid this if that's not necessary.
I have a few consumer SSDs around that I've never really pushed; it'd be interesting to see if they have this behavior.
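A fio sketch along the lines of the question above (not an authoritative methodology; the path, size, and duration are illustrative, and it should point at a scratch file, not a disk you care about):

```shell
# Sustained queue-depth-1 synchronous 4K writes for 10 minutes,
# logging average IOPS once per second. Multi-second stalls show up
# as gaps or zeros in the per-second log.
fio --name=syncwrite --filename=/mnt/scratch/fio.test --size=32G \
    --rw=write --bs=4k --iodepth=1 --direct=1 --fsync=1 \
    --time_based --runtime=600 \
    --log_avg_msec=1000 --write_iops_log=syncwrite
```

Whether pre-filling the drive or a mixed read/write phase matters presumably depends on the controller, so repeating the run against a mostly-full, un-TRIMed device would be a reasonable follow-up.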
> Also, what could one do in advance to know if they're about to purchase such an SSD? You mentioned one affected model.
Typically QLC is significantly worse at this than TLC, since the "real" write speed is very low. In my experience any QLC is very susceptible to long pauses in write heavy scenarios.
It does depend on the controller, though. As an example, check out the sustained write benchmark graph here[1]; you can see that a number of models start this oscillating pattern after exhausting the pseudo-SLC buffer, indicating the controller is taking a time-out to rearrange things in the background. Others do it too, but more irregularly.
> You mentioned DRAMless too, but do consumer SSD spec sheets generally say how much DRAM (if any) the devices have?
I rely on TechPowerUp, as an example compare the Samsung 970 Evo[2] to 990 Evo[3] under DRAM cache section.
Results from Apacer AS350 1TB: https://pastebin.com/F6pr5g29 - the first field is the timestamp in milliseconds, the second one is the write IOs completed since the previous line.
EDIT: I was told that the test above is invalid and that I should add --direct=1. OK, here is the new log, showing the same: https://pastebin.com/Wyw6r9TC - note that some timestamps are completely missing, indicating that the SSD performed zero IOs in that second.
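If the log has the two whitespace-separated fields described above (millisecond timestamp, IOs completed since the previous line), an awk one-liner can flag the silent seconds; a sketch, with the filename and the 1.5 s threshold chosen arbitrarily:

```shell
# Print a line for every gap of more than 1.5s between consecutive
# timestamps, i.e. intervals in which the SSD completed zero IOs.
awk 'NR > 1 && $1 - prev > 1500 {
       printf "stall: %.1fs before t=%d\n", ($1 - prev) / 1000, $1
     }
     { prev = $1 }' iops.log
```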
You may want to repeat the experiment a few times.
> This was an oversimplification bordering on being misleading. It’s a lighter JS runtime that’s calling native code for rendering controls. The argument still has merit. Just because something in JS doesn’t make it slow or bloated. Interpreted languages will almost always be slower than their native compiled counterparts, but it’s negligble [sic] for these purposes.
Isn't it a full JS runtime? I think by "a lighter JS runtime that's calling native code" you mean it doesn't deal with HTML/CSS rendering, but that's not what JS runtime means. These are separate parts of the browser architecture.
I don't agree it's negligible for this purpose. Core OS functionality should run well on old/cheap machines, and throwing in unnecessary interpreters/JITs for trivial stuff is inconsistent with their recently announced commitment to "faster and more responsive Windows experiences" and "improved memory efficiency".
Here's one: Microsoft management heavily incentivizes their developers to use LLMs for virtually everything (to the "do it or you're fired" level) and the LLM (due to its training data or whatever) is far more able to pump out code with React Native than their own frameworks. This makes it the right choice for them. Not for the user, but you can't have everything.
I don't have any inside information; I'm running with the hypothetical.
> Shouldn't devs be allowed to select what they feel is the "best" choice for a given component?
To some extent, yes. But if they choose React Native, something's probably wrong, because (despite what the article says) that requires throwing in a Javascript engine, significantly bloating a core Windows component. If they only use it for a small section ("that can be disabled", or in other words is on by default), it seems like an even poorer trade-off, as most users suffer the pain but the devs are making minimal advantage of whatever benefits it provides.
If the developers are correct that this is the best choice, that reflects poorly on the quality of Microsoft's core native development platforms, as madeofpalk said.
If the developers of a core Windows component are incorrect about the best choice, that reflects poorly on this team, and I might be inclined to say no, someone more senior should be making the choice.
Probably never had to work with (live) video at all? I think using moq is the dream for anyone who does. The alternatives—DASH, HLS, MSE, WebRTC, SRT, etc.— are all ridiculously fussy and limiting in one way or another, where QUIC/WebTransport and WebCodecs just give you the primitives you want to use as you choose, and moq appears focused on using them in a reasonable, CDN-friendly way.