Was looking around GitHub for a nice wiki example and decided to check out projects from folk like The Guardian / New York Times / BBC, then stumbled across this page; I never appreciated that Android Browser had so little market share, given the number of Android OS devices out there. Other interesting stats in there too.
I met somebody the other day who works for a tiny start-up and said their MD frequently hires a couple of extra actors to make the office look busier when they have client meetings.
One of the unions has been quick to attribute the issue to the outsourcing of some of the IT responsibility to India, which the right-wing press here has been all too eager to publish. BA have rebuffed this and said that at this stage they believe the root cause was a power supply issue. Sure, that could be attributed somewhere along the line to an 'alpha male business asshole' (as I read in one of the comments here), but it's probably best to wait and see what the post-mortem actually says rather than seek to blame someone, somewhere, be it a businessman, an Indian dev team or anyone else.
I am reminded of a post a while back regarding AWS' issues affecting multiple data centres (I forget the specifics), and how their post mortem didn't apportion blame to anyone (which it really easily could have), but rather to their own checks and balances, which allowed the issue to arise in the first place. I do hope that when the dust settles we see a measured response rather than a witch hunt.
I've found that not only is it good to not assign blame in postmortems, but it's also accurate: The culprit usually is the checks and balances, as mistakes will happen, and the goal should be to have failsafes and detection.
I'm reminded of airplane accidents: Whenever you hear of an airplane accident, it's always some amazingly crazy series of things going exactly wrong to get the plane to crash. We have a tendency to think "wow, what bad luck", but a better way to think about it is that airplanes are so safe that an accident can't occur unless a whole series of things go very specifically wrong.
A company's goal should be to increase the number of necessary things that need to all go wrong before there is downtime.
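To put rough numbers on that, here is a minimal sketch of how stacking safeguards changes the odds. It assumes the safeguards fail independently, which is a strong assumption (real failures are often correlated), and the 1% failure rates are made up purely for illustration.

```python
def outage_probability(failure_probs):
    """Chance that every safeguard fails at once.

    Assumes the safeguards fail independently of each other, which is
    optimistic: a shared power feed or shared config breaks that assumption.
    """
    result = 1.0
    for p in failure_probs:
        result *= p
    return result


# Illustrative numbers only: each layer fails 1% of the time.
one_layer = outage_probability([0.01])
three_layers = outage_probability([0.01, 0.01, 0.01])

print(f"one safeguard:    {one_layer:.6f}")
print(f"three safeguards: {three_layers:.6f}")
```

Each extra independent layer multiplies the chance of downtime by that layer's failure rate, which is why adding "things that must all go wrong" pays off so quickly, and also why a single shared dependency that takes out several layers at once is so dangerous.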
One other important point is that the very term "root cause" is extremely harmful in that it presumes a primary failure and already seeds the idea of one bad actor and, by proxy, blame. Systems today are too complex to blame upon one or two things - we operate in a very complicated, "complected" world both in our software and in many organizations.
While there are always technical causes for larger technical failures, I've far too often seen RCA post-mortems turn into witch hunts instead of a solemn contemplation of how everyone could do things better. Such an RCA may ignore that a normally careful engineer was overworked by managers; a lack of relevant monitoring and testing due to budget cuts is never cited; and you'll certainly never see "teams X and Y collaborated too much" as a reason for failure in these places. Because in a typical workplace, the company's values and culture are never related to a failure. You can't objectively measure how bad or how good a culture is either. Why make it part of post mortems when you don't think it's a failure?
I don't recall ever having a manager use the term root cause analysis in the way you are implying.
Usually we are looking for the cheapest or most effective process change that will prevent that class of problem happening again.
> but a better way to think about it is that airplanes are so safe that an accident can't occur unless a whole series of things go very specifically wrong.
As an aside, I met someone who was working on a graph theory problem as their research project, and the application was that you could model the entire process of aircraft control through a state machine using that graph. Effectively they are working on making it mathematically impossible for a crash to occur assuming that a certain process is followed (with safety measures ofc).
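I don't know the details of that research, so this is only a toy sketch of the general idea and not their method: model the procedure as a directed graph of states, then check that no unsafe state is reachable from the start. The state names and transitions below are entirely hypothetical.

```python
from collections import deque


def reachable_states(transitions, start):
    """Breadth-first search over the state graph from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def procedure_is_safe(transitions, start, unsafe):
    """True if following the procedure can never reach an unsafe state."""
    return reachable_states(transitions, start).isdisjoint(unsafe)


# Hypothetical toy procedure; real control procedures are vastly larger.
TRANSITIONS = {
    "parked": ["taxiing"],
    "taxiing": ["holding", "parked"],
    "holding": ["cleared_for_takeoff"],
    "cleared_for_takeoff": ["airborne"],
    "airborne": ["landing"],
    "landing": ["taxiing"],
}
UNSAFE = {"runway_incursion"}

print(procedure_is_safe(TRANSITIONS, "parked", UNSAFE))
```

The guarantee only covers transitions that are in the model: if someone steps outside the process, the graph says nothing about what happens next.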
The challenge is to avoid pushing all the risk into that assumption. It's easy enough to build a system that never breaks if you're willing to assume perfect behaviour on the part of its dependencies, environment, users and operators.
If you've seen how aircraft controllers and pilots work, I think that "following the rules" is a very fair assumption to make. But ignoring that, obviously if implemented there would be fail-safes.
> It's easy enough to build a system that never breaks if you're willing to assume perfect behaviour
It isn't though. Seriously, think about how you could safely route several thousand flying hunks of metal (which all have inertia) through fairly small air corridors while maintaining strict flight schedules. Then think about how you need to factor in all of the edge cases caused by emergencies on planes (these are all included in the process for flight controllers). Then think how you could mathematically prove that safety.
Yes, it's easier if you assume that people will follow a certain process (and actual flight systems have so many layers of fail-safes that it's ridiculous) but it's definitely not "easy enough".
If an error makes it into production it is always process, never an individual, even if the individual involved was malicious. The only thing you can do with errors is fix them, learn from them, and then fix the process too.
Yeah. I'm sure the "power supply issue" is an overly simplistic explanation -- it's highly unlikely a single computer PS could cause such major issues in such a huge organization.
That said, having almost entirely dodged any outsourcing-related issues in the 90s, and worked with generally great offshore teams, seeing my current role impacted by an utterly shortsighted and ignorant attempt to offshore critical operations tasks is quite disheartening. It's rarely the fault of the teams, it's the fault of higher-ups who completely fail to grasp the complexity and consequences of the tasks they're offloading. Everything looks great for a few weeks or months or years until one of the dozens of things that have gone neglected rears its ugly head. If they're lucky, they take the money and run before the crash happens and escape most blame.
Hey there! "alpha male business asshole" theorist checking back in on this. While not a slam-dunk yet, it's looking like the theory of alpha-male business asshole looking to advance career at expense of company and deflection of blame onto operators of business is gaining some credence!
We dropped npm in favour of yarn a while back, but I imagine we'll switch back once the stable release is out. Our needs are perhaps simpler than most, but the key issues we were having with npm at the time were directly solved by yarn:
1. Dependency install was taking too long.
2. Inconsistent builds between devs because of no lock file.
3. Could not search the registry from the command line in a timely manner.
Having just done a quick test on a random project, I'm pleased to see that all of those concerns are now taken care of, plus the install (for this project at least) is ~30% faster than it is with yarn.
Interesting comment regarding Transit - as a Londoner who went to NYC for the first time last year I didn't even contemplate looking up another app; I just opened up Citymapper and it said "It looks like you're in New York. Switch City?", so it's pretty seamless.
A few years ago I'd definitely have trawled the App Store and I'm sure I'd have found Transit, but it just goes to show the power of a good, reliable interface and considered rollout to ensure you're always the go-to product if someone is aware of you.
This is my favourite thing about Citymapper. I use it every day (multiple times) in London. Went to Milan for the weekend and was able to continue using the same app. It made figuring out another transit system incredibly simple. It's becoming like Uber for me, in the sense that when I go somewhere Uber doesn't support, getting around becomes something that takes quite a lot of figuring out.
> If someone one hundred years ago would have said that we would be transporting as many passengers in aircraft as we would in trains, people may have thought, "a steam engine would never fit in an aircraft made of wood and ropes".
That's such an eloquent way of highlighting the need for conceptualisation in engineering but also the struggle to convince others (investors perhaps) that you're not a complete nutcase when you think outside the box.
If you're struggling to convince others it's best to leave the Galileo fallacy out of your arguments.
Just because something was thought to be ridiculous in the past and now we know better, that doesn't mean that your seemingly ridiculous idea is just as misunderstood and underestimated as that other idea was at the time.
Engineers don't necessarily think in terms of today's equipment when writing out models that describe the problem. They think in terms of abstract concepts, like "power source" and "lift generator".
We're already trying to build a space elevator when we don't even know what material can possibly withstand those forces. It's something over the horizon, but we're hopeful we'll find it. Likewise, powered flight was possible, the only problem was finding lighter, stronger materials for the airframe and a lighter, more powerful engine to fly it.
The thing with thinking outside the box is that you never really know if you're doing it right unless you hit the "jackpot". I.e. an idea seems obvious to you, but not to your co-workers/friends/bosses/whatever and is usually met with serious resistance from them. Especially in a work setting where you have to convince "non-technical" people. Sometimes it takes more than nice words to get your message across.
But that's all with the benefit of hindsight and survivorship. Most outside the box ideas are nutcase ideas that fly in the face of basic physics and logic.
I was so fascinated with the survey results being statistically significant (and therefore thinking the more respondents the better) that I failed to realise that 5 people willing to take the time to respond, and 4 willing to share their emails, are such a good starting point to get the ball rolling.
I like the idea; more and more I'm finding myself hosting static sites (maybe with a CMS back-end hosted elsewhere that generates and publishes to S3) and this is a good solution.
However... the website simply doesn't work. I get the feeling from it I should be able to simply enter my email address and copy the generated code, yet when I do so nothing happens, and unless my access token really is `abc123`, something's not right here!
Did you follow the link sent to your email address? That may be the missing piece. The front page has an example (maybe it shouldn't be actual text, it makes it seem like you can copy it).
Once you use the link that's sent to your email address, you should be able to copy and paste the examples from your personal log-in page.
EDIT: I see why this is confusing, since the textbox doesn't prompt you to submit. I'm adding submit buttons, they'll be up in a few minutes.