For those of us stuck on AWS, it's sad not having BigQuery, but the thing that really gets me is not having Dataflow.
Most of the industry still seems unaware that no-knobs data query and pipeline systems even exist. If only I had a dollar for every time I saw a PR tweaking the memory settings of some Spark job or Hive query that stopped running as the input data grew...
I'd love to see more people write their workflows using the Apache Beam API so they'll have the option to switch to a no-knobs, scalable pipeline engine in the future even if they're not using one today.
Hypothetically it should be possible to create an entirely new RDS cluster as a replica running the new version and fail over to it, with an error rate similar to a normal replica failover.
Setting up the infrastructure to manually manage your own cluster failover would kinda go against the spirit of using RDS and letting AWS manage infrastructure for you, though.
Yes, it's an obvious gap in their offering. I assume they wrap pg_upgrade in their RDS upgrade process, but of course it is in-place and requires downtime. Their multi-AZ replication is a high-availability solution, but the primary and secondary must both run the same version of PostgreSQL, which makes it useless for major upgrades. For our 9.6 -> 13.4 upgrade we had the added complication that we use PostGIS, which put a kink in the upgrade path. A reliable, caveat-free, simple AWS-provided logical replication solution for zero-downtime cluster upgrades was sorely missed, and the downtime on the recommended path was painful.
A challenge for the crypto world is that the values of being decentralized and trustworthy seem a bit at odds with the value of rapidly evolving.
If you compare the Amazon or Google of 1999 vs today, they're practically different businesses. Most well-funded, centralized businesses have a hard time evolving at all, let alone to that degree.
Seems like the challenges facing a decentralized organization wanting to evolve would be even larger.
I tried working at a company with non-technical founders and leadership. I'd never do it again.
It was like a poison when upper management didn't know how to hire competent engineering leaders. It seeps down into middle management and into the code and system architecture.
As an optimistic new hire you hope that things can change and that you can be part of the change. Over time, though, it becomes increasingly hard to imagine a cultural shift strong and wide enough for the company to ever hold the values that make efficient work, and simple, scalable, reliable, adaptable software, possible.
This seems like it could be handled by hiring ONE engineering leader for the engineering org - which is one spoke of the multi-spoke wheel that runs a modern-day tech company.
The HN crowd is biased, and it seems to overinflate the importance of IC engineering contributions versus the contributions of the collective - eng and non-eng alike.
But that was my whole point: in practice, successful companies seem to be run by an engineer, not by some business type paired with a good CTO of some sort. I even forgot Mark Zuckerberg in my original post.
So it seems that at the top level, you need to understand the engineering part to be successful.
My guess is that the leader needs a vision not just for the product and the market, but also for the engineering. And if that last part is lacking, you cannot make up for it at the lower levels.
Or maybe it's because an engineer better understands what can and can't be built, and so has a more realistic vision for the product. A non-engineer always has to go back to an engineer to check how feasible something is.
My wish for configuration languages is that, as an industry, we continue to adopt scriptable build systems like Bazel that make it easy to transform human-written configuration into machine-readable configuration at build time.
Want comments in JSON? Spend 20 minutes refactoring a single BUILD rule, and now humans can write JSON5 that's transformed into JSON.
Want more flexibility? Have humans write CUE, Dhall, or Jsonnet.
Wish you also had a copy of a subset of the same config data in YAML? Easy
Want to write a compile-time check in a programming language of your choice that a certain setting is never missing? Easy
Want all of this to work with reproducible builds across a variety of computers and distributed build farms? Easy
We don't have to let legacy build systems limit our imagination.
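As a sketch of the "check that a setting is never missing" idea: the whole check can be a small script that a build rule (say, a Bazel py_binary or genrule - the wiring is up to you) runs over the generated config and that fails the build on error. The setting names here are hypothetical.

```python
import json

# Hypothetical required settings; in a real repo this set would live
# next to the schema or be generated from it.
REQUIRED_KEYS = {"timeout_ms", "max_retries"}

def check_config(text: str) -> dict:
    """Parse a JSON config and fail loudly if a required setting is missing.

    Raising SystemExit makes the wrapping build rule fail, so a bad
    config can never reach production.
    """
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise SystemExit(f"config is missing required settings: {sorted(missing)}")
    return config
```

Because the check is just a program, it is trivially reproducible and runs the same way on a laptop and on a distributed build farm.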
Agreed about having SRE fully integrated into feature teams in terms of day-to-day work, seating charts, team building events, etc.
From a management perspective I think it still makes sense for SRE to have a different reporting structure. Feature teams generally aren't rewarded for investing in long-term code and production health. If you report to a feature team manager there will often be downward pressure to focus less on production health and more on helping features ship quickly.
Giving SRE a degree of independent oversight serves as a system of checks and balances against those feature team pressures.
Horizontally scalable data storage is generally available these days (CockroachDB, TiDB, Vitess, etc.).
Rearchitecting from unscalable to scalable data storage is notoriously difficult and expensive. Even the most famously competent companies and teams have struggled with that transition and invested millions and millions of dollars in it. Building on unscalable data storage when scalable data storage is readily available feels like planning for failure and making a huge bet that your product or system will never have widespread use.
Scalable relational storage is relatively new to industry consciousness (outside of Google, where Spanner is used for nearly everything).
Spanner and Vitess existed at Google in the early 2010s. It was mid-to-late 2010s before CockroachDB, TiDB, and Vitess were available as open source.
Battle hardened database administrators who've been working with traditional unscalable databases for decades often aren't up-to-date with what's possible today unless they frequently talk shop with peers at other companies or go to conferences.
Personally I hope old databases like MySQL and PostgreSQL die out within the next decade. They served us well, but they're 25+ years old and the weight of legacy code plus the need for backwards compatibility is a huge drag on their ability to evolve.
I've experimented with migrating from PostgreSQL to CockroachDB and YugabyteDB. It makes sense that a NewSQL database is slower, but compared to a local PostgreSQL cluster it was unacceptably slow. You also lose features and become dependent on a heavily VC-funded startup that may not make it.
Performance reviews in corporate culture often have a "what have you done for me lately?" mindset.
If you're senior or staff and haven't launched anything exciting lately, middle management might become less interested in whether the service is running well and more interested in having "career" conversations about how your role description says you're supposed to be launching cross-functional projects more frequently.
Generally, user data deletion happens in multiple phases at large companies that care about both compliance and user experience.
For example, if you delete an email or document in a Google product, it moves to the Trash folder for 30 days.
When you manually empty the trash or the time window expires, the next step is most likely a soft deletion for a few days, where the data is still on the hard drives but hidden from the application. Soft deletion is mainly protection against coding errors: it is easy to undo if you've caused an incident, while hard deletion (removing the data from disk) is not.
Then, most likely, a garbage-collection process comes by a few days later and hard-deletes the data from disk, leaving it only on tape backups.
Finally, maybe a month or two later, it disappears from the tape backups as they are rotated or otherwise disposed of.
This addresses the needs of:
- Giving a good user experience (user "oops I made a mistake" undelete)
- Protecting against incidents due to coding errors (software engineer "oops I made a mistake" undelete)
- Making sure data disappears from both disk and backups within a certain time window, like maybe 30 or 60 days (comply with regulation and user expectations of data being cleared)
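The phases can be sketched as a tiny state machine. This is purely illustrative: the field names and windows are made up, and real systems track this in the storage layer rather than in application objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TRASH_WINDOW = timedelta(days=30)      # hypothetical: user-visible trash
SOFT_DELETE_WINDOW = timedelta(days=7)  # hypothetical: engineer-undoable window

@dataclass
class Record:
    data: str
    trashed_at: Optional[datetime] = None
    soft_deleted_at: Optional[datetime] = None

def advance(record: Record, now: datetime) -> Optional[Record]:
    """Move a record to its next deletion phase; return None once hard-deleted."""
    if record.soft_deleted_at is not None:
        if now - record.soft_deleted_at >= SOFT_DELETE_WINDOW:
            return None  # hard delete: GC removes the bytes from disk
    elif record.trashed_at is not None:
        if now - record.trashed_at >= TRASH_WINDOW:
            record.soft_deleted_at = now  # hide from the application, keep bytes
    return record
```

Undelete in the first phase is just clearing `trashed_at`; undelete in the second is clearing `soft_deleted_at`; after `advance` returns None, only the tape backups remain until they rotate out.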
One use case for password managers is account sharing among multiple people. Password managers make more money if they support that use case, and it provides legitimate value for companies and families.