If management does not pay close attention, doing sloppy work with many small bugs is the key to very high (measured) productivity:
- You close a feature ticket by cutting all corners possible
- Every corner cut later becomes a bug ticket, which you can close quickly since you are already familiar with the feature
Total: you have closed more tickets than a dev who did the feature right the first time... sad
Regarding a dedicated person to handle the CI, here is my experience.
In the small company where I work now, I had set up an ad-hoc deployment script that worked fine and took less than 5 minutes (with no user interaction) on my dev PC.
Since it is not a SaaS, our release cycle was a bit slower than wanted, 1 or 2 times per month, depending on circumstances.
A guy was hired and he wanted to speed this up. I explained clearly that the build/deploy script was not the culprit behind the "slow" release cycle. It takes time to decide what is ready for production, write a nice changelog for users (not just collecting git messages), test on the custom hardware...
That is why it took me anywhere from an hour to half a day when the boss said: we need a release today.
The above guy justified his work with: "After I am done, you will just need to push a tag and the rest will be automatic". I could not convince my boss this was fantasy land.
The result, 3 years later: we have a "nice" CI which rebuilds the world several times per day. But we are doing at most 2 official releases per year, with much more stress. We had a few "releases" which needed very urgent hot-fixes because of last-minute changes and not enough testing on hardware.
And the CI person is constantly tweaking bits of the CI (it has become part of his job), breaking things here and there.
> We had a few "releases" which needed very urgent hot-fixes because of last-minute changes and not enough testing on hardware.
That sounds like the problem is that the CI is not testing on hardware, or that you're running ad-hoc manual tests on hardware that could be automated or at least formalized.
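As a minimal sketch of what "formalized" could look like (the `HW_RIG` variable and test name are assumptions, not the setup discussed above): the ad-hoc manual check becomes a test that CI runs whenever a runner with a test rig attached picks up the job, and skips cleanly everywhere else.

```python
import os

def hardware_available() -> bool:
    # A real harness would probe the device (serial port, USB id, ...);
    # here we simply gate on a flag set by the CI runner wired to the rig.
    return os.environ.get("HW_RIG") == "1"

def test_device_responds():
    if not hardware_available():
        print("SKIP: no hardware rig attached, test not run")
        return
    # Open the device, send a ping, and assert it answers in time, e.g.:
    # response = device.ping(timeout_s=2)
    # assert response == "pong"
```

Even if the actual device interaction stays manual at first, having the test exist (and visibly skip) makes the gap in coverage show up in every CI run instead of being rediscovered at release time.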
I could see this being useful for emails that send out coupon codes. If a coupon code is updated for some reason, that update can be pushed to the user without having to send them another email.
The problem is that cases where you would want to update an email after it is sent are few and far between.
I somewhat agree, but not every feature is equal. And you usually start with the easy ones... So even without any technical debt, it will probably go up, just less quickly.
That's true. So I suppose I would amend it to say "if the line is exponential". At the company I currently work for, we try to adopt "painful now, pleasurable later".
In a previous job (Belgium), I had a non-compete clause, correctly written except that the "compensation" part was missing. To my understanding, when leaving, I could either:
- claim the clause null and work wherever I wanted
- follow the clause's obligations and ask for half of my salary for the 3 years of non-compete
Moreover, when quitting, if they did not explicitly waive the clause, I could also have asked for those 3 years of compensation... but they did not forget to waive it when I left ;(
If there are 3 competitors in your market and you might lose x orders while you are down, but gain 2*x orders while the other 2 are down, you come out ahead (assuming the same probability of going down as the others).
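A back-of-the-envelope sketch of that argument, with assumed numbers: each of the three competitors is independently down with probability p per period, your downtime costs you x orders, and each competitor outage sends you x of theirs.

```python
p = 0.01   # outage probability per period (assumption)
x = 100    # orders at stake per outage (assumption)

expected_loss = p * x        # your own downtime
expected_gain = 2 * p * x    # the other two competitors' downtime
net = expected_gain - expected_loss
print(net)  # positive: same reliability as the others, yet ahead on average
```

The net expectation is p*x > 0 for any p > 0, so the conclusion does not depend on the particular numbers chosen, only on the symmetry assumption.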
While it is probably true most of the time, like many things in life, "it depends".
When you are developing some not-so-well-defined functions and everything is on quicksand, tests may just slow your progress without really providing useful feedback (i.e. when you always modify a function and its test in tandem).
Ideally you should clarify your requirements first, but sometimes the process of writing the code is the refining step.
In your post:
"There's no more time-honored way to get things working again, from toasters to global-scale distributed systems, than turning them off and on again"
This is generally true, but as with all rules there are exceptions, and I encountered one a few months ago:
In short: the system (an embedded soft real-time control) ran fine for a long time, and the user added more and more processes. After some glitch, and "to be sure to restart fresh", he initiated a reboot... And then nothing worked anymore!
The problem: each process consumed a lot of RAM for a short period at startup. When the user added processes manually, everything ran smoothly. But as soon as a few processes needed to start roughly in sync, it took too much RAM, the OOM killer killed the entire app, and it was back to square one.
In a way, this is also an example of metastability: the application is restarting in a loop and cannot exit that loop on its own.