I posted a comment on how 'sort | uniq -c | sort -n' is an interesting and very capable pipeline, but often misused and slower than other alternatives.
> you are comparing
Yes, I am comparing two methods of accomplishing the same thing. That is how comparing things works.
> Please, "huge waste"? How do you sort something that does not fit in memory?
Note how the full sentence included "if you give it 100GB of 5 different strings". If your input is 100GB of 5 different strings, then the hash table will easily fit in memory, and sorting the entire data set only to pass it to 'uniq -c' is indeed a 'huge waste'.
There are tons of large data sets that only have a small number of unique values in particular fields: protocols, ports, HTTP status codes, hour of the day, etc. 'sort | uniq -c | sort -n' will work for all of them, but not nearly as efficiently as a hash table.
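As an illustration (the port numbers here are made up), a single awk pass keeps one hash-table entry per distinct value, so the final sort only ever touches a handful of lines:

```shell
# Count each distinct value in one streaming pass; no sort of the raw input:
printf '%s\n' 80 443 80 22 80 443 |
  awk '{ count[$1]++ } END { for (v in count) print count[v], v }' |
  sort -rn
```

The raw input is read once, and only the (count, value) pairs are sorted: five entries even for 100GB of input with five distinct strings.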
> That’s the theory but frankly the syntax is so cumbersome, irregular and needs so many googling for "easy" things like conditional, substring, etc. that I now use a real programming language if a script needs to be anything more than a list of commands without any logic (besides variables substitution).
You are basically describing modern programming.
Script Language (or scripting) is a programming language.
And about the "real" programming language you can also trap yourself googling and installing yet another library (did you read the code?) and/or reimplementing existing tools from the unix programming environment.
You are being overly pedantic about a detail that doesn’t matter: yes, shell scripting is technically a programming language, but my point is that it is a terrible one, worth ditching for any non-trivial task. Perl was created more than three decades ago precisely to address this problem. Nowadays there are alternatives such as Python, PowerShell, or even scripting wrappers for compiled languages (such as C#) that let you do the same job very well, with less surprising behavior, and that can be refactored more easily later.
I disagree with you wholeheartedly and without condition.
Not only are you discounting how much time it takes to learn to program effectively in a real 'glue' language, you are also being high-handed in ignoring the ubiquity, working archive, and efficacy of a shell script. Not to mischaracterize, but I find this type of attitude most frequently in 'lead' individuals with less than 10 years of experience: typically in their 20s and 30s. I'm curious if this is your case?
Shell may be super useful to learn, but that doesn't mean it isn't full of horrible footguns that greatly hamper your ability to determine whether your code is doing the right thing once you get to a significant body of code: just take a look at https://mywiki.wooledge.org/BashPitfalls and compare with other languages that don't have some of the more egregious easy-to-make mistakes (e.g. anything to do with word splitting).
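A minimal sketch of the word-splitting footgun, assuming any POSIX-ish shell (the filename is a made-up example):

```shell
cd "$(mktemp -d)"    # scratch directory so nothing real is touched
f="my file.txt"
touch "$f"
# Unquoted: the expansion is split into two arguments, 'my' and 'file.txt',
# neither of which exists, so the command fails:
ls $f 2>/dev/null || echo "unquoted: fails"
# Quoted: the value stays one argument, as intended:
ls "$f" >/dev/null && echo "quoted: ok"
```

The same silent splitting bites anywhere an unquoted `$var` is used as a command argument, which is exactly the class of bug that is hard to spot in a large script.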
/bin/bash won't usually ship with a BSDish OS because of the license, so it is not generally portable to use bash-isms. (HPUX, IRIX, SunOS, Solaris, etc. I don't reckon would have had bash either)
> [...] but they're just scripts he has on his system. Personally I'd prefer to use the ubiquitous commands that work everywhere than rely on having custom scripts on my system [...]
It's okay for one to have their own tools.
$ f() { printf "\$%s" "$1"; }
$ echo a b c | awk "{ print $(f 2) }"
b
His system is not very different from mine or yours. He just chose to combine the tools in a specific way.
This is because in the first example you are invoking two programs: the first sorts the content of the file, and the second counts how many adjacent lines are equal.
The awk example instead builds a hash table keyed on the words, incrementing a counter per key, and then prints the totals.
There is no sorting, plus the printing may be buffered.
He's comparing apples to oranges and reaching the conclusion that... yes, apples and oranges are different things. He's quite aware of this, and even points out the tradeoff -- `sort | uniq -c` still works if your dataset doesn't fit into RAM.
Another nice thing about /usr/bin/time is the --verbose flag which gives:
Command being timed: "ls"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1912
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 112
Voluntary context switches: 1
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This is very likely because, without the full path, your shell uses its `time` builtin instead of the binary.
The shell's builtin `time` keyword is more limited than the full `time` binary. The same is true of a number of other common Unix commands, e.g. `echo`. The manpage for your shell should describe its builtins.
If the time reserved word precedes a pipeline, the elapsed as well
as user and system time consumed by its execution are reported when
the pipeline terminates.
man time:
Some shells may provide a builtin time command which is similar
or identical to this utility. Consult the builtin(1) manual page.
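You can see the difference directly, assuming bash is available (`--verbose` is a GNU time flag; BSD/macOS spells its options differently):

```shell
# bash resolves `time` before any PATH lookup, as a reserved word:
bash -c 'type time'    # prints: time is a shell keyword
# Forcing the external binary by its full path gets the richer report:
# /usr/bin/time --verbose ls
```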
You are missing the point. (the "tired responses")
No, there are only a very few cases, such as shaving video encoding time (the one that started all this mess around "serverless").
For the majority of other cases, using "(human) resource efficiency" or "time saved managing servers" to justify the use of "serverless" is a plainly bogus argument, since you end up shifting (best case scenario) or spending more (worst case) time "managing the cloud's alphabet soup".
I will not even start talking about the code base mess.
Again you only spend more time if you don’t know what you’re doing.
I’m not a front end developer by any stretch of the imagination. Would it be a valid argument if I said that React isn’t a good solution because I have years of experienced with server side rendering?
Again, don’t blame the tools just because you didn’t take the time to learn how to use them efficiently.
If my name doesn’t give you a clue, I was around and developing way before AWS or hosted solutions were a thing. Being able to provision resources by writing YAML and not having to deal with the infrastructure gatekeepers is a godsend.
> No, there are only a very few cases, such as shaving video encoding time (the one that started all this mess around "serverless")
There are millions of people using computers in countless different ways. Are you sure that you know all of the valid use cases?
> If my name doesn’t give you a clue, I was around and developing way before AWS or hosted solutions were a thing. Being able to provision resources by writing YAML and not having to deal with the infrastructure gatekeepers is a godsend.
Take some time to learn some soft skills, just enough to be able to convince people to do stuff for you. And if you are one of the "throw it over the wall" types, I recommend getting out of your silo and interacting well with the rest of your team :)
The problem with "serverless" is this kind of attitude around it.
My team and management (our manager is even older than I am, so he’s not a young idealist by any means) are aggressively “killing as many pets” as possible and going all in on managed services and serverless - including Lambda, Fargate (serverless Docker), CodeBuild (serverless builds), and AWS SFTP (getting rid of our SFTP server).
Don’t get me started on the “cloud consultants” who were just a bunch of old-school net-ops folks who only knew how to click around on the web console and duplicate our on-prem infrastructure.
Yes, working for small companies I’ve had to manage servers and networks back in the day in addition to development.
We would even go with “Aurora Serverless” for our non production environments if it had feature parity with regular Aurora.
I have a background in systems engineering building public/private cloud from hardware(cab and networking physical and logical provisioning) to VM provisioning automation and tooling. I've provided managed server, PHP hosting, Mail, and just about any other form of support you can imagine an ISP/Hosting provider needing to provide.
These days I deal more with full stack development and SRE type work. I ACTIVELY AVOID running servers, using tools like Chef and Ansible, and running Kubernetes clusters unless there is a strong value proposition for them that can't be ignored. This despite the fact that I have no gatekeepers and a ton of experience with all of them.
The poster above me does a better job of articulating the points I had, but at the end of the day, it seems to me like you are discounting an entire model of compute, part of an industry, the work that tons of developers are doing, not to mention new and interesting ways of developing new things, simply because you either don't want to learn something new or because you're attached to an existing way.
Note that I've never once said that running your own mail server is the wrong thing to do, because I don't claim to know everyone's complex environments. Instead, I'm pushing back against this habit of immediately discounting anything new simply because it replaces an older way of operating.
Fortunately the industry as a whole seems much more receptive to new things, otherwise we'd still be on mainframes everywhere, right?
> The poster above me does a better job of articulating the points I had, but at the end of the day, it seems to me like you are discounting an entire model of compute, part of an industry, the work that tons of developers are doing, not to mention new and interesting ways of developing new things, simply because you either don't want to learn something new or because you're attached to an existing way.
This mix of appeal to authority with the sunk cost fallacy makes no sense at all and does nothing to show any merit or present any technical case. Your argument boils down to launching personal attacks on anyone who has the gall not to jump onto the bandwagon you've jumped on. Yet if you wish to make a case for a tool, then you have an obligation to actually present that case and demonstrate the advantages.
Bullying those who don't blindly follow buzzword-driven development principles and resorting to name-calling just wastes everyone's time, and creates noise that muffles the contributions of those who actually have something interesting to say about the technology.
> I’ve been working on a small side project that involves processing incoming email. In particular, it’s an app that needs to do something for each email it receives from (hopefully paying!) users.
I wish you all the best! Mind if I ask for the link?
> With (1), there is no need to worry about overages, but scaling the mail server might be challenging.
Honestly, quite the opposite.
1. Duplicate your MX box.
2. Duplicate your MX record.
That is it :)
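In zone-file terms that's just a second MX record at the same preference (the hostnames here are placeholders):

```
example.com.  3600  IN  MX  10  mx1.example.com.
example.com.  3600  IN  MX  10  mx2.example.com.
```

With equal preference values, sending servers pick between the two; giving the second record a higher number (e.g. 20) would make it a backup instead of a peer.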
> I am writing the mail processing daemon ... in Rust...
You might like to take a look at https://github.com/mailman/mailman for ideas/inspiration. It's a great tool for processing emails too, but I cannot deny I'm now curious to see what one in Rust will look like.
Yes, I only learned about MX record priorities last night haha. With Postfix, the most straightforward way to run code on receiving an email seems to be through a pipe filter. Running multiple filter processes probably requires a beefy server.
Thanks for that link! I might just use a similar approach to allow users to configure how to receive emails (HTTP or stdin, etc.).
Then limit the number of filters... you can have postfix run a fixed number of smtpd processes, and each process handles only one message at a time.
When they're all full, your server will just stop handling messages, but SMTP will retry anyway, giving you plenty of time to scale up if the load is consistently too high
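One way to cap the filter count is in Postfix's master.cf, via the maxproc column of a pipe(8) transport; the service name, user, and script path below are made-up placeholders:

```
# /etc/postfix/master.cf (fragment)
# service  type  private unpriv chroot wakeup maxproc command + args
mailproc   unix  -       n      n      -      4       pipe
  flags=Rq user=filter argv=/usr/local/bin/process-mail ${sender} ${recipient}
```

Here maxproc=4 means at most four copies of the script run concurrently; further messages simply wait in Postfix's queue, which is exactly the back-pressure behavior described above.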