Oh, yeah! And I believe there are not many games with Linux, Windows, and macOS versions that allow cross-platform play. Several years ago we had one or two LAN parties with hardware running all three operating systems.
I tried this variant of JetBrains Mono and it had the perfect glyph width (reportedly -6%) for my screen and window sizes: NRK Mono Condensed from https://github.com/N-R-K/NRK-Mono. I also agree with almost all of the other modifications listed on the GitHub page under “Some notable changes are:”.
Now I can have two editors side by side plus a Structure or Project pane on the left in PyCharm, with 120 characters visible in both editors.
Well, we can use memoryview for the dict generation, avoiding the creation of string objects until it's time for output:
    import re, operator

    def count_words(filename):
        with open(filename, 'rb') as fp:
            data = memoryview(fp.read())
        word_counts = {}
        for match in re.finditer(br'\S+', data):
            word = data[match.start():match.end()]
            try:
                word_counts[word] += 1
            except KeyError:
                word_counts[word] = 1
        word_counts = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
        for word, count in word_counts:
            print(word.tobytes().decode(), count)
There's bound to be a way to turn a stream of bytes into a stream of Unicode code points (at least I think that's what Python is doing for strings). Though I'm explicitly not volunteering to write the code for it.
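There is one in the standard library: `codecs.getincrementaldecoder` yields text from byte chunks while buffering the bytes of a multi-byte character that straddles a chunk boundary. A sketch (the helper name and the chunking scheme are mine):

```python
import codecs

def iter_decoded(chunks, encoding='utf-8'):
    # Incrementally decode an iterable of byte chunks into text pieces.
    # The incremental decoder buffers bytes of a multi-byte character
    # that is split across a chunk boundary.
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b'', final=True)  # flush; raises on truncated input
    if tail:
        yield tail

# 3-byte chunks deliberately split the two-byte 'ö' across a boundary:
chunks = ['héllo wörld'.encode()[i:i + 3] for i in range(0, 13, 3)]
print(''.join(iter_decoded(chunks)))  # héllo wörld
```

For a file, `io.TextIOWrapper` around a binary stream does the same buffering for you.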
Oh that's neat, though I might split this into two functions in most cases; no need to entangle opening the file with counting the words in a file-like object.
That's two neat tricks that I'm definitely adding to my bag of python trickery.
Sure, but making one string from the file contents is surely much better than having a separate string per word in the original data.
... Ah, but I suppose the existing code hasn't avoided that anyway. (It's also creating regex match objects, but those get disposed each time through the loop.) I don't know that there's really a way around that. Given the file is barely a KB, I rather doubt that the illustrated techniques are going to move the needle.
In fact, it looks as though the entire data structure (whether a dict, Counter, etc.) should be a relatively small part of the total reported memory usage. The rest seems to be internal Python overhead.
I dislike loading files into memory entirely, in fact I consider avoiding that one of the few interesting problems here (the other problem being the issue of counting words in a stream of bytes, without converting the whole thing to a string).
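For what it's worth, a sketch of that streaming approach (my own, not from any of the code above): read fixed-size blocks and carry a possibly-partial word over to the next block:

```python
from collections import Counter

def count_words_streaming(fp, chunk_size=1 << 16):
    # Count whitespace-separated byte "words" without reading the
    # whole file into memory; fp must be opened in binary mode.
    counts = Counter()
    leftover = b''
    while chunk := fp.read(chunk_size):
        chunk = leftover + chunk
        words = chunk.split()
        # If the chunk ends mid-word, keep the partial word for later.
        if words and not chunk[-1:].isspace():
            leftover = words.pop()
        else:
            leftover = b''
        counts.update(words)
    if leftover:
        counts[leftover] += 1
    return counts
```

Usage would be `with open(filename, 'rb') as fp: counts = count_words_streaming(fp)`; peak memory is then bounded by the chunk size plus the counter itself.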
If you don't care about efficiency you can just do len(set(text.split())), but that's barely worth making a function for.
That's pretty much the reason why. Raymond Hettinger explains the philosophy well while discussing the `random` standard library module: https://www.youtube.com/watch?v=Uwuv05aZ6ug
I feel like much of this has been forgotten of late, though. From what I've seen, it's really quite hard to get anything added to the standard library unless you're a core dev who's sufficiently well liked among the other core devs, in which case you can pretty much just do it. Everyone else will (understandably) be put through a PhD thesis defense, then asked to try the idea out as a PyPI package first (and somehow also popularize that package), and then, if it somehow catches on that way, get declined anyway because it's easy for everyone to just get it from PyPI (see e.g. Requests).
I was personally directed to PyPI once when proposing new methods for the builtin `str`, where the entire point was not to have to import or instantiate anything.
For the simplest case of a single job, I use the job number (`[1]` in the example) with %-notation in kill (which is typically a shell builtin):
In scripts that might handle filenames with spaces, I include:
IFS='	''
'

Hint: the gap between the first two apostrophes is a single literal <Tab> character.
This does not affect an already-written script (you don't need to press Tab instead of Space to separate commands and arguments in the script itself), but making <Tab> and <LF> the “internal field separators” allows globbing with fewer quoting worries while still permitting `files=$(ls)` constructs.
% echo test >'/tmp/hello world'
% cat /tmp/hello*
test
This is bash 5.3.9.
Still, I couldn't agree more on limiting IFS. Personally, I set it only to <LF>.
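To illustrate the <LF>-only setting (a sketch; the paths are made up):

```shell
IFS='
'                                   # field splitting on <LF> only
mkdir -p /tmp/ifs-demo
touch '/tmp/ifs-demo/a file' '/tmp/ifs-demo/b file'
for f in $(ls /tmp/ifs-demo/*); do  # unquoted expansion no longer splits on spaces
    printf '<%s>\n' "$f"
done
```

With the default IFS the loop would see four mangled fields; with <LF>-only it sees the two filenames intact (as long as no name contains a newline).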
In my scripts, I rely on the $(ls) idiom heavily. People I've talked to consider this an anti-pattern and suggest relying on -0, -z, --zero, --null, and -print0 flags instead. I don't deny that it's better than nothing when correctness is the goal, but I’d counter that shell is more about using a familiar interface (text representation) to solve new tasks, not about writing correct code (that’s the domain of other languages). An uncritical pursuit of correctness often results in convoluted code.
(I know that $(ls) is subject to various expansions. I solve this problem by using a shell that doesn't perform them [1].)
Another consideration is that /bin/ls and /bin/find are not the only sources of filenames. Sometimes the source is third-party or has to be user-friendly (and thus separated by traditional newlines).
Some typographical issues just can't be solved by a pursuit of mechanistic correctness. For another example, the \.txt$ idiom doesn't work if spaces are allowed at the end of filenames. These problems are not even shell-specific.
Those are just a few of my personal notes. Fortunately, there's a more systematic and comprehensive study of this issue [2].
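The \.txt$ case can be reproduced like this (a sketch; the filenames are made up):

```shell
mkdir -p /tmp/txt-demo && cd /tmp/txt-demo
touch 'note.txt' 'note.txt '   # the second name ends with a space
ls | grep -c '\.txt$'          # prints 1: the trailing-space name doesn't match
```

No amount of NUL-separation helps here; the anchor itself assumes names end where they visually appear to.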
I had this issue too, so I remapped Ctrl-W/Shift-Ctrl-W to Ctrl-\/Shift-Ctrl-\ .
(Also git operations became two-key sequences, starting with Ctrl-G and that damn Ctrl-K stopped being the shortcut for commit.)
I have always used em-dashes with specific spacing:
1. replacing parentheses —given that paired em-dashes, for me, mark content more relevant to the main text than a parenthesized expression would— so I use the same spacing as for `()`
2. replacing a colon, or finishing the sentence with a subclause— so the spacing follows that of a colon.
Probably unfounded grammatically and against any style guides, but this spacing makes sense to me.