DES wasn’t commonplace though (or at least not on the mainframes I worked on). But maybe that says more about the places I worked early in my career?
Also DES is trivial to crack because it has a short key length.
Longer keys require more compute to use, so the system requirements for handling encryption rise in step as the hardware available for decryption becomes more powerful.
The key size at IBM was larger before standardisation. DES is trivial to break because of NSA involvement in weakening the design. [0]
> In the development of the DES, NSA convinced IBM that a reduced key size was sufficient;
Minitel used DES, and other security layers, and was in use for credit cards, hospitals, and a bunch of other places. The "French web" very nearly succeeded, and did have these things in '85. It wasn't just mainframes - France gave away Minitel terminals to the average household.
Yeah, I’d written about Minitel in a tech journal several years back. It’s a fascinating piece of technology, but sadly I never got to see one in real life.
I worked for one payroll mainframe in the 80s that didn’t have DES. So it wasn’t quite as ubiquitous as you might think. But it does still sound like it was vastly more widespread than I realised too.
I genuinely feel like there is something happening where Hacker News articles come in bunches and reference each other :]
One of the comments on a front-page Hacker News post almost always refers to something within another post on the same front page. I have witnessed this too many times; it might be time to name this phenomenon.
Freund seems to suggest that hugepages is the right way to run a system under this sort of load - which is the fix.
> Hah. I had reflexively used huge_pages=on - as that is the only sane thing to do with 10s to 100s of GB of shared memory and thus part of all my benchmarking infrastructure - during the benchmark runs mentioned above.
> Turns out, if I disable huge pages, I actually can reproduce the contention that Salvatore reported (didn't see whether it's a regression for me though). Not anywhere close to the same degree, because the bottleneck for me is the writes.
Using C++ under Clang 17 and later (possibly earlier as well, I haven't checked) std::numeric_limits<T>::is_iec559 comes back as true for me for x86_64 on Debian as well as when compiling for Emscripten. Might it be due to your compiler flags? Or is this somehow related to a C/C++ divergence?
If I am not mistaken, is_iec559 concerns numerical representation, while __STDC_IEC_559__ is broader, and includes the behavior of numerical operations like 1.0/-0.0 and various functions.
The standard warns that the macro and the trait can report true even when full conformance isn't actually there. The warning exists precisely because that's what compilers currently do.
It's one of the caveats of the C family that developers are supposed to be aware of, but often aren't. It doesn't support IEEE 754 fully. There is a standard for doing so, but no one has actually implemented it.
Of course in my case what I'm actually concerned with is the behavior surrounding inf and NaN. Thankfully I've never been forced to write code that relied on subtle precision or rounding differences. If it ever comes up I'd hope to keep it to a platform independent fixed point library.
CppReference is not the C++ standard. It's a wiki. It gets things wrong, and it doesn't always give you the full information. Probably best not to rely on it for things that matter.
But, for example, LLVM does not fully support IEEE 754 [0].
Nor does GCC, which lists it as unsupported despite defining the macro and having partial support. [1]
The biggest caveat is in Annex F of the C standard:
> The C functions in the following table correspond to mathematical operations recommended by IEC 60559. However, correct rounding, which IEC 60559 specifies for its operations, is not required for the C functions in the table.
The C++ standard [2] barely covers support, but if a type supports any of the properties of IEC 60559, then it gets is_iec559 - even if that support is _incomplete_.
This paper [3] is a much deeper dive - but the current state for C++ is worse than C's. It's underspecified.
> When built with version 18.1.0 of the clang C++ compiler, without specifying any compiler options, the output is:
> distance: 0.0999999
> proj_vector_y: -0.0799999
> Worse, if -march=skylake is passed to the clang C++ compiler, the output is:
I do, yes. I check that the compiler reports the desired properties, and in cases where my code fails to compile because it doesn't, I special-case and manually test each property my code depends on. In my case that's primarily mantissa bit width, for the sake of various utility functions that juggle raw FP bits.
Even for "regular" architectures this turns out to be important for FP data types. Long double is an f128 on Emscripten but an f80 on x86_64 Clang, where f128 is provided as __float128. The last time I updated my code (admittedly quite a while ago) Clang version 17 did not (yet?) implement std::numeric_limits support for f128.
Honestly there's no good reason not to test these sorts of assumptions when implementing low level utility functions because it's the sort of stuff you write once and then reuse everywhere forever.
And as I wrote, "There's platform and there's platform."
I don't support the full range of platforms that C supports. I assume 8 bit chars. I assume good hardware support for 754. I assume the compiler's documentation is correct when it says it map "double" to "binary64" and uses native operations. I assume if someone else compiles my code with non-754 flags, like fused multiply and add, then it's not a problem I need to worry about.
For that matter, my code doesn't deal with NaNs or inf (other than input rejection tests) so I don't even need fully conformant 754.
I say nothing about code that can cope with a 64-bit char, because my entire point was that my definition of "platform" is far more restrictive than C's, and apparently than yours.
You wrote "I generally include various static asserts about basic platform assumptions."
I pointed out "There's platform and there's platform.", and mentioned that I assume POSIX.
So of course I don't test for CHAR_BIT as something other than 8.
If you want to support non-POSIX platforms, go for it! But adding tests for every single place where the C spec allows implementation-defined behavior - and where all the compilers I've used have had the same implementation-defined behavior for years or even decades - seems quixotic to me, so I'm not going to do it.
And I doubt you have tests for every single one of those implementation-defined platform assumptions, because there are so many of them, and maintaining those tests when you don't have access to a platform with, say, 18-bit integers to test those tests, seems like it will end up with flawed tests.
> maintaining those tests when you don't have access to a platform with, say, 18-bit integers to test those tests, seems like it will end up with flawed tests.
No? I don't over generalize for features I don't use. I test to confirm the presence of the assumptions that I depend on. I want my code to fail to compile if my assumptions don't hold.
I don't recall if I verify CHAR_BIT or not but it wouldn't surprise me if I did.
In 1760, The Natural and Civil History of the French Dominions in North and South America did absolutely claim that there was some papal decree that otter tail was fish, and beaver was fish, and so on.
But... There's no actual Papal decree, bull, or otherwise in canon law that anyone can find. It's just a good story, not a true one.
Which doesn't matter. What matters is whether Melville thought it to be true when he wrote the line. The joke/reference would have been understood by readers at the time regardless of whether it was factually true.
2010. Archbishop of New Orleans. Alligator is "fish". Whether or not the pope has an opinion, such things are not fiction.
YouGov doesn't poll council elections. Nobody does. There's no commercial incentive to poll 5,000 individual wards at £10-50 per respondent.
That's the gap.
YouGov's MRP models predict at constituency level for general elections. We predict at council ward level for local elections.
Different product, different market.
On cost: 65,000 synthetic respondents cost us roughly £35 in GPU compute. A 1,000-person
YouGov poll costs £5,000-15,000. The accuracy isn't the same yet (we're at 75% winner accuracy on by-elections vs YouGov's 90%+ on generals), but we're predicting contests nobody else attempts.
The "in" is that local elections, by-elections, and ward-level prediction are completely unserved.
We're not replacing YouGov. We're filling the space below where polling is commercially viable.
And more than anything, we're testing and learning in public. If our panels get close to the real result after the 7th May election, then there is something here... if not, well... I accept all your criticism :D
> YouGov doesn't poll council elections. Nobody does.
... Yes. They do.
> This is the smallest set of elections in the local authority cycle, but nonetheless throws up some intriguing contests. For the second year running, YouGov have used MRP to project the results of some key local authority battlegrounds.
You are right, I overstated that. YouGov did publish MRP projections for selected councils in 2024 and 2025.
The distinction I should have drawn is that MRP projects from national polling data down to local level using demographic models. It does not field ward-level samples. Our approach generates ward-level responses from synthetic personas with individual personality profiles.
Different method, overlapping output. Fair correction.
So you'd disagree with the style that Linux uses for its commits?
Random example:
  Provide a new syscall which has the only purpose to yield the CPU after the
  kernel granted a time slice extension.

  sched_yield() is not suitable for that because it unconditionally
  schedules, but the end of the time slice extension is not required to
  schedule when the task was already preempted. This also allows to have a
  strict check for termination to catch user space invoking random syscalls
  including sched_yield() from a time slice extension region.
I think my post makes it pretty clear that I would. If you want, I could cite several examples of organizations which use the method I described, so you can weigh it against the one example you provided, and get the full picture.
In your example, where was the issue tracked before the code was written? The format you linked makes it difficult to get the history of the issue.
Let me ask you this: suppose you have a task that needs to be done eventually, and you want to write down some ideas for it, but don't want to start coding right now. Where do you put those ideas? How do you link them to that specific task?
Everyone has their own system, although companies do tend to codify one with a project manager. I've used a TODO.txt inside the repo, an org file, Things.app, a stack of papers, and a whiteboard. But once a task is done, I can summarize the context in a paragraph or two. That's what I put in the commits.
Git was built for email, because that's the system Linux uses. Commits appear inline. Diffs are reviewed and commented inline.
Email is the review process, and commits contain enough information that git blame can get you the reasoning - without requiring you to check the email archive, or a dead ticket that no longer exists.
I can also supply you a list of companies that make use of git's built-in features if you like. But that's probably not relevant to discussing management techniques.
I’m not a lawyer, but I read the decision, and how is this section not a ruling on fair use?
“To summarize the analysis that now follows, the use of the books at issue to train Claude
and its precursors was exceedingly transformative and was a fair use under Section 107 of the
Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
also a fair use but not for the same reason as applies to the training copies. Instead, it was a
fair use because all Anthropic did was replace the print copies it had purchased for its central
library with more convenient space-saving and searchable digital copies for its central
library — without adding new copies, creating new works, or redistributing existing copies.
However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”
Or in the final judgement, “This order grants summary judgment for Anthropic that the training use was a fair use.
And, it grants that the print-to-digital format change was a fair use for a different reason.”
> it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library
It is only fair use where Anthropic had already purchased a license to the work. Which has zero to do with scraping - a purchase was made, an exchange of value, and that comes with rights.
The second, which involves a section of the judgement a little before your quote:
> And, as for any copies made from central library copies but not used for training, this order does not grant summary judgment for Anthropic.
This is where the court refused to make any ruling. There was no exchange of value here, such as would happen with scraping. The court made no ruling.
I believe you are misinterpreting the ruling. Remember that a copyright claim must inherently argue that copies of the work are being made. To that end, the case analyzes multiple "copies" alleged to have been made.
1) "Copies used to train specific LLMs", for which the ruling is:
> The copies used to train specific LLMs were justified as a fair use.
> Every factor but the nature of the copyrighted work favors this result.
> The technology at issue was among the most transformative many of us will see in our lifetimes.
Notable here is that all of the "copies used to train specific LLMs" were copies made from books Anthropic purchased. But also of note is that Anthropic need not have purchased them, as long as they had obtained the original sources legally. The case references the Google Books lawsuit as an example of something Anthropic could have done to avoid pirating the books they did pirate, wherein Google obtained the original materials on loan from willing and participating libraries and did not purchase them.
2) "The copies used to convert purchased print library copies into digital library copies", where again the ruling is:
> justified, too, though for a different fair use. The first factor strongly
> favors this result, and the third favors it, too. The fourth is neutral. Only
> the second slightly disfavors it. On balance, as the purchased print copy was
> destroyed and its digital replacement not redistributed, this was a
> fair use.
Here one might argue that the use of GPL code is different in that, in making the copy, no original was destroyed. But it's also very likely that this wouldn't apply at all in the case of GPL code, because there was also no original physical copy to convert into a digital format. The code was already digitally available.
3) "The downloaded pirated copies used to build a central library" where the court finds clearly against fair use.
4) "And, as for any copies made from central library copies but not used for training" where as you note Judge Alsup declined to rule. But notice particularly that this is referring to copies made FROM the central library AND NOT for the purposes of training an LLM. The copies made from purchased materials to build the central library in the first place were already deemed fair use. And making copies from the central library to train an LLM from those copies was also determined to be fair use. The copies obtained by piracy were not. But for uses not pertaining to the training of an LLM, the judge is declining to make a ruling here because there was not enough evidence about what books from the central library were copied for what purposes and what the source of those copies was. As he says in the ruling:
> Anthropic is not entitled to an order blessing all copying “that Anthropic has ever made after obtaining the data,” to use its words
This declination applies both to the purchased and pirated sources, because it's about whether making additional copies from your central library copies (which themselves may or may not have been fair use), automatically qualifies as fair use. And this is perfectly reasonable. You have a right as part of fair use to make a copy of a TV broadcast to watch at a later time on your DVR. But having a right to make that copy does not inherently mean that you also have a right to make a copy from that copy for any other purposes. You may (and almost certainly do) have a right to make a copy to move it from your DVR to some other storage medium. You may not (and almost certainly do not) have a right to make a copy and give it to your friend.
At best, an argument that GPL software wouldn't be covered under the same considerations of fair use that this case considers would require arguing that the copies of GPL code obtained by Anthropic were not obtained legally. But that's likely going to be a very hard argument to make given that GPL code is freely distributed all over the place with no attempts made to restrict who can access that code. In fact, GPL code demands that if you distribute the software derived from that code, you MUST make copies of the code available to anyone you distribute the software to. Any AI trainer would simply need to download Linux or emacs and the GPL requires the person they downloaded that software from to provide them with the source code. How could you then argue that the original source from which copies were made was obtained illicitly when the terms of downloading the freely available software mandated that they be given a copy?
> How could you then argue that the original source from which copies were made was obtained illicitly when the terms of downloading the freely available software mandated that they be given a copy?
By the license and terms such copies are under.
> For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
You _must_ show the terms. If you copy the GPL code, and it inherits the license, as the terms say it does, then you must also copy the license.
The GPL does not give you an unfettered right to copy, it comes with terms and conditions protecting it under contract law. Thus, you must follow the contract.
The GPL goes to some lengths to define its terms.
> A "covered work" means either the unmodified Program or a work based on the Program.
> Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.
It is not just the source code that you must convey.
Which clause of the GPL requires the receiver of GPL code to agree to the terms of the GPL before being allowed to receive the source code that they are entitled to under the license? Because that would expressly contradict the first sentence of section 9:
> You are not required to accept this License in order to receive or run a copy of the Program.
Isn't that one of the key points to the GPL? That the provisions of it only apply to you IF you decide to distribute GPL software but that they do not impose any restrictions on the users of the software? Surely you're not suggesting that anyone who has ever seen the source code of a GPLed piece of software is permanently barred from contributing to or writing similar software under a non-GPL license simply by the fact that they received the GPLed source code.
> If you copy the GPL code, and it inherits the license, as the terms say it
> does, then you must also copy the license.
> The GPL does not give you an unfettered right to copy, it comes with terms
> and conditions protecting it under contract law. Thus, you must follow the
> contract.
I agree that the GPL does not give you an unfettered right to copy. But the GPL, like all such licenses, is still governed by copyright law. And "fair use" is an exception to the copyright laws that allows you to make copies that you are not otherwise authorized to make. No publisher can put additional terms in their book, even if they wrap it in shrinkwrap, that deny you the right to use that book for various fair use purposes like quoting it for criticism or parody. The Sony terms and conditions for the PlayStation very clearly forbid copying the BIOS or decompiling it. But those terms are null and void when you copy the BIOS and decompile it for making a new emulator (at least in the US), because the courts have already ruled that such use is fair use.
So it is with the GPL. By default you have no right to make copies of the software at all. The GPL then grants you additional rights you normally wouldn't have under copyright law, but only to the extent that when exercising those rights, you comply with the terms of the GPL. But "Fair Use" then goes beyond that and says that for certain purposes, certain types and amounts of copies can be made, regardless of what rights the publisher does or does not reserve. This would be why the GPL specifically says:
> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.
Fair use (and its analogs in other countries) supersede the GPL. And even the GPL FAQ[1] acknowledges this fact:
> Do I have “fair use” rights in using the source code of a GPL-covered program? (#GPLFairUse)
> Yes, you do. “Fair use” is use that is allowed without any special
> permission. Since you don't need the developers' permission for such use, you
> can do it regardless of what the developers said about it—in the license or
> elsewhere, whether that license be the GNU GPL or any other free software
> license.
DES was standardised in '77. In use, before that. SSL was not the first time the world adopted encrypted protocols.
The NSA wouldn't have bothered weakening a standard that nobody used.