I think OpenAI and Anthropic just downloaded the same torrents from Anna's Archive that anyone else can. But it's only OK when they do it. The rest of us get nastygrams from law offices. Anthropic actually had to cough up some bucks, for that matter.
At that point, a lot depends on the quality of the preprocessing applied to the raw text dumps. It is reportedly not that trivial to go from DumpOfSketchyRussianPirateSite.zip to a data set suitable for ingestion during pretraining. A few bad chunks of data can apparently do more harm than one would expect.
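To make the preprocessing point a bit more concrete, here is a minimal sketch of the kind of cleaning pass that sits between a raw dump and a training set. Everything in it is an assumption for illustration: the function names, the 200-character minimum, and the 0.6 "mostly letters" threshold are made up, and real pipelines reportedly do far more (language ID, near-duplicate detection, quality classifiers).

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize Unicode and collapse whitespace so trivially different copies match."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_text(chunk: str, min_chars: int = 200, min_alpha_ratio: float = 0.6) -> bool:
    """Crude quality gate. Both thresholds are illustrative guesses, not known values."""
    if len(chunk) < min_chars:
        return False
    alpha = sum(ch.isalpha() or ch.isspace() for ch in chunk)
    return alpha / len(chunk) >= min_alpha_ratio


def clean_chunks(chunks):
    """Yield normalized chunks that pass the gate, dropping exact duplicates."""
    seen = set()
    for raw in chunks:
        chunk = normalize(raw)
        if not looks_like_text(chunk):
            continue  # too short, or mostly symbols / binary residue
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        yield chunk
```

Calling `list(clean_chunks(raw_chunks))` on a list of strings returns only the chunks that survive. Hashing after normalization only catches exact repeats; near-duplicates would need something like MinHash, which is out of scope for a sketch.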
AFAIK Google scans almost everything in print as part of the Google Books initiative, so they may have been able to skip the torrenting step.
> I think OpenAI and Anthropic just downloaded the same torrents from Anna's Archive
I think you're underestimating how much chat conversation data they've gathered at this point, and how much of it is part of the training set.
None of that is available to anyone who wants to train a frontier model.
And when it comes to Google ... the hoard of data they're sitting on goes back to what? 1998? They've basically got a digital record of what happened since the birth of the internet.
> This is not gain at all. At least in theory: You own some tons of gold at the start of the process, you have the same tons of gold at the end of the process.
Correct. A better way to put it is you shorted the USD. Which is a smart move at any rate. So a gain indeed.
Just like the majority of classical economists and policymakers, you would have called him a blithering idiot and overzealous nationalist two decades ago. It was thought that this kind of behavior caused world wars. I mean, it did cause them. It's just that we're speedrunning the next one, and that changed the narrative.
I think many academics are often specialized in one area of their expertise and overfit in that dimension. Journalists pick this up and promote those views a bit too much. This results in non-optimal decisions due to skewed public perceptions.
We need to promote holistic thinking that considers multiple dimensions, not just the one in which a given academic is proficient.
> many academics are often specialized in one area of their expertise and overfit in that dimension
An economist saying a national-security measure costs this much is fine. Where it goes off the rails is in turning costs into damnation without accounting for what one gets in return. In an attention-driven media environment, that sells.
The problem is that there simply isn't an efficient solution for everything. At some point, every problem has solutions with pros and cons.
France could do it as it is a rich and big country but smaller countries do not have a viable choice. This reasoning could have been applied to France too in another universe.
It's a balance impossible to totally tilt one way or another.
So no amount of extra information could help when it's a matter of opinion at the end of the day.
"Amazing" is, however, not the word I would use: the UI is still very convoluted and very hard to learn.
The worst part of FreeCAD, which remains true to this day, is the load of minutiae you need to know to handle or avoid the weird corner cases that you inevitably run into when you start building complex models, and where FreeCAD stubbornly refuses to let you carry on with your work.
When you paint yourself into one of these corners, the software is hugely unhelpful when it comes to understanding what you did wrong and how to correct it.
In short, the word "Amazing" only works if you compare it to the absolute abomination the UI was a few years back.
But compare FreeCAD today to, for example, how slick Fusion is, and there is still a very, very wide gap.
Finally, the geometry engine is a somewhat old and creaky thing that sometimes downright fails to compute fillets or surface/surface intersections correctly, so yeah, YMMV.
FreeCAD is, however, free software, and not controlled by one of the worst corp. in the world of software: Autodesk. So huge thumbs up there.
This is really accurate to my experience learning FreeCAD earlier this year. I am a former professional CAD user (of a lesser software than AutoCAD) and I don't think I would have gotten far without being able to ask ChatGPT for help understanding some of the quirks of FreeCAD.
For free and open it's truly impressive though. Actually I think my time building iOS UIs in Storyboard was at least as useful as previous CAD experience, since constraints are the foundation of (at least one approach to) designing parts.
The last Autodesk software I've used was AutoCAD 2000 (released in 1999). And I've not followed them since.
Perhaps they have indeed become "one of the worst corp. in the world of software", but in the early years they were very interesting. The founder of Autodesk, John Walker (he died in 2024), wrote/edited an interesting book on the early years: "The Autodesk File" https://fourmilab.ch/autofile/
Statement of fact with my interpretation --- folks should verify the fact and read what he has written and come to their own conclusions.
While I'm grateful Autodesk stepped in and kept TinkerCAD afloat, I'm relieved Sketchbook escaped their clutches, and am glad I never got involved in Fusion 360 so as to suffer from their on-going "rug pulls" --- which of these are a result of his influence, I've not found a need to discern.
> I think OpenSCAD is currently the best and most feature complete choice
As much as I love OpenSCAD, I would strongly disagree with your conclusion.
All the OpenSCAD language can do is boolean operations and, moreover, the engine can only perform those on polygonal (triangle, actually) meshes.
That's a very far cry from what a modern commercial CAD engine can do.
For example, the following things are very, very hard to do, or even to specify, using OpenSCAD:
- Smooth surfaces, especially spline-based
- Fillets / Chamfers between two arbitrary surfaces
- Trimming surfaces
- Querying partly built models and using the outcome in the subsequent construction (e.g. find the shortest segment between two smooth surfaces, build a cylinder around it, and fillet it with the two surfaces; this is an effing nightmare to do within the confines of OpenSCAD)
- Last but not least: there is no native constraint solver in OpenSCAD, neither in the language nor in the engine (unlike - say - SolveSpace)
I might have misunderstood what you're looking to do, but, yeah, digging deeper feels very much like the right thing to do.
Using BOSL2 alleviates most issues I've run into with OpenSCAD for chamfers and the like, but it is an extra set of functions you need to remember sadly
> BOSL2 ... but it is an extra set of functions you need to remember sadly
It's also extremely slow: it implements chamfers and fillets using morphological operations, and if you have a large number of fillets, those algorithms (minkowski / hull) are very far from linear in time on polygonal meshes, which leads to a compute time explosion if you want a visually smooth result.
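For a rough feel of why this blows up, here is a toy 2D sketch: the Minkowski sum of two convex polygons computed the naive way, as the convex hull of all pairwise vertex sums. This is an assumption-laden illustration, not how OpenSCAD or BOSL2 actually implement it; the function names are made up, and the real engine works on 3D, possibly non-convex meshes, which is harder still. The point is only that the candidate count is the product of the two vertex counts, so doubling the tessellation of both operands quadruples the work before the hull step even starts.

```python
from itertools import product


def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def naive_minkowski_sum(a, b):
    """Minkowski sum of two CONVEX polygons via all pairwise vertex sums.

    Returns the hull and the number of candidate points that were generated.
    """
    candidates = [(ax + bx, ay + by) for (ax, ay), (bx, by) in product(a, b)]
    return convex_hull(candidates), len(candidates)
```

Summing a 4-vertex square with another 4-vertex square generates 16 candidates; two 64-gons standing in for "visually smooth" circles generate 4096.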
I'm talking about the training set.
Sure there are some open sets out there.
But my guess is they are nowhere near what OpenAI, Google and Anthropic are actually using.
Happy to be proven wrong.