>Worst part is Europe was near the top of nuclear and microchips ~30 years ago
Boomers decided that was enough growth, it's time to cement what they got and cash out without thinking about economic growth opportunities of future generations.
>France had bleeding-edge AI
They still do, but it's all tied in stuffy bureaucratic French-speaking academia, not in monetizable products that scale internationally. Whatever they come up with, the US companies will then buy up, turn into products and sell for money.
The White House has Sachs and Krishnan and the CCP is full of engineers. In contrast, the EU Commission:
- Commissioner for Digital and Frontier: Henna Virkkunen (JOURNALIST, experience PR) [1]
- Executive Vice-President for Prosperity and Industrial Strategy: Stéphane Séjourné (LAWYER, politics, but hey, his mom was a telephone switch operator!) [2]
Olmo from AllenAI has been releasing their full pipelines including data [1]. A lot of it is just repackaged and resampled dumps from copyrighted data that has long been publicly available as dumps: Common Crawl, arxiv, Wikipedia, StackExchange, reddit --- all of which are presumably copyrighted with different licenses. Go in Huggingface and you can find massive multi TB data dumps used for pre training.
It is just as legal as when Uber and AirBNB were running illegal taxis and hotels during their growth phase. I'm just waiting for some corporate IP law firm to learn about Huggingface.
It's rather off-topic at this point, but I've never understood how HF can afford to be a CDN for such huge files. It seems like enterprise customers must be subsidizing a lot, but...at that point, is there not a cheaper alternative that doesn't subsidize every hobbyist and startup around?
> how HF can afford to be a CDN for such huge files
bandwidth and storage are literally free when compared to the cost of GPU clusters. HF gets rewarded heavily on capital market for being in AI without actually doing much AI stuff, that is a huge win when compared to costs they are paying for bandwidth and storage.
> I'm just waiting for some corporate IP law firm to learn about Huggingface.
Presumably they already know. The issue is that IP law firms are tiny compared to the trillions of capital pouring into "AI". And if you believe the USA is a capitalist country where the side with deeper pockets win, you know you're not going to win against the trillionaires.
Open-source data coverage: The released datasets cover an estimated 8–10T tokens
(~40–50% of the internal 25T blend). Missing categories include code (~14% of blend),
nemotron-cc-code (~2%), crawl++ (~2%), and academic text (~2%). Users should
supplement with their own data for these categories and adjust train_iters
accordingly.
Nemotron is the strongest model (on most benchmarks) that has its full training pipeline and most of the data open. Olmo 3 from AllenAI, and K2 Think V2 from Mohamed bin Zayed University of Artificial Intelligence are both fully open, but not as capable as the Nemotron family. Granite has much of the training pipeline and data open, but is missing some of each.
DeepSeek API gave 6x to 8x better caching rate for inputs over OpenRouter (even chosing DeepSeek as provider). And some of the cheaper providers are using FP4 quantizations.
After complaints the cached read is not listed anymore in that page, you have to click one by one. All providers for DeepSeek V4 Flash charge ~$0.02 while DeepSeek provider is $0.0028. For coding this is huge as caching often gets in the range of 90 to 99%. But OpenRouter messes your caching so don't use it. And it seems to be a VC-backed closed middle-man company, not open source or open anything.
Openrouter's pricing via the deepseek provider is the same as the official deepseek api for both flash and pro and for cached and uncached tokens. It's literally the same api.
And no, cache rates are not different if you're going through the official deepseek provider. The only way caching rates can drop is if you let openrouter fully control routing by preferring uptime or something, and then it might bounce you between providers. But you can control which providers for a given model are in its routing pool and stop that.
GP is exaggerating but I am convinced this will happen sooner rather than later. The improvements in AI are truly exponential if you read the SOTA papers. It's hard to keep up week to week.
Now they will sure have a bunch of summits at fancy places flying first class and conclude them by issuing strong inconsequential statements and allocating funds with billions earmarked to be wasted by their pals (revolving door).
Absolutely. Look at every current member of the EU commission, European Investment Bank and European Central Bank. Almost without exception, almost everyone has a scandal behind them.
Its how you get these jobs, you need to prove you have the pedigree.
Worst part is Europe was near the top of nuclear and microchips ~30 years ago, and France had bleeding-edge AI.
reply