ofou · on April 24, 2025

considering the number of bots on HN, not really that much

ofou · on March 28, 2025

mmm... like a whole Harry Potter series

ofou · on Feb 27, 2025

My most sincere love to all shadow libraries out there, you're doing god's work.

xtracto · on Feb 27, 2025

They do half of the work (which is a helluva lot)... the other half is done by the volunteers that digitize books.

I was looking at my country's "shelf" and it's so sad to see so many missing titles. I almost wanted to go to my local library and digitize some of them. The old ones that are out of print and impossible to acquire right now...

So much knowledge lost.

FabHK · on Feb 27, 2025

To be fair, the authors of the books also contribute quite a bit.

ofou · on Feb 25, 2025

You gotta love these guys, they're really pushing the open source frontier for all of us, thanks for sharing

grg0 · on Feb 25, 2025

Open AI™ (with a space)

InkCanon · on Feb 25, 2025

There's hilariously nothing open about OpenAI, and that was the plan from the start. From the email by Ilya Sutskever, OpenAI was always going to keep all its research and code as proprietary information. Open supposedly meant the benefits would be shared. So they basically just became a SaaS with a free tier, like most of them. Musk was right when he called them out for fishing for money as if they were a non profit, but always had plans to become a company

danans · on Feb 25, 2025

> Musk was right when he called them out for fishing for money as if they were a non profit, but always had plans to become a company

I believe that he was right, because he of all people should recognize when someone is working from his own playbook of lies and misrepresentation.

Musk is pretty obviously upset because he got outfoxed and cut out of OpenAI, not because of some supposed ideal he holds about safe use of gen AI models.

hackit2 · on Feb 25, 2025

Kind of ironic that DeepSeek is more Open than ChatGPT

gostsamo · on Feb 25, 2025

They do it for their own reasons, but OpenAI are straight up liars and they are neither open nor give a fuck about humanity.

WiSaGaN · on Feb 25, 2025

It would be hilarious if this scenario played out.

OpenAI starts as a nonprofit, aiming to benefit all humanity. Eventually, they discover a path to AGI and engage in intense internal debates: Should they abandon their original mission and chase profit, knowing it could bring generational wealth? They ultimately decide, "To hell with humanity—let’s go for the money."

As they pivot to prioritizing profit, DeepSeek emerges. Staying true to OpenAI’s original vision, DeepSeek open-sources everything, benefiting humanity and earning global admiration. Unintentionally, this move tanks OpenAI’s valuation. In the end, OpenAI fails to become the hero or secure the massive profits they chased. Instead, they leave behind a legacy rebranded as "ClosedAI"

ghfhghg · on Feb 25, 2025

Admittedly I'm a sideline observer but it feels like the first half of your scenario is already happening (sans the agi).

yieldcrv · on Feb 25, 2025

"I don't want to live in a world where someone else is making the world a better place better than we are"

- Silicon Valley Season 2

chefandy · on Feb 25, 2025

OpenAyyyyI swear babe I’m gonna open it up any day. Yeah for that grated good or whatever it is you keep yappin about.

amelius · on Feb 25, 2025

Well, they do give us a great free tool to use, but that's where it ends, and it probably has some agenda behind it.

ur-whale · on Feb 25, 2025

> Kind of ironic that DeepSeek is more Open than ChatGPT

Not ironic at all.

You've simply been lied to by OpenAI.

Nothing ironic about being naive.

azinman2 · on Feb 25, 2025

Now. It’s amazing to me that everyone is like "fuck OpenAI, DeepSeek is the savior", when OpenAI’s papers and code jump-started an AI revolution just a few years ago. Let’s wait the same number of years and see what DeepSeek does.

gertop · on Feb 25, 2025

I thought the papers that jump started the revolution came from Google?

larodi · on Feb 25, 2025

Indeed. And the papers were about doing better translation of character sequences; essentially, the tech emerged as a linguistics improvement for language. Then someone realised the parrot learns enough ZIP and JPEG alongside and can spit back hazy memories of it all.

The one still super useful thing OpenAI ever released must’ve been Whisper. But they could’ve been much more open for sure.

jeffreygoesto · on Feb 25, 2025

Hinton. And if you'd ask himself, probably Schmidhuber.

echelon · on Feb 25, 2025

I hope you're reading this Sam Altman:

Make Open AI open.

Or else you'll lose to the ecosystem.

ta988 · on Feb 25, 2025

Too late, there is no more innovation from OpenAI; all the people who were the drivers left for Anthropic and the others. They had some of the biggest funding, had the lead... And yet they lost it.

ur-whale · on Feb 25, 2025

> I hope you're reading this Sam Altman

I hope he's not.

All he deserves at this point is to go down as hard as possible.

alpb · on Feb 25, 2025

That’s an impossible ask. Sam is the pinnacle of the capitalist ruling class, he’s a pure businessman. He has no interest in giving anything for free unless there’s a business plan. He doesn’t care about humanity. He’ll pretend to change the world and tell you that they’re inventing AGI, Q*, strawberry or whatever they’re branding it, but the reality is he knows it’s all over and unless there’s a major breakthrough this company will be in major financial trouble. Sorry for the rant but he doesn’t deserve much respect for turning all this science into grift. He’s actually the person the old OpenAI board warned everyone about.

anticensor · on Feb 25, 2025

Their state-of-the-art speech to text model, Whisper, is available as open weights for free.

echelon · on Feb 25, 2025

Strategically, they know that needs to run at the edge, and they want users to send them requests to their API without incurring latency or bad user experience.

That is still a fair point, though, and it should be commended. And that hasn't been their only contribution, either.

anticensor · on Feb 25, 2025

They could've made it a trusted-computing-only model distributed with a proprietary encryption, unlocked with an expensive licence key if they wanted.

sciencesama · on Feb 25, 2025

Sam is busy with his new kiddo

blackeyeblitzar · on Feb 25, 2025

Not really open source. For a truly open source model, check out OLMo 2 from AI2:

https://allenai.org/blog/olmo2

They literally share everything you need to recreate their model, including the data itself. This is what they say on that link above:

> Because fully open science requires more than just open weights, we are excited to share a new round of OLMo updates–including weights, data, code, recipes, intermediate checkpoints, and instruction–tuned models—with the broader language modeling community!

ofou · on Feb 16, 2025

https://www.emergentmind.com also offers Deep Research on ArXiv papers (experimental)

ofou · on Feb 13, 2025

This is one of the reasons I've been advocating to use UTF-8 as a tokenizer for a long time. The actual problem, IMHO, is tokenizers themselves, which obscure the encoding/decoding process in order to gain some compression during training, fitting more data in for the same budget and arguably gaining some better understanding from the beginning. Again, it's just a lack of computing power.

If you use UTF-8 directly as the tokenizer, this problem becomes evident once you fit it into the context window. Plus, you can run multiple tests for this type of injection; no emoji should take more than 40 bytes (10 code points * 4 bytes per code point in the worst case). This is an attack on tokenizers, not on UTF-8.

Plus, Unicode publishes the full list of valid sequences containing the ZWJ character in emoji-zwj-sequences.txt
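
A minimal sketch of the idea, assuming a byte-level "tokenizer" where every UTF-8 byte is its own token id. The helper names (`utf8_tokens`, `rough_clusters`, `oversized_clusters`) are hypothetical, and the clustering is a simplified heuristic, not the full Unicode segmentation algorithm: it only glues ZWJ, variation selectors, skin-tone modifiers, and tag characters onto the preceding character. The 40-byte limit is the bound from the comment above.

```python
MAX_EMOJI_BYTES = 40  # bound from the comment: 10 code points * 4 bytes each
ZWJ = 0x200D          # ZERO WIDTH JOINER


def utf8_tokens(text: str) -> list[int]:
    """Byte-level 'tokenizer': each UTF-8 byte becomes one token id in 0..255."""
    return list(text.encode("utf-8"))


def detokenize(tokens: list[int]) -> str:
    """Inverse of utf8_tokens; nothing is hidden by a learned vocabulary."""
    return bytes(tokens).decode("utf-8")


def _is_extender(cp: int) -> bool:
    """Code points that attach to the preceding character (simplified list)."""
    return (
        cp == ZWJ
        or 0xFE00 <= cp <= 0xFE0F      # variation selectors
        or 0xE0100 <= cp <= 0xE01EF    # variation selectors supplement
        or 0x1F3FB <= cp <= 0x1F3FF    # emoji skin-tone modifiers
        or 0xE0020 <= cp <= 0xE007F    # tag characters
    )


def rough_clusters(text: str) -> list[str]:
    """Group each base character with its trailing extenders and ZWJ-joined parts."""
    clusters: list[str] = []
    join_next = False  # True right after a ZWJ: the next character joins the cluster
    for ch in text:
        cp = ord(ch)
        if clusters and (join_next or _is_extender(cp)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = cp == ZWJ
    return clusters


def oversized_clusters(text: str, limit: int = MAX_EMOJI_BYTES) -> list[str]:
    """Return clusters whose UTF-8 encoding exceeds the byte limit."""
    return [c for c in rough_clusters(text) if len(c.encode("utf-8")) > limit]
```

With this, a legitimate ZWJ family emoji stays well under the limit, while a single emoji followed by a long run of variation selectors shows up as one oversized cluster. A stricter check could also compare clusters against the entries in emoji-zwj-sequences.txt.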

ofou · on Feb 7, 2025

Who would have known that BitTorrent, shadow libraries, and seeders would help train the best AI models out there? That adds a whole new meaning to a "seed".

ofou · on Feb 2, 2025

It’s called DeepSeek. The founder just confirmed a few days ago that he got the data from Anna's to train on, I think for their latest vision model.

ofou · on Feb 2, 2025

This is a wonderful submission to Anna's Archive [1]. I really love people pushing the boundaries of shadow source initiatives that benefit all of us, especially providing great code and design. Can't emphasize enough the net plus that open source, BitTorrent, and shadow libraries have had in the world. You can also make the case that LLMs wouldn't have been possible without shadow libraries; there's just no other way of getting enough data to learn from.

Just thank you.

[1]: https://software.annas-archive.li/AnnaArchivist/annas-archiv...

ofou · on Jan 31, 2025

I find it quite interesting that they're releasing three compute levels (low, medium, high); I guess now there's some way to cap the thinking tokens when using their API.

Pricing for o3-mini [1] is $1.10 / $4.40 per 1M tokens (input / output).

[1]: https://platform.openai.com/docs/pricing#:~:text=o3%2Dmini
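
A quick back-of-the-envelope sketch of what those rates mean per request, assuming the first figure is the input price and the second the output price, and assuming thinking tokens are billed at the output rate. The function name and the token counts are made up for illustration.

```python
# Rates quoted in the comment above, in USD per 1M tokens.
INPUT_USD_PER_M = 1.10
OUTPUT_USD_PER_M = 4.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumption: output_tokens includes any thinking tokens.
    """
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k tokens in, 50k tokens out (thinking included).
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

For that hypothetical request the two halves cost about the same, roughly $0.22 each, so a higher compute level mostly moves the output side of the bill.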
