Hacker News | ofou's comments

considering the amount of bots in HN, not really that much


mmm... like a whole Harry Potter series


My most sincere love to all shadow libraries out there, you're doing god's work.


They do half of the work (which is a helluva lot)... the other half is done by the volunteers that digitize books.

I was looking at my country's "shelf" and it's so sad to see so many missing titles. I almost wanted to go to my local library and digitize some of them. The old ones that are out of print and impossible to acquire right now...

So much knowledge lost.


To be fair, the authors of the books also contribute quite a bit.


You gotta love these guys, they're really pushing the open source frontier for all of us, thanks for sharing


Open AI™ (with a space)


There's hilariously nothing open about OpenAI, and that was the plan from the start. From the email by Ilya Sutskever, OpenAI was always going to keep all its research and code as proprietary information. Open supposedly meant the benefits would be shared. So they basically just became a SaaS with a free tier, like most of them. Musk was right when he called them out for fishing for money as if they were a non profit, but always had plans to become a company


> Musk was right when he called them out for fishing for money as if they were a non profit, but always had plans to become a company

I believe that he was right, because he of all people should recognize when someone is working from his own playbook of lies and misrepresentation.

Musk is pretty obviously upset because he got outfoxed and cut out of OpenAI, not because of some supposed ideal he holds about safe use of gen AI models.


Kind of ironic that DeepSeek is more Open than ChatGPT


They do it for their own reasons, but OpenAI are straight up liars and they are neither open nor give a fuck about humanity.


It would be hilarious if this scenario played out.

OpenAI starts as a nonprofit, aiming to benefit all humanity. Eventually, they discover a path to AGI and engage in intense internal debates: Should they abandon their original mission and chase profit, knowing it could bring generational wealth? They ultimately decide, "To hell with humanity—let’s go for the money."

As they pivot to prioritizing profit, DeepSeek emerges. Staying true to OpenAI’s original vision, DeepSeek open-sources everything, benefiting humanity and earning global admiration. Unintentionally, this move tanks OpenAI’s valuation. In the end, OpenAI fails to become the hero or secure the massive profits they chased. Instead, they leave behind a legacy rebranded as "ClosedAI"


Admittedly I'm a sideline observer but it feels like the first half of your scenario is already happening (sans the agi).


"I don't want to live in a world where someone else is making the world a better place better than we are"

- Silicon Valley Season 2


OpenAyyyyI swear babe I’m gonna open it up any day. Yeah for that greater good or whatever it is you keep yappin about.


Well, they do give us a great free tool to use, but that's where it ends and probably has some agenda behind it.


> Kind of ironic that DeepSeek is more Open than ChatGPT

Not ironic at all.

You've simply been lied to by OpenAI.

Nothing ironic about being naive.


Now. It’s amazing to me that everyone is like fuck OpenAI deepseek is the savior, when OpenAI’s papers and code jump started an AI revolution just a few years ago. Let’s wait the same number of years and see what deepseek does.


I thought the papers that jump started the revolution came from Google?


Indeed. And the papers were about doing better translation of char sequences; essentially, the tech emerged as a linguistic improvement for language. Then someone realised the parrot learns enough ZIP and JPEG alongside and can spit back hazy memories of it all.

The one still super useful thing OpenAI ever released must’ve been Whisper. But they could’ve been much more open for sure.


Hinton. And if you'd ask himself, probably Schmidhuber.


I hope you're reading this Sam Altman:

Make Open AI open.

Or else you'll lose to the ecosystem.


Too late, there is no more innovation from OpenAI; all the people that were the drivers left for Anthropic and the others. They had some of the biggest funding, had the lead... And yet they lost it.


> I hope you're reading this Sam Altman

I hope he's not.

All he deserves at this point is to go down as hard as possible.


That’s an impossible ask. Sam is the pinnacle of capitalist ruling class, he’s a pure businessman. He has no interest in giving anything for free unless there’s a business plan. He doesn’t care about humanity. He’ll pretend to change the world and tell you that they’re inventing AGI, Q*, strawberry or whatever they’re branding it, but the reality is he knows it’s all over and unless there’s a major breakthrough this company will be in major financial trouble. Sorry for the rant but he doesn’t deserve much respect for turning all this science to grift. He’s actually the person the old openai board warned everyone about.


Their state-of-the-art speech to text model, Whisper, is available as open weights for free.


Strategically, they know that needs to run at the edge, and they want users to send them requests to their API without incurring latency or bad user experience.

That is still a fair point, though, and it should be commended. And that hasn't been their only contribution, either.


They could've made it a trusted-computing-only model distributed with a proprietary encryption, unlocked with an expensive licence key if they wanted.


Sam is busy with his new kiddo


Not really open source. For a truly open source model, check out OLMo 2 from AI2:

https://allenai.org/blog/olmo2

They literally share everything you need to recreate their model, including the data itself. This is what they say on that link above:

> Because fully open science requires more than just open weights, we are excited to share a new round of OLMo updates–including weights, data, code, recipes, intermediate checkpoints, and instruction–tuned models—with the broader language modeling community!


https://www.emergentmind.com also offers Deep Research on ArXiv papers (experimental)


This is one of the reasons I've been advocating to use UTF-8 as a tokenizer for a long time. The actual problem IMHO is tokenizers themselves, which obscure the encoding/decoding process in order to gain some compression during training to fit more data in for the same budget, and arguably gain some better understanding from the beginning. Again, just a lack of computing power.

If you use UTF-8 directly as the tokenizer, this problem becomes evident once you fit it into the context window. Plus, you can run multiple tests for this type of injection; no emoji should take more than 40 bytes (10 code points * 4 bytes per code point in the worst case). This is an attack on tokenizers, not on UTF-8.

Plus, Unicode publishes the full list of valid sequences containing the ZWJ character in emoji-zwj-sequences.txt
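
To make the byte-budget test concrete, here is a minimal Python sketch, not a production filter. It treats raw UTF-8 bytes as the token ids and flags any single emoji cluster that exceeds the 40-byte bound mentioned above. The function names and the test strings are hypothetical, and it assumes the caller has already split the text into one emoji cluster per string (the standard library has no grapheme segmenter).

```python
# Sketch: raw UTF-8 bytes as tokens (a 256-entry vocabulary), plus a size check.
# All names here are illustrative assumptions, not an existing library API.

MAX_EMOJI_BYTES = 40  # 10 code points * 4 bytes each, the bound from the comment


def utf8_tokenize(text: str) -> list[int]:
    """Encode text as a list of byte values (0-255), one token per byte."""
    return list(text.encode("utf-8"))


def utf8_detokenize(tokens: list[int]) -> str:
    """Invert utf8_tokenize; raises UnicodeDecodeError on invalid byte runs."""
    return bytes(tokens).decode("utf-8")


def exceeds_emoji_budget(cluster: str) -> bool:
    """Return True if one emoji cluster needs more than MAX_EMOJI_BYTES bytes.

    Assumes `cluster` is a single user-perceived emoji, already segmented.
    """
    return len(utf8_tokenize(cluster)) > MAX_EMOJI_BYTES


# A legitimate ZWJ sequence: man + ZWJ + woman + ZWJ + girl + ZWJ + boy.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# A suspicious cluster: one emoji followed by 20 variation selectors
# from the supplementary range (U+E0100), used here as a stand-in payload.
smuggled = "\U0001F600" + "\U000E0100" * 20

print(len(utf8_tokenize(family)), exceeds_emoji_budget(family))
print(len(utf8_tokenize(smuggled)), exceeds_emoji_budget(smuggled))
```

With bytes as tokens, the hidden payload is as visible in the context window as it is in the file: the family emoji costs 25 tokens, while the padded one costs 84. A stricter check could compare each ZWJ cluster against emoji-zwj-sequences.txt instead of only counting bytes.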


Who would have known that BitTorrent, shadow libraries, and seeders would help train the best AI models out there? That adds a whole new meaning to a "seed".


It’s called DeepSeek. The founder just confirmed a few days ago that he got the data from Anna's to train on, I think for their latest vision model.


This is a wonderful submission to Anna's archive [1]. I really love people pushing the boundaries of shadow source initiatives that benefit all of us, especially providing great code and design. Can't emphasize enough the net plus that open source, BitTorrent, and shadow libraries have had on the world. You can also make the case that LLMs wouldn't have been possible without shadow libraries; there's just no other way of getting enough data to learn.

Just thank you.

[1]: https://software.annas-archive.li/AnnaArchivist/annas-archiv...


I find it quite interesting that they're releasing three compute levels (low, medium, high). I guess now there's some way to cap the thinking tokens when using their API.

Pricing for o3-mini [1] is $1.10 / $4.40 (input / output) per 1M tokens.

[1]: https://platform.openai.com/docs/pricing#:~:text=o3%2Dmini
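
For a rough sense of what those rates mean per request, here is a small Python sketch. It assumes the first figure is the input price and the second the output price, and that thinking tokens are billed as output tokens. The function name and the token counts in the example are made up for illustration.

```python
# Sketch: estimate the cost of one o3-mini request from the rates quoted above.
# Assumption: thinking tokens count toward output_tokens for billing.

INPUT_USD_PER_MILLION = 1.10
OUTPUT_USD_PER_MILLION = 4.40


def o3_mini_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 10,000 output tokens,
# most of which would be thinking tokens at a high compute level.
estimate = o3_mini_cost_usd(2_000, 10_000)
print(f"${estimate:.4f}")
```

Under these assumptions the output side dominates the bill, which is why a cap on thinking tokens would matter more than trimming the prompt.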

