For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | clueless's commentsregister

see the show Ozark on Netflix

This is unrelated to this article, but I see such simple titles posted on HN often and given how many articles I read per day on HN, I don't know if it's worth me reading or not until I click it. I wish we had a feature on HN that semantically defined who the intended audience for an article is, specially for such opaque titles. Something like the following (used gemini for this):

Here are the 1-2 tags defining the intended audience for each article on the front page:

Five frontier LLMs disagree on 67% of 1k real-world fact-check claims Tags: AI Researchers, Machine Learning Engineers

YouTube to automatically label AI-generated videos Tags: Digital Content Creators, General Tech Consumers

A Eureka machine that thinks like nature and explores what AI cannot Tags: Computer Scientists, AI Researchers

AMD pulls a bait-and-switch on Linux users with Vivado licensing changes Tags: Linux Users, Hardware Engineers

I analysed 20 years of my chats Tags: Data Enthusiasts, Hobbyist Programmers

I think Anthropic and OpenAI have found product-market fit Tags: Tech Entrepreneurs, Product Managers

Hallucinate – Massively Multiplayer Online Rave Tags: Gamers, Creative Coders

AI sticker shock hits corporate America Tags: Corporate Executives, IT Managers

SimCity 3k in 4k (2025) Tags: Retro Gamers, Game Developers

Rapira (Рапира) – Soviet programming language interpreter Tags: Programming Historians, Language Enthusiasts

What Apple and Google are doing to push notifications Tags: Mobile Developers, Privacy Advocates

Commission fines Temu €200M for breaching the Digital Services Act Tags: E-commerce Professionals, Tech Policy Analysts

Ruby vs. Java vs. TypeScript: my experience on building a Cowork DOCX plugin Tags: Software Engineers, Web Developers

I'm Getting into Mesh Networks (Meshtastic, MeshCore, and Reticulum) Tags: Network Enthusiasts, Maker/DIY Community

More Whimsical OEIS Sequences Tags: Mathematicians, Recreational Math Enthusiasts

Libwce: The entropy layer of a wavelet codec, on its own Tags: Compression Engineers, Systems Programmers

The Ask (the article you previously asked about) Tags: Engineering Managers, Tech Leaders

Seeing Around Corners Using Smartphone-Grade Lidar Tags: Computer Vision Researchers, Optics Engineers

Rust (and Slint) on a Jailbroken Kindle Tags: Hardware Hackers, Rust Developers

DuckDuckGo search saw 28% more visits after Google said people love AI mode Tags: Search Engine Marketers, Privacy Advocates

Investigating how prompt politeness affects LLM accuracy (2025) Tags: AI Prompt Engineers, NLP Researchers

Go: Support for Generic Methods Tags: Go Developers, Systems Programmers

Biff is a command line datetime Swiss army knife Tags: System Administrators, CLI Power Users

FBI Arrests CIA Official with $40M in Gold Bars in His Home Tags: General Audience, Intelligence Buffs

RamAIn (YC W26) Is Hiring Tags: Job Seekers, AI Engineers

Warm up your MacBook (2019) Tags: Mac Users, Hardware Hobbyists

Incident with Pull Requests, Issues, Git Operations and API Requests (GitHub) Tags: DevOps Engineers, Software Developers

A New Typst Template for Pandoc (2025) Tags: Academic Writers, Technical Writers

Stress disrupts hippocampal integration of overlapping events, memory inference Tags: Neuroscientists, Psychology Researchers

Google employee charged with $1M Polymarket insider trading bet on search term Tags: Tech Finance Enthusiasts, General Tech Consumers


lobste.rs is similar to hn, just around the corner, and does something like this.

This sounds like a great capability to be added to immich


Or Stash lol


Super confusing... seems like some sort of in with the VCs that can pull this program's guests was enough to create a new podcast that is now seen as influential. My best is, this was a side liquidity event for the openAI VCs that had somehow invested into the podcast, looking to get some money out of openAI stake.


I like this theory for no other reason than it seems plausible lol


60K followers on youtube for low hundreds of millions? seems steep


guarantee one of them caught an OpenAI guy murdering a prostitute or something


> It's only true in a universe where Iran would have collapsed from within before the expiration of the sunset clause, and that clearly was not going to happen.

No one can know this hypothetical, but some def bet their entire futures/careers on this: that an Iran with a more prosperous middle class (as a result of JCPOA) might have had a better chance for social/internal reform, i.e. regime change.


> If you take a lot of chances, that adds up eventually and you'll have some big wins. Just do it safely, so that they don't add up to a lot of big losses, too.

And here is great contradiction in this whole essay. You can't "safely" take a lot of chances and not lose big, when in most cases to have big wins, one has to do unsafe things...

This is also why folks who have a safety net (in terms of family wealth, etc) tend to do better as entrepreneurs. Not sure this essay is helpful.


Step 1 have resources, Step 2 boot strap yourself.

If you really want to succeed, you need to pick the best parents.


What are some sample real world cases folks are using to fine tune their own small/medium models?


Oh I wrote up a post on X on this exact question! https://x.com/danielhanchen/status/1979389893165060345?s=20

1. Cursor used online RL to get +28% approval rate: https://cursor.com/blog/tab-rl

2. Vercel used RFT for their AutoFix model for V0: https://vercel.com/blog/v0-composite-model-family

3. Perplexity's Sonar for Deep Research Reasoning I think was a finetuned model: https://docs.perplexity.ai/docs/getting-started/overview

4. Doordash uses LoRA, QLoRA for a "Generalized Attribute Extraction model" https://careersatdoordash.com/blog/unleashing-the-power-of-l...

5. NASA flood water detection https://earthdata.nasa.gov/news/nasa-ibm- openly-release-geospatial-ai-foundation-model-nasa-earth-observation-data6

6. Online RL for robotics - imagine you teaching a robot in the future via some mini finetuning

7. OpenAI's RFT page has more: https://developers.openai.com/api/docs/guides/rft-use-cases

8. For larger models - https://www.mercor.com/blog/expert-data-drives-model-perform...


Only to prompt thought on this exact question, im interested in answers:

I just ran a benchmark against haiku of a very simple document classification task that at the moment we farm out to haiku in parallel. very naive same prompt system via same api AWS bedrock, and can see that the a few of the 4b models are pretty good match, and could be easily run locally or just for cheap via a hosted provider. The "how much data and how much improvement" is a question i dont have a good intuition for anymore. I dont even have an order of magnitude guess on those two axis.

Heres raw numbers to spark discussion:

| Model | DocType% | Year% | Subject% | In $/MTok |

|---------------|----------|-------|----------|-----------|

| llama-70b -----| 83 | 98 | 96 | $0.72 |

| gpt-oss-20b --| 83 | 97 | 92 | $0.07 |

| ministral-14b -| 84 | 100 | 90 | $0.20 |

| gemma-4b ----| 75 | 93 | 91 | $0.04 |

| glm-flash-30b -| 83 | 93 | 90 | $0.07 |

| llama-1b ------| 47 | 90 | 58 | $0.10 |

percents are doc type (categorical), year, and subject name match against haiku. just uses the first 4 pages.

in the old world where these were my own in house models, id be interested in seeing if i could uplift those nubmers with traingin, but i haven't done that with the new LLMs in a while. keen to get even a finger to the air if possible.

Can easily generate tens of thousands of examples.

Might try myself, but always keen for an opinion.

_edit for table formatting_


You can fine tune a small LLM with a few thousand examples in just a few hours for a few dollars. It can be a bit tricky to host, but if you share a rough idea of the volume and whether this needs to be real-time or batched, I could list some of the tradeoffs you'd think about.

Source: Consulted for a few companies to help them finetune a bunch of LLMs. Typical categorical / data extraction use cases would have ~10x fewer errors at 100x lower inference cost than using the OpenAI models at the time.


ok, even that "few thousand examples" heuristic is useful. the usecase would be to run this task over id say somewhere in the order of magnitude of 100k extractions in a run, batched not real time, and we'd be interested in (and already do) reruns regularly with minor tweaks to the extracted blob (1-10 simple fields, nothing complex).

My interest in fine tuning at all is based on an adjacent interest in self hosting small models, although i tested this on aws bedrock for ease of comparison, so my hope is that given we are self hosting, then fine tuning and hosting our tuned model shouldn't be terribly difficult, at least compared to managed finetuning solutions on cloud providers which im generally wary of. Happy for those assumptions to be challenged.


Labeling or categorization tasks like this are the bread and butter of small fine tuned models. Especially if you need outputs in a specific json format or whatever.

I did an experiment where I did very simple SFT on Mistral 7b and it was extremely good at converting receipt images into structured json outputs and I only used 1,000 examples. The difficulty is trying to get a diverse enough set of examples, evaling, etc.

If you have great data with simple input output pairs, you should really give it a shot.


if you add 2 spaces at the start of the line, you turn it into a code block

  like this


  | Model | DocType% | Year% | Subject% | In $/MTok |

  |----------------|----|-----|----|-------|

  | llama-70b -----| 83 |  98 | 96 | $0.72 |

  | gpt-oss-20b ---| 83 |  97 | 92 | $0.07 |

  | ministral-14b -| 84 | 100 | 90 | $0.20 |

  | gemma-4b ------| 75 |  93 | 91 | $0.04 |

  | glm-flash-30b -| 83 |  93 | 90 | $0.07 |

  | llama-1b ------| 47 |  90 | 58 | $0.10 |


thank you so much! i suffered with this, and now i never will again!


Hi! I think this is a pretty good example:

https://www.atredis.com/blog/2024/6/3/how-to-train-your-larg...


I am thinking to fine-tune it to recognize better my handwriting. It already works quite well by default, but my writing is just horrible, so it got trouble sometimes.


This whole dataset needs to be downloadable, instead of being behind their UI..


> When the system rewards cheating, the rational choice is to cheat—or be disadvantaged.

Doesn't the current president of the U.S. and indeed his posse sorta of espouse this when you look at their backgrounds? This feels like a bigger cultural issue around what the advantaged folks have been doing all along


This has been endemic for a long time. I’ve always known folk who game the system, regardless of politics or demographics

The change I feel is that nobody even cares to be honorable any longer. There is no benefit, even culturally. As the article says, you’d have to be stupid not to do it. I’ve always tried to be honest idk

But laws don’t matter anymore. There is no shaming bad actors. It’s all blatantly out there and no consequences have been doled out so here we are.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You