Hacker News | fahd777's favorites

First coding test: Just going to copy and paste out of chat. It aced my first coding test in 5 seconds... this is amazing. It's really good at coding.

Trying to use it for agentic coding...

Lots of fail. Is it the harmony formatting? Anyone have a working agentic tool?

OpenHands and Void IDE are failing due to the new tags.

Aider worked, but the file it was supposed to edit was untouched and it created a new file instead:

Create new file? (Y)es/(N)o [Yes]:

Applied edit to <|end|><|start|>assistant<|channel|>final<|message|>main.py

so the file name is '<|end|><|start|>assistant<|channel|>final<|message|>main.py' lol. quick rename and it was fantastic.
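For anyone hitting the same thing, here is a minimal sketch of a workaround: strip the leaked tags before the string is used as a path. It assumes the tags always look like `<|name|>` and that the real payload follows the last `<|message|>` marker, which is inferred only from the output quoted above. `strip_harmony_tokens` is a hypothetical helper, not something Aider or any other tool ships.

```python
import re

# Assumption: leaked control tokens have the form <|name|>, as in the Aider output above.
HARMONY_TOKEN = re.compile(r"<\|[^|>]*\|>")

def strip_harmony_tokens(text: str) -> str:
    """Keep the text after the last <|message|> marker, then drop any leftover tokens."""
    tail = text.rsplit("<|message|>", 1)[-1]
    return HARMONY_TOKEN.sub("", tail).strip()

leaked = "<|end|><|start|>assistant<|channel|>final<|message|>main.py"
print(strip_harmony_tokens(leaked))  # main.py
```

This only cleans up the symptom. The proper fix is for the tool or serving layer to parse the response format before handing text to the editor.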

I think Qwen Code is the best choice so far, but it's unreliable. The new tags still come through, but it works properly, sometimes.

Only one of my tests so far got the 20b to fail on the first iteration, and after a small follow-up it fixed the problem completely right away.

Very impressive model for 20B.


From the description it seems even the larger 120b model can run decently on a 64GB+ (Arm) Macbook? Anyone tried already?

> Best with ≥60GB VRAM or unified memory

https://cookbook.openai.com/articles/gpt-oss/run-locally-oll...


Their model card [0] has some information. It is quite a standard architecture though; it's always been that their alpha is in their internal training stack.

[0] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...


Wow I really didn’t think this would happen any time soon, they seem to have more to lose than to gain.

If you’re a company building AI into your product right now I think you would be irresponsible to not investigate how much you can do on open weights models. The big AI labs are going to pull the ladder up eventually; building your business on the APIs long term is foolish. These open models will always be there for you to run though (if you can get GPUs anyway).


Wow, today is a crazy AI release day:

- OAI open source

- Opus 4.1

- Genie 3

- ElevenLabs Music


But was it reasoning or did it solve this because it was parroting its training data?

I feel vindicated for when I said that the moment Apple's line stops growing, they'll resort to monetizing their users like the rest of big-tech to increase their shareholder returns, and everyone here was like "Nooo, my sweet innocent publicly traded trillion dollar corporation would never betray me like that". Give it a few more years love, now they're boiling the frog.

yeah, we're a little past that kind of prompting now. Opus 4 will do a whole standup comedy routine about how fucking clueless most "prompt engineers" are if you give it permission (I keep telling people, irreverence and competence cannot be separated in hackers). "You are a 100x Google SWE Who NEVER MAKES MISTAKES" is one I've seen it use as a caricature.

Getting good outcomes from the new ones is about establishing your credentials so they go flat out:

Edit: I'll post a better example when my flight lands. Go away now.


Gemini CLI pricing:

1. Gemini Code Assist (GCA) for Individuals: FREE for 1,000 model requests/day

2. GCA Standard: $22.80/month for 1,500 model requests/day (1.5x the FREE limit)

3. GCA Enterprise: $54.00/month for 2,000 model requests/day (2x the FREE limit)

Source: https://codeassist.google
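
A quick sketch of the arithmetic behind those tiers, using only the prices and limits quoted above. The "price per extra daily request" figure is my own framing, not something Google publishes, and `compare` is a hypothetical helper.

```python
# Tiers as quoted above: (name, USD per month, model requests per day)
tiers = [
    ("Individuals", 0.00, 1000),
    ("Standard", 22.80, 1500),
    ("Enterprise", 54.00, 2000),
]

def compare(tiers, base):
    """Return (name, limit relative to base, USD/month per daily request above base)."""
    rows = []
    for name, price, per_day in tiers:
        ratio = per_day / base
        extra = per_day - base
        # None for the free tier, which adds no requests over itself
        per_extra = round(price / extra, 4) if extra else None
        rows.append((name, ratio, per_extra))
    return rows

for row in compare(tiers, base=tiers[0][2]):
    print(row)
# ('Individuals', 1.0, None)
# ('Standard', 1.5, 0.0456)
# ('Enterprise', 2.0, 0.054)
```

So the paid tiers work out to roughly 4.6 and 5.4 cents per month for each daily request beyond the free 1,000, if request count is all you care about.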


Mozilla and Google provide an alternative called gemmafile which gives you an airgapped version of Gemini (which Google calls Gemma) that runs locally in a single file without any dependencies. https://huggingface.co/jartine/gemma-2-27b-it-llamafile It's been deployed into production by 32% of organizations: https://www.wiz.io/reports/the-state-of-ai-in-the-cloud-2025

In my own experience, 2.5 Pro 03-26 was by far the best LLM at the time.

The newer models are quantized and distilled (I confirmed this with someone who works on the team), and are a significantly worse experience. I prefer OpenAI's o3 and o4-mini models to Gemini 2.5 Pro for general knowledge tasks, and Sonnet 4 for coding.

