First coding test:
Just going to copy and paste out of chat. It aced my first coding test in 5 seconds... this is amazing. It's really good at coding.
Trying to use it for agentic coding...
Lots of failures. Is it this Harmony formatting? Does anyone have a working agentic tool?
OpenHands and Void IDE are failing due to the new tags.
Aider worked, but the file it was supposed to edit was untouched and it created a new file instead:
Create new file? (Y)es/(N)o [Yes]:
Applied edit to <|end|><|start|>assistant<|channel|>final<|message|>main.py
So the file name is '<|end|><|start|>assistant<|channel|>final<|message|>main.py' lol. A quick rename and it was fantastic.
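For anyone hitting the same leaked-tag file names, here is a minimal sketch of a workaround, not anything Aider or the other tools actually do. It assumes the control tokens always look like `<|name|>` (as in the four seen above) and that a real file name never contains that pattern. `HARMONY_TOKEN` and `clean_filename` are hypothetical names made up for illustration.

```python
import re

# Assumption: Harmony-style control tokens look like <|name|>,
# e.g. <|end|>, <|start|>, <|channel|>, <|message|> from the output above.
HARMONY_TOKEN = re.compile(r"<\|[a-z_]+\|>")


def clean_filename(raw: str) -> str:
    """Return the last non-empty chunk of text left after splitting on control tokens."""
    # Split on every token and drop empty or whitespace-only pieces.
    parts = [p.strip() for p in HARMONY_TOKEN.split(raw) if p.strip()]
    # If nothing is left (input was only tokens), return an empty string.
    return parts[-1] if parts else ""


if __name__ == "__main__":
    leaked = "<|end|><|start|>assistant<|channel|>final<|message|>main.py"
    print(clean_filename(leaked))
```

Taking the last chunk is a guess that the real name follows the final `<|message|>` tag; a proper fix belongs in the tool's response parsing rather than in a rename step like this.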
I think Qwen Code is the best choice so far, but it's unreliable. The new tags are still coming through, but it works properly; sometimes.
So far 1 of my tests has managed to make 20b fail on the first iteration, but after a small follow-up it fixed it completely right away.
Their model card [0] has some information. It is quite a standard architecture though; it's always been that their alpha is in their internal training stack.
Wow I really didn’t think this would happen any time soon, they seem to have more to lose than to gain.
If you’re a company building AI into your product right now I think you would be irresponsible to not investigate how much you can do on open weights models. The big AI labs are going to pull the ladder up eventually, building your business on the APIs long term is foolish. These open models will always be there for you to run though (if you can get GPUs anyway).
I feel vindicated for when I said that the moment Apple's line stops growing, they'll resort to monetizing their users like the rest of big-tech to increase their shareholder returns, and everyone here was like "Nooo, my sweet innocent publicly traded trillion dollar corporation would never betray me like that". Give it a few more years love, now they're boiling the frog.
yeah, we're a little past that kind of prompting now. Opus 4 will do a whole standup comedy routine about how fucking clueless most "prompt engineers" are if you give it permission (I keep telling people, irreverence and competence cannot be separated in hackers). "You are a 100x Google SWE Who NEVER MAKES MISTAKES" is one I've seen it use as a caricature.
Getting good outcomes from the new ones is about establishing your credentials so they go flat out:
Edit: I'll post a better example when my flight lands. Go away now.
In my own experience, 2.5 Pro 03-26 was by far the best LLM at the time.
The newer models are quantized and distilled (I confirmed this with someone who works on the team), and are a significantly worse experience. I prefer OpenAI's o3 and o4-mini models to Gemini 2.5 Pro for general knowledge tasks, and Sonnet 4 for coding.
Very impressive model for 20B.