
> For a fair comparison you need to look at the total cost, because 4.7 produces significantly fewer output tokens than 4.6

Does it? Anthropic's own announcement says that for the same "effort level" 4.7 does more thinking (i.e. uses more output tokens) than 4.6, and they've also raised the default effort level from high (on 4.6) to xhigh (on 4.7).

I'm not sure whether input or output tokens dominate the cost for a typical mix of agentic coding tasks, but if you're working on an existing project rather than a brand new one, then file input has to be a significant factor, and preliminary testing suggests the new tokenizer typically generates 40% or so more tokens for the exact same input.
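Rough back-of-envelope (every price and token count below is made up, purely to illustrate how the input/output split drives the total cost of a task):

    # Made-up prices and token counts - just to show how the input/output
    # split, plus a ~40% fatter tokenizer, changes the total cost of a task.

    def task_cost(input_toks, output_toks, in_price_per_m, out_price_per_m):
        """Total dollar cost of one agentic task."""
        return (input_toks / 1e6) * in_price_per_m + (output_toks / 1e6) * out_price_per_m

    # Hypothetical "4.6-style" task
    old = task_cost(400_000, 30_000, in_price_per_m=3.0, out_price_per_m=15.0)

    # Hypothetical "4.7-style" task: ~40% more input tokens from the new
    # tokenizer, plus more thinking tokens at the higher default effort
    new = task_cost(560_000, 40_000, in_price_per_m=3.0, out_price_per_m=15.0)

    print(f"old ~ ${old:.2f}, new ~ ${new:.2f}")   # old ~ $1.65, new ~ $2.28

With numbers like these, the input side is already the bigger chunk of the bill, which is why the tokenizer change matters so much for existing codebases.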

I really have to wonder how much of 4.7's increase in benchmark scores over 4.6 comes from the model actually being better trained for these cases, and how much from it simply using more tokens - more compute and thinking steps - to generate the output. It has to be a mix of the two.


> In most of the world such photos would be deemed of public interest and shared by the media

Perhaps, but increasingly not here in the US, which used to consider itself the leader of the "Free World".

Trump thinks nothing of declaring journalists terrorists and threatening to take away the broadcast licenses of TV stations that are embarrassing him.

It'd be nice if we could say this is just Trump, a bad president gone gaga, but the Republican party supports him, so unfortunately this authoritarian control of the media seems to be becoming normalized.


Forget AI, if you could time travel and just bring an iPhone back to the late 70's it would look like a science fiction fantasy. An alien artifact.

It's interesting to wonder if the next 50 years of computing will be the same. Will a device from 2075 make what we have today seem like primitive toys? No doubt we'll have full blown AGI by then, which may be the major difference, and we (or rather our kids) will look back with nostalgia on these LLMs which seemed so revolutionary at the time, but were severely limited and flawed, just a hint of what was to come.


Having lived through this era in the UK, I always found Byte more commercially orientated than hobbyist, but I would buy and read it all the same.

Hobbyist computing grew out of hobbyist electronics, with the Altair 8800 kit, featured on the cover of Popular Electronics in 1975, being one of the first personal computers. My own first computer was also a kit (a bag of components and a bare circuit board), the NASCOM-1, introduced a couple of years later in 1977 and featured on the cover of the first issue of Personal Computer World, which was the first UK magazine dedicated to this new hobby of computing.

Another great magazine of this era was Dr. Dobb's Journal (of Computer Calisthenics & Orthodontia: Running Light Without Overbyte), which was also aimed at hobbyists and featured lots of program listings. American magazines like Byte and Dr. Dobb's were easy to buy in high street newsagents in the UK.


> Having lived through this era in the UK, I always found Byte more commercially orientated than hobbyist, but I would buy and read it all the same.

The hobbyist arena tended to fall more towards magazines like Popular Electronics, and the focus was much more on the lower, circuit level. Byte dipped their toes into that arena via a few of the regular columnists, but that was not their target (despite the fact that many of us subscribers were "computer hobbyists" by most any definition of that term).


What does it mean that sub-agents use a 5 min cache? Is this just for the growing context submitted by the sub-agent itself? What about fixed sub-agent context prefixes such as tool definitions - are those on a 1 hr TTL?

It seems that a more flexible cache control mechanism, useful for variable duration sub-agents, would be more like an arena allocator. Let the client tag their API key activity with different "cache group" (arena) identifiers, then provide an API method to let them free each cache group when they are finished with it. Each sub-agent would then use its own cache group and clear it when the sub-agent exits, rather than just having a fixed 5 min or 1 hr TTL. The client could provide a default TTL for each cache group to use in case they forget to free it.

Context prefixes like tool definitions that will be the same for multiple invocations of the same sub-agent type could then be created (maybe by the main agent) with a different cache group and a longer default TTL.
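To make the idea concrete, here's a purely hypothetical sketch of what a cache-group API might look like from the client side - none of these methods or parameters exist in the real API, it's just the shape of the proposal:

    # Hypothetical sketch of the "cache group" (arena) idea - the client and
    # its methods are stubs; nothing here is a real Anthropic API.

    class StubClient:
        def create_message(self, cache_group, **kwargs):
            print(f"request tagged with cache_group={cache_group!r}")

        def free_cache_group(self, name):
            print(f"freed all cached prefixes in group {name!r}")

    class CacheGroup:
        """A named arena of cached prefixes, freed explicitly or by a default TTL."""
        def __init__(self, client, name, default_ttl_s=300):
            self.client, self.name, self.default_ttl_s = client, name, default_ttl_s

        def message(self, **kwargs):
            # Every request tagged with this group shares (and extends) its cache
            return self.client.create_message(cache_group=self.name, **kwargs)

        def free(self):
            # Drop the whole arena now, instead of waiting for a 5 min / 1 hr TTL
            self.client.free_cache_group(self.name)

    client = StubClient()

    # Long-lived group for prefixes shared across invocations of one sub-agent
    # type (e.g. its tool definitions), with a longer default TTL as a safety net
    tools = CacheGroup(client, "search-agent-tools", default_ttl_s=3600)
    tools.message(system="...tool definitions...")

    # Short-lived group owned by a single sub-agent's growing context
    run = CacheGroup(client, "search-agent-run-42")
    try:
        run.message(messages=["...conversation so far..."])
    finally:
        run.free()   # sub-agent exits -> free its arena immediately

The default TTL is just the backstop for clients that never get around to calling free, as suggested above.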


They've acknowledged that as a bug and have fixed it.

It seems far from clear at this point what the dollar value of agentic coding tools is if measured objectively in terms of value delivered.

IF they can be shown to be multiplying developer productivity (completing more projects on time, without reduction in quality or increase in associated costs) by some significant amount, then they are providing value at current cost. But it's not at all clear whether that is in fact the case, especially since most productivity claims are anecdotal and/or based on things like LOC generated rather than delivered functionality.

Meta's "token usage leaderboard" shows how far some companies are from measuring anything meaningful! It'd be exactly like some company in the .com era measuring employees' "productivity" by how many bytes they'd downloaded from the internet each day (even if that was just a cat video). "Woo hoo, we're out-internetting you! Our internet bill is enormous!" (then proceeds to fire the guy coding, and give a bonus to the one downloading cat videos).

There have been some studies/polls indicating that some very high percentage (90%?) of corporate AI projects are failing. Why is this? Are they ill-conceived and/or ill-executed? Is it the quality of what's being produced that is causing these projects to be abandoned and/or considered failures?

There have also been some separate studies indicating programmer productivity to be reduced, not increased, by use of AI coding tools, which is easy to understand. The developer struggles with the tool and its fallibilities, eventually gets it to generate something that works, then closes his JIRA story with an "AI coded" tag (which shows up on the boss's dashboard, and is all that he sees). Was this an AI productivity success story? To the boss perhaps, but not if the developer admits that it would have been faster to just do it the old way, by hand or cut-n-paste from Stack Overflow.


Part of what makes asbestos (and also fiberglass) dangerous isn't just the sharpness but also the long, thin shape, which means that macrophages can't engulf the fibers.

Moon dust is still problematic: although the particles are smaller, they also can't be digested by macrophages, and it's believed they would accumulate in the lungs, building up with repeated exposure.


Sounds to me like the threat would be silicosis.

> The fundamental problem with these frontier model companies is that they're incentivized to create models that burn through more tokens

That's one market segment - the high-priced one - but not necessarily the most profitable one. Ferrari's 2025 income was $2B while Toyota's was $30B.

Maybe a more apt comparison is Sun Microsystems vs the PC clone market. Sun could get away with high prices until the PC clones (coupled with the rise of Linux) became so fast that they ate Sun's market, and Sun went out of business.

There may be a market for niche, expensive LLMs specialized for certain markets, but I'll be amazed if the mass coding market doesn't become a commodity one, with the winners being the low cost providers - either in terms of API/subscription costs, or licensing models for companies to run on their own (on-prem or cloud) servers.


Betting on continued exponential growth is basically a game of chicken. Growth has to slow down and level off at some point as adoption and usage saturates.

It's a bit like playing roulette by always betting on black and doubling your bet every time you lose. When you eventually, inevitably, do lose, your loss is going to be huge because you've been doubling your bet at each stage.

With LLM model generations and investment, it goes something like this. Let's say profits have been doubling year over year for each new model/investment cycle, and you want to bet on this doubling continuing forever.

Year 1 you get $10B in profit, and spend $20B on extra capacity for next year

Year 2 you get $20B in profit, and spend $40B on extra capacity

Year 3 you get $30B in profit, and spend $??? on extra capacity

You're already in trouble. Profit growth from Year 2 to 3 was "only" 50% vs the doubling you were gambling on, so you've now lost $10B ($40B spent only earnt you $30B of profit), and what are you going to do? Double down like the roulette player?
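The same arithmetic in a few lines of Python (the profit path after year 3 is made up, just to show how the shortfall compounds the longer you keep doubling):

    # Each year you spend 2x this year's profit on new capacity, betting that
    # next year's profit will double. Profit path is made up: doubling stops
    # after year 2 and growth tails off.

    profits = [10, 20, 30, 35, 36]   # $B per year

    for year in range(1, len(profits)):
        spend = 2 * profits[year - 1]        # the doubling bet placed last year
        shortfall = spend - profits[year]    # what the bet failed to earn back
        print(f"Year {year + 1}: spent ${spend}B, profit ${profits[year]}B, "
              f"shortfall ${shortfall}B")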

The longer the pattern of profit doubling goes on before it slows down, the worse it will end for you, since your bets are doubling each year. Saying "woo hoo, look at me! risk pays!" is a bit like saying the same while playing Russian (not casino) roulette for money.

I worked for Acorn Computers in the UK in the early 80's and saw something similar firsthand. The brand new personal computer market was exploding - a once-in-a-lifetime phenomenon that no one knew how to forecast. To make matters worse, the market was highly seasonal, with most sales at Christmas, so the company had to guess what continued year-on-year exponential growth might look like (brand new market - no one had a clue), then plan/spend ahead and stock warehouses full of computers ready for Christmas. Sadly Acorn took the Sam Altman highly-optimistic/irresponsible approach, got the forecast wrong, and was left with a huge warehouse full of rapidly depreciating computers. The company never fully recovered, although ARM rose out of the ashes.

