"Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work."
I'd go further: what's valuable is code review. So review the AI agent's code yourself first, ensuring not only that it's proven to work, but also that it's good quality (across various dimensions, most importantly future maintainability). If you're already overwhelmed by that thousand-line patch, try to produce a hundred-line patch that accomplishes the same task.
I expect code review tools to change rapidly too, as lines of code written per person increase dramatically. Any good new tools already?
End of an era: video (with broadband Internet penetration) was the best learning tool we had for 15+ years. But LLMs are now good enough, including in image and infographic generation and in factuality (especially when grounding resources are provided... which is where human experts still matter). I think video is now better only for learning physical, hands-on skills... and those videos tend to be on YouTube rather than on Udemy or Coursera.
Coursera's model will still survive for a while, given people's desire for branded credentials (university degree credits or company-branded certificates)... until the university bubble bursts too in 10+ years. Start of the trend: https://www.nbcnews.com/politics/politics-news/poll-dramatic...
A bit of a plug: we tried building a consumer business, with a learning experience built atop these LLMs: https://uphop.ai/learn . It's still offered for free to consumers, but we're now succeeding much better on B2B ("you either die a consumer business or live long enough to become B2B" was very true for us).
LLMs are not remotely good enough to use as a learning tool. They still make shit up a ton of the time, and you can only catch it if you already know the material (so, not useful for learning). They probably never will be useful for learning, since even after all this time hallucinations are still just as bad as they ever were.
Have you tried them with a grounding resource provided, e.g. attaching a file to ChatGPT or NotebookLM? Yes, you need some human expert to create (or curate) that grounding resource in the first place, but LLMs handle the rest well: presenting info in different ways and at different paces, interacting with the learner like a tutor, etc.
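For illustration, a minimal Python sketch of what grounding amounts to, using the OpenAI client (the file name and model are placeholders I made up; ChatGPT's file attach and NotebookLM do roughly this behind their upload UIs):

    # Stuff an expert-curated source into the prompt and instruct the
    # model to answer only from it. File name and model are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("course_notes.txt") as f:  # the grounding resource
        source = f.read()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a tutor. Answer ONLY from the source below; "
                        "if it doesn't cover the question, say so.\n\n" + source},
            {"role": "user", "content": "Quiz me on the key ideas, one at a time."},
        ],
    )
    print(resp.choices[0].message.content)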
Glad to see the big improvement on the SimpleQA Verified benchmark (28% -> 69%), which is meant to measure factuality (built-in, i.e. without adding grounding resources). That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it'll be years of competition over the number of 9s on such factuality benchmarks, but that'd be glorious.
Yes, that's very good, because that's my main use case for Flash: queries that depend on world knowledge. Not science or engineering problems, but the things you'd ask someone who has really broad knowledge and can give quick, straightforward answers.
Big knowledge cutoff jump from Sep 2024 to Aug 2025. How'd they pull that off for a small point release, which presumably didn't involve fresh pre-training over the web?
Did they figure out how to do more incremental knowledge updates somehow? If so, that'd be a huge change for these releases going forward. I'd appreciate the freshness that comes with that (without having to rely on web search as a RAG tool, which isn't as deeply intelligent and is game-able by SEO).
With Gemini 3, my only disappointment was 0 change in knowledge cutoff relative to 2.5's (Jan 2025).
It's all about the chip economics. I don't know how the _manufacturing cost_ of Google's TPUs compares to Nvidia's GPUs, for inference of equivalent token throughput.
But at the moment Nvidia's 75-80% gross margin is slowly killing its customers like OpenAI. Eventually Nvidia will drop its margins, because a non-zero profit from OpenAI is better than the zero it'll get if OpenAI doesn't survive. Will be interesting to see if, say, 1/3 the chip cost would make OpenAI gross-margin profitable... numbers bandied about in this thread of $20B revenue with $115B cost imply they need 1/6 the chip cost, but I doubt those numbers are right (hard to get accurate $ numbers for a private company for the benefit of us armchair commenters).
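Back-of-the-envelope, using this thread's own (unverified) numbers and the strong assumption that cost is chip-dominated:

    # Rough arithmetic on this thread's unverified numbers; the
    # "cost is ~all chips" assumption is mine, for illustration only.
    revenue = 20e9   # claimed annual revenue, $
    cost = 115e9     # claimed annual cost, $

    # Factor chip costs must shrink by just to break even:
    print(f"break-even needs ~1/{cost / revenue:.1f} the chip cost")  # ~1/5.8, i.e. ~1/6

    # What 1/3 the chip cost buys instead:
    print(f"at 1/3: ${cost / 3 / 1e9:.0f}B cost vs ${revenue / 1e9:.0f}B revenue")  # still loss-making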
Yes, from a first-principles perspective this AI thing is just about running electricity through some wires printed on silicon by a Taiwanese company using a Dutch machine. Which means that up until the Taiwanese fab, there's plenty of room to cut margins: above that point, the costs are mostly greed-based. That is, Nvidia is asking for the highest price the customer can pay, and it has quite a gap down to the cost that defines its minimum price. Which means AI companies can actually keep getting better deals until the devices delivered to them are priced close to TSMC's bulk wafer prices.
"complementing the Neural Accelerators in the CPU and GPU" seems to be a misprint; I don't believe they have the accelerators in the CPU too.
Still a super interesting architecture, with accelerators in each GPU core _and_ a dedicated neural engine. Any links to software documentation on how to leverage both together, or when to use one vs the other?
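I haven't found docs on combining them either, but Core ML at least lets you pin a model toward one or the other; a minimal Python sketch with coremltools (the model path is a placeholder, and which unit wins is something you'd have to benchmark yourself):

    import coremltools as ct

    # Load the same compiled model twice, pinned to different compute units.
    # "model.mlpackage" is a placeholder path, not a real artifact.
    model_ne = ct.models.MLModel(
        "model.mlpackage",
        compute_units=ct.ComputeUnit.CPU_AND_NE,   # prefer the Neural Engine
    )
    model_gpu = ct.models.MLModel(
        "model.mlpackage",
        compute_units=ct.ComputeUnit.CPU_AND_GPU,  # prefer the GPU cores
    )
    # ct.ComputeUnit.ALL instead lets Core ML schedule across CPU/GPU/NE itself.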
A plug for our https://uphop.ai/app : it's for adult learning / corporate training. We break a desired job skill down into small chunks, engage the user with practice, and give nuanced feedback. And of course, like chatbots, we make it easy for the user to ask more questions or go off on tangents.
Would appreciate feedback!
There's a bit of overlap with Learn Your Way, I guess. I'm not sure users need to toggle between alternate formats of the same instruction, though. Instead, the instruction itself should be as multi-modal as possible and offer the flexibility to ask questions... which even gemini.google.com offers, so I'm not sure this is a net improvement over that.
The neural band is huge; glad they're shipping it already rather than waiting (years?) for a production version of Orion (the full AR glasses they demo'd a year ago together with this neural band). The Verge found the controls great, and even tried an alpha of handwriting for text input: https://youtu.be/5cVGKvl7Oek
These glasses are just "annotated reality" rather than full AR, with just 1 small display; think Google Glass but 100x more discreet. So discreet input and output on a device with a camera.
I think the backlash against Google Glass was counterproductive: the product was intentionally made so that it was obvious someone was wearing it. But because of the backlash, companies that want to do this kind of tech now have to hide it, as these do.
Let's forget that Google, just like Facebook, is an evil corporation whose main line of product is the sale of personal information, with absolutely no regard for privacy, except when some country manages to slap their wrist.
So, it's quite a stretch to say "counterproductive"; I for one am very glad that happened. Sure, I love the tech and the really mind-blowing things we could do with it (I was one of the devs working with Google Glass), but I don't want these ruthless corps to be the ones owning the output.
I'll wait until it is open, with self-hosted infra, and until then I'll politely ask anyone talking to me to remove the glasses.
The other day I was doing a task whose view I wanted to record in case I needed to review it later, and I searched for webcam glasses or something like that. There are already tons of fairly unobtrusive $50 spy glasses on Amazon.
Actually, this reminds me I might buy a pair after all. In my case I was reviewing an important document package and wanted a way to quickly confirm later what I put in. But I've also recorded unboxings (when worried about a potential return) and disassembly of appliances for repairs that I only make every year or two.
I've thought that larger bi-folds have an odd aspect ratio for anything but two-app multitasking. E.g., there's ~no benefit for videos compared to non-foldables with half the screen area. Is aspect ratio somehow not an issue in practice?
Yeah, videos aren't as big as they could be due to the aspect ratio, but they still look bigger, and it's enough of a quality-of-life improvement that I always flip mine open when I want to watch a video!
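The letterboxing math behind that, as a quick sketch (the diagonals and aspect ratios are illustrative guesses, not any specific device's specs):

    import math

    def video_area(diag_in, screen_ar, video_ar=16 / 9):
        # Square inches a 16:9 video occupies, given screen diagonal + aspect ratio.
        h = diag_in / math.sqrt(1 + screen_ar**2)
        w = screen_ar * h
        if screen_ar >= video_ar:
            return (h * video_ar) * h  # pillarboxed: height-limited
        return w * (w / video_ar)      # letterboxed: width-limited

    fold = video_area(7.6, 1.08)    # ~square unfolded bi-fold (guess)
    slab = video_area(6.7, 20 / 9)  # regular 20:9 phone, landscape (guess)
    print(f"unfolded: {fold:.1f} sq in, slab: {slab:.1f} sq in")
    # Screen area nearly doubles, but the 16:9 video area grows only ~30%.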