musebox35's comments

I think this question is one of the more concrete and practical ways to attack the problem of understanding transformers. Empirically, the current architecture is the one that converges best under gradient-descent training dynamics. A different form might be possible, and even beneficial, once the core learning task is complete. The requirements of iterated and continual learning might also lead to a completely different approach.

Thanks for posting a thorough and accurate summary of the historical picture. I think it is important to know the past trajectory to extrapolate to the future correctly.

For a bit more context: before 2012, most approaches were based on hand-crafted features + SVMs, which achieved state-of-the-art performance on academic competitions such as Pascal VOC; neural nets were not competitive on the surface. Around 2010, Fei-Fei Li of Stanford University collected a comparatively large dataset and launched the ImageNet competition. AlexNet cut the error rate by half in 2012, leading major labs to switch to deeper neural nets. The success seems to be a combination of a large enough dataset plus GPUs that made training time reasonable. The architecture is a scaled-up version of Yann LeCun's ConvNets, tying into the bitter lesson that scaling matters more than complexity.


They say that they did test but the coverage was not enough to pick it up, at least for the prompt change:

“After multiple weeks of internal testing and no regressions in the set of evaluations we ran, we felt confident about the change and shipped it alongside Opus 4.7 on April 16.

As part of this investigation, we ran more ablations (removing lines from the system prompt to understand the impact of each line) using a broader set of evaluations. One of these evaluations showed a 3% drop for both Opus 4.6 and 4.7. We immediately reverted the prompt as part of the April 20 release.”

Considering the number and scope of users they serve, I can sympathize with the difficulty. However, they should at least partially reimburse affected users instead of just announcing “our bad, sorry.” That would reduce the frustration.


Naively, one could assume that with AI it should be possible to create a long and broad list of test cases…

I attended the related session at Next '26 yesterday. From my understanding, it is a new backend, and they will release the torch-tpu source on GitHub in one or two months. It will not support all ops initially, but they are moving fast. For now, torchax is mature enough to run torch models on TPUs by translating them to JAX.

My ancient boxed copy of Visual Basic for DOS 1.0, which supported mouse clicks on TUI buttons, would have found your viewpoint quite offensive if it had any AI in it ;-) Oh boy, the good old days.


Similar trend in open text-to-image models: Flux.1 was 12B, but now we have 6B models with much better quality. Qwen Image went from 20B to 7B while merging in the edit line and improving quality. Now that the cost of spot H200s with 140GB has come down to A100 levels, you can finally try larger-scale finetuning/distillation/RL with these models. A very promising direction for open tools and models if the trend continues.


I guess the sense of accomplishment is very person-dependent. I enjoy programming a lot, but it is easy to find people who would challenge themselves to scale said website to a million users/X views per day. I don't know why; probably there is no fixed meaning to existence, and nature likes diversity.

For me, the fun in programming also depends a lot on the task. Recently, I wanted Python configuration classes that could serialize to YAML, but I also wanted to automatically create an ArgumentParser that fills in some of the fields. `hydra` from Meta does that, but I wanted something simpler. I asked an agent for a design, but I did not like the convoluted parsing logic it produced. I finally designed something by hand by abusing the metadata fields of the `dataclasses.field` calls. It was deeply satisfying to get it to work the way I wanted.
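A rough sketch of the metadata trick (the `cli` key and the helper names are my own invention for illustration, not the actual design described above): fields that opt in via `field(metadata=...)` get mirrored as argparse flags, while the rest keep their dataclass defaults.

```python
import argparse
import dataclasses
from dataclasses import dataclass, field


@dataclass
class TrainConfig:
    # Fields opting in via metadata become CLI flags (convention is hypothetical).
    lr: float = field(default=1e-3, metadata={"cli": True, "help": "learning rate"})
    epochs: int = field(default=10, metadata={"cli": True, "help": "training epochs"})
    run_name: str = "default"  # no metadata -> stays out of the CLI


def build_parser(cfg_cls) -> argparse.ArgumentParser:
    """Mirror the opted-in dataclass fields as argparse flags."""
    parser = argparse.ArgumentParser()
    for f in dataclasses.fields(cfg_cls):
        if f.metadata.get("cli"):
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default,
                                help=f.metadata.get("help", ""))
    return parser


def parse_config(cfg_cls, argv=None):
    """Parse argv and construct the config; unexposed fields keep their defaults."""
    args = build_parser(cfg_cls).parse_args(argv)
    return cfg_cls(**vars(args))


cfg = parse_config(TrainConfig, ["--lr", "0.01"])
print(cfg)  # TrainConfig(lr=0.01, epochs=10, run_name='default')
```

The nice part of this shape is that the dataclass stays the single source of truth: YAML serialization can walk the same `dataclasses.fields()` list, so the CLI and the config file never drift apart.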

But after that, do I really want to create every config class and fill every field by myself for the several scripts/classes that I planned to use? Once the initial template was there, I was happy to just guide the agent to fill in the boilerplate.

I agree that we should keep the fun in programming/art, but how we do that depends on the what, the who, and the when.


That is likely. Another factor that came to mind is the GPU using less power due to simpler computations. Grayscale needs less data per pixel, so effects have to touch less pixel data. Whether the accessibility controls actually achieve this would be implementation-dependent, I guess.


Even with the best GPU optimizations, most of the data will still be processed in full color and then pushed through an extra pass at the end. More likely, all of the data goes through exactly that.
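To make that "extra pass at the end" concrete, here is a toy CPU-side sketch of a final desaturation step (a real pipeline would do this per-fragment on the GPU; the function name and weights choice are mine):

```python
def grayscale_pass(rgb_pixels):
    """Collapse full-color pixels to a single luminance value using
    ITU-R BT.709 weights. Everything upstream still renders in RGB;
    only this last step throws the color information away."""
    return [round(0.2126 * r + 0.7152 * g + 0.0722 * b)
            for r, g, b in rgb_pixels]


print(grayscale_pass([(255, 255, 255), (255, 0, 0)]))  # [255, 54]
```

Note that every pixel is still computed in full color before this pass runs, which is why a system-wide grayscale toggle is unlikely to save much GPU work on its own.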


The bitter lesson here is that if you want to run a business, you cannot avoid or outsource marketing. It is a huge part of any trade, and you have to bear the cost. I totally understand the desire to avoid it and concentrate on the craft and on creating; I tried and failed at that numerous times. I have decided I will not start a business unless I have partners who understand sales and marketing and are willing to engage in them.


I think this is what gets blunted by mass education and most textbooks. We need to rediscover it if we want to enjoy our profession amid all the signals flowing in from social media about the great things other people are achieving. Staying stupid and hungry really helps.

