It feels downstream of CMU's "reasonable person principle". They know that people are going to use AI on their homework, but they trust that they want to learn and improve their skills -- and this is good advice for doing so.
I'm somewhat biased because I was involved in a previous, related course. The important takeaways aren't really about gritty debugging of (possibly) large homework assignments, but the high-level overview you get in the process. AI assistance means you could cover more content and build larger, more realistic systems.
An issue in the first iteration of Deep Learning Systems was that every homework built on the previous one, and errors could accumulate in subtle ways that we didn't anticipate. I spent a lot of time bisecting code to find these errors in office hours. It would have been just as educational to diagnose those errors with an LLM. Then students could spend more time implementing cool stuff in CUDA instead of hunting down a subtle bug in their 2d conv backwards pass under time pressure... But I think the breadth and depth of the course was phenomenal, and if courses can go further with AI assistance then it's great.
This new class looks really cool, and Zico is a great teacher.
I'm old but cumulative assignments are nothing new (the build an OS class, build a compiler class, etc) and my recollection is after you submitted an assignment the instructor would release a correct version you could swap in for yours. So any bugs in previous modules (that the TA/grader didn't catch) couldn't hold up the current assignment.
Old too, and in my experience that was often slightly more work than fixing the bugs in my own implementation. I did swap out a borked module in the build an OS class once but otherwise used my own.
A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.
B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.
C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's an issue of monoculture, like in cybersecurity more generally.
D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.
> A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.
Investment/GPU hours are locked behind export controls, which Anthropic supports since it keeps GPU prices low(er) without PRC demand. Given that, why would PRC labs care about US IP laws? The high-level story is pretty clean: there's no healthy competition when US policy (supported by US labs) has been stacked to keep the PRC behind, and from the PRC's perspective it's entirely reasonable to circumvent that.
Fair points, and worth responding to for a more nuanced discussion! I hope you take these responses in that light :)
A) Well, sure, yes, it's different specific IP being distilled on versus what was trained on. But I don't see why the same principles should not apply to both. If companies ignore IP when training on material, then it should be okay for other companies to ignore IP when distilling on material — either IP is a thing we care about or it isn't. (I don't).
B) I'm really not sure how seriously I take the worries about safety and security when RLing models. You can RL a model to refuse to hack something or make a bioweapon or whatever as much as you want, but ultimately, for one thing, the model won't be capable of helping a person who has no idea what they're doing do serious harm anyway. And for another thing, the internet already exists for finding information on that stuff. And finally, people are always going to build jailbroken models anyway. I guess the only safety-related concern I have with models is sycophancy, and from what I've seen, there's no clear trend where closed frontier models are less sycophantic than open-source ones. In fact, quite the opposite, at least in the sense that the Kimi models are significantly less sycophantic than everyone else's.
C) This is a pretty fair point. I definitely think that having more base frontier models in the world, trained separately based on independent innovations, would be a good thing. I'm definitely in favor of having more perspectives.
But it seems to me that there is not really much chance for diversity in perspectives when it comes to training a base frontier model anyway because they're all already using the maximum amount of information available. So that set is going to be basically identical.
And as for distilling the RL behaviors and so on of the models, this distillation process is still just a part of what the Chinese labs do — they've also all got their own extensive pre-training and RL systems, and especially RL with different focuses and model personalities, and so on.
They've also got diverse architectures, and I suspect in fact very different architectures from what's going on under the hood at the big frontier labs, considering, for instance, that we're seeing DSA and other hybrid attention schemes make their way into the Chinese model mainstream, along with high variation in size, sparsity, and so on.
D) I find that for basically all the tasks I perform, the open models, especially since K2T and now K2.5, are more than sufficient, and I'd say the kind of agentic coding, research, and writing review I do is both very broad and pretty representative. So for 90% of the tasks you'd use an AI for, the difference between the large frontier models and the best open-weight models is indistinguishable, simply because both have saturated them. In that sense they're 90% equivalent, even if they're not within 10% on the very hardest tasks.
Yeah of course, I've been thinking about this a lot and I'm updating my beliefs all the time, so it's good to hear some more perspectives
A) I see what you mean. But I'm more so thinking: companies consider their models an asset because they took so much compute and internal R&D effort to train. Consequently, they'll take measures to protect that investment -- and then what do the downstream consequences look like for users and the AI ecosystem more broadly? That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives.
B) I don't really care about AI safety in the traditional sense either, i.e., can you get an LLM to tell you to do some thing that has been ordained to be dangerous. There's lots of attacks and it's basically an insoluble problem until you veer into outright censorship. But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me. I don't want my agent to read an HN post with a social-engineering-themed prompt injection attack and mail my passwords to someone. (If this sounds absurd, my Clawbot defaulted to storing passwords in a markdown file... which could possibly be on me, but was also the default behavior.)
C) This is a completely fair point, there's amazing work coming out of these smaller labs, and the incentives definitely work out for them to do a distillation step to ship faster and more cheaply. I think the small labs can iterate fast and make big changes in a way that the monolithic companies cannot, and it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying. Which is not to say they're doing none of that, GRPO for example is a fantastic idea.
One way you could have a change in perspective is not just in the architecture/data mix, but in the way you spend test-time compute. The current paradigm is chain-of-thought, and to my knowledge, this is what distillation attacks typically target. So at least, all models end up "reasoning" with the same sort of template, possibly just to interlock with the idea of distilling a frontier API.
D) Interesting to hear. In my research, I find these models to be quite a bit harder to work with, with significantly higher failure rates on simple instruction following. But my work also tends to be on the R&D side, so my usage patterns are likely in the long-tail of queries.
> it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying
It seems to me like they're already doing that. Some of the most fun I've had, actually, is reading their papers on the different RL environments they set up, especially the agentic ones, and the various new algorithms they use for RL and training in general. Combine that with how much they're innovating on attention mechanisms, and I feel like distillation isn't really replacing research into these methods so much as supplementing it, and maybe even making it possible in the first place, because otherwise it would simply be too expensive to get a reasonably intelligent model to experiment with!
> But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me.
Ah, I see what you mean. Can you point me to any benchmarks or research on how good various models are at resisting social engineering and prompt injection attacks? That would be extremely interesting to me. Fundamentally, though, I don't think that's really a soluble problem either, and the right approach is to surround an agent with a sufficiently good harness to prevent that. Perhaps with an approach like this:
Somebody could use this as a starting point. http://touchscale.co/ You'd have to collect new data on touch strength vs. weight to get the regression parameters.
(If you do this, let me know and I can add it to the site above, and then we can both delight in the surprisingly large amount of unmonetizable traffic it gets.)
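The calibration step described above (collecting touch strength vs. weight pairs and fitting regression parameters) could be sketched roughly like this; note the force/weight pairs below are made-up placeholder values, not real measurements from any device:

```python
import numpy as np

# Hypothetical calibration samples: normalized touch-force readings
# paired with known reference weights (in grams) placed on the screen.
force = np.array([0.10, 0.25, 0.40, 0.55, 0.70])
grams = np.array([20.0, 50.0, 80.0, 110.0, 140.0])

# Fit weight ~= a * force + b by least squares.
a, b = np.polyfit(force, grams, 1)

def estimate_weight(f):
    """Estimate weight in grams from a touch-force reading."""
    return a * f + b
```

In practice the force-to-weight relationship may not be linear across the sensor's range, so a piecewise or higher-order fit might be needed once real data is collected.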
The single most irritating killed feature from Apple. Redesign half of their UI to rely on 3D Touch to make sense, then get rid of 3D Touch without redesigning the UI. Previewing links, moving the cursor, interacting with items, they’re all “press and hold until haptic feedback” instead of “quickly press hard and get immediate feedback.” Easier to accidentally trigger, slower to trigger on purpose.
Hardware cost + extra weight (the glass needs to be thicker to handle the extra force without pressing on the display). Turns out nobody was really using it because discoverability sucked.
Hardware cost & weight, fine. Glass doesn't need to be thicker than it currently is (I can press on my 13 Pro's screen about twice as hard as was needed for 3D Touch's max depth, and no issues with the screen), and the last time I replaced a battery on a 12, the screen was just as thick as the XS.
>Turns out nobody was really using it because discoverability sucked..
Sure, but then redesign the UI after removing 3D Touch to not be equally undiscoverable but less precise. Even on the latest iOS beta with its full redesign, there's still many, many actions that require a long press that are completely undiscoverable. (For example, if you don't have the Shazam app installed, go find the list of songs Siri has recognized when asked "What's this song?" Don't look up the answer.)
> Glass doesn't need to be thicker than it currently is (I can press on my 13 Pro's screen about twice as hard as was needed for 3D Touch's max depth, and no issues with the screen)
I don't think this is a great argument. The glass may need to be thicker so the sensors at the border can properly measure the pressure, not because the screen is close to shattering.
He is capable of pressing twice as hard as the feature required at maximum. The screen handles 2x the maximum without issues. Therefore, the glass is thick enough to handle half that pressure, as required by the feature.
As far as I know, the pressure is measured around the edge of the screen. If the screen is thin enough, it could bend when pressed and the pressure applied to the center of the screen can’t be properly measured. I don’t think the problem with a too thin screen is the screen breaking when pressing it.
The discoverability sucked because Apple never rolled this out to all of their devices, grossly underutilized the feature themselves, and eventually ghosted it.
It was by far the best cursor control paradigm on iOS. Now everything is long press, which is slow and just as error-prone.
I’m all for proposing different paradigms for accessibility, but 3D Touch was awesome.
3D Touch was amazing for typing alone, I miss it basically every day when I type more than a couple of words on my phone. It was so great to be able to firm-press and slide to move the insertion point, or firmer press to select a word or create a selection. It was like a stripped down mobile version of the kind of write-and-edit flow of jumping around between words that I can get on a proper keyboard with Emacs keybindings drilled into my brain.
You can still move the cursor by long pressing on the space bar, in case you didn't know. There's no equivalent replacement for the selection behavior you're describing, though (as far as I'm aware).
I don't like it when old people are the reason the rest of us can't have nice things. Some grandma in Nebraska can't use 3D touch and now the rest of the demographic of Apple's customers are deprived of it.
There was a principle of UI design that all UI actions should be discoverable, either with a visible button or a menu item in the menus at the top of the screen (or window on Windows). This is annoying for power users and frequently used actions, so those can also be made available with keyboard shortcuts or right-click actions or what have you, but they must always be optional. This allows power users to be power users without impacting usability for novices.
We've been losing this idea recently, especially in mobile UIs where there's a lot of functionality, not much space to put it in, and no equivalent of the menu bar.
When I had an iPhone XS, I could never understand how to predictably do a normal touch vs. a 3D Touch, or where exactly the OS had different actions for one vs. the other.
And I play games [1] using just my macbook pro's trackpad...
[1] For example, Minecraft works perfectly without a mouse. So does Path of Exile. First person shooters ofc don't.
iPhone 6s and 6s Plus (2015) - First to introduce 3D Touch
iPhone 7 and 7 Plus (2016)
iPhone 8 and 8 Plus (2017)
iPhone X (2017)
iPhone XS and XS Max (2018) - Last models with 3D Touch
Interesting that the iPhone SE 2nd/3rd generation, despite having the iPhone 8 form factor, doesn't have 3D Touch but "Haptic Touch" instead.
The choice of activation function isn't entirely clear to me, but I think it's definitely possible to make a network that operates entirely in the frequency domain. It would probably be pretty easy to start experimenting with such a thing with the nice complex number and FFT support in PyTorch 1.8. :)
Like you said, there's already a significant connection between convolutional networks and the Fourier domain (the convolution theorem).
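The convolution theorem mentioned above says that circular convolution in the signal domain is pointwise multiplication in the frequency domain, which is what makes frequency-domain networks plausible. A minimal numpy sketch (the PyTorch FFT API mentioned above would look nearly identical):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)  # input signal
k = rng.standard_normal(N)  # kernel (same length as the signal)

# Direct circular convolution: y[n] = sum_m x[m] * k[(n - m) mod N]
y_direct = np.array(
    [sum(x[m] * k[(n - m) % N] for m in range(N)) for n in range(N)]
)

# Convolution theorem: transform both, multiply pointwise, transform back.
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

# The two results agree up to floating-point error.
print(np.allclose(y_direct, y_fft))
```

For a network operating "entirely in the frequency domain," the multiplication step would replace the convolution layer, and the open question the parent comment raises is what a sensible nonlinearity looks like on complex-valued spectra.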
Tangentially, I've recently worked on a project that focused on implementing convolution in the Fourier domain, and how that allows one to control useful properties of convolutions (like "smoothness" and orthogonality).