Well, strictly anecdata, but there was a time when I worked on code that for $reasons had no dedicated test environment, and externalities were not our problem according to management. So we were reaching out to the big bad world for testing, but we didn't own the network between it and us. I ended up writing tests for the network. Caught a lot of problems with the network. Didn't make any friends.
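For anyone wondering what "tests for the network" can look like, here is a minimal sketch of the idea, not the tests I actually wrote. It assumes the external dependencies are plain TCP endpoints; the names `can_connect` and `unreachable` are made up for illustration.

```python
import socket
from typing import Iterable, List, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def unreachable(endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0) -> List[Tuple[str, int]]:
    """Return the subset of endpoints that could not be reached."""
    return [(host, port) for host, port in endpoints if not can_connect(host, port, timeout)]
```

Run something like this before the real suite, and a failure tells you the network is the problem rather than your code.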
DISCLAIMER: I've been around IT for probably the majority of y'all's lifetimes, so I'm not saying this happens often. But just because something is fundamentally wrong doesn't mean that all fundamentally wrong things are the same. In my experience, the wrong ways differ from each other more than the good ways of doing the same thing do. Don't conflate things without a good reason.
Totally, devcontainers are fantastic! In this agent sandboxing space there's also Leash, which in addition to Docker/OrbStack/Podman provides a sophisticated macOS-native system extension mode: https://github.com/strongdm/leash
My experience with agents in larger / older codebases is that feedback loops are critical. They'll get it somewhere in the neighborhood of right on the first attempt; it's up to your prompt and tooling to guide them toward better correctness and quality. Basic checks: can the agent run the app, interact with it, and observe its state? If not, you probably won't get working code. Quality checks: by default, you'll get the same code quality as the code the agent reads while it's working; if your linters and prompts don't guide it towards your desired style, you won't get it.
To put that another way: one-shot attempts aren't where the win is in big codebases. Repeated iteration is, as long as your tooling steers the agent in the right direction.
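A rough sketch of that loop, under heavy assumptions: `agent` stands in for whatever produces a candidate change, and each check stands in for a linter, test run, or "can I start the app" probe. `refine`, `agent`, and `checks` are hypothetical names, not any particular tool's API.

```python
from typing import Callable, List, Tuple

# A check inspects a candidate and returns human-readable problems (empty if clean).
Check = Callable[[str], List[str]]
# An agent takes the task plus the previous round's problems and returns a new candidate.
Agent = Callable[[str, List[str]], str]


def refine(agent: Agent, task: str, checks: List[Check], max_rounds: int = 5) -> Tuple[str, List[str], int]:
    """Iterate until every check passes or max_rounds is used up.

    Returns (last candidate, remaining problems, rounds used).
    """
    feedback: List[str] = []
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = agent(task, feedback)
        # Collect problems from every check so the agent sees all of them at once.
        feedback = [problem for check in checks for problem in check(candidate)]
        if not feedback:
            return candidate, [], round_no
    return candidate, feedback, max_rounds
```

The point is that the checks, not the first prompt, decide when the work is done; if a check can't observe something (app state, style), the loop can't steer toward it.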
Yes-ish. It's worth keeping up with the rising tide of model capabilities, but it's not worth stressing over eliciting every last drop. Many of the specific techniques that add value today will be wasted effort with smarter models in a month or two.
Isn't Frozen something you do to a set or dictionary to say, "I'm not going to add any more values, please give me a version of this which is optimized for lookup only"?
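For comparison, here is how the "no more values" half of that looks with Python's built-in `frozenset`. This is only a sketch of the immutability side: Python promises that a `frozenset` can't be changed and is hashable, and I'm not assuming it has any lookup-optimized layout.

```python
# Build a mutable set first, then freeze a snapshot of it.
allowed = {"GET", "HEAD"}
frozen = frozenset(allowed)

# Membership tests work as with a normal set.
is_get_allowed = "GET" in frozen

# The frozen copy has no mutating methods such as add() or discard().
can_add = hasattr(frozen, "add")

# Later changes to the original set do not leak into the frozen snapshot.
allowed.add("POST")
post_in_frozen = "POST" in frozen

# Because it is immutable, a frozenset is hashable and can be a dict key.
cache = {frozen: "read-only methods"}
```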