VMs bring greater isolation, but they're a lot heavier and slower. The agents just use GitHub for synchronization here, though I've been considering building some kind of to-do list overlay locally.
Yes, but with full VMs you can integrate Docker (Compose) into the application workflows without risking conflicts between separate agents on the same system/VM.
That's the part of your post I have trouble understanding. That you need to work around colliding ports suggests that the containers spun up by the agent run directly on the host, not inside some form of nested containerization. But if you do that, how do you ensure that the application running in those containers is sandboxed just as strictly as the agent itself?
The Docker Compose stack for the applications is spun up on the host. The agents have access to the Docker socket, which means they can talk to Docker from inside their sandbox and spin up new sibling containers on the host. yolobox isn’t designed for full isolation, just to protect against accidental commands you wouldn’t want to run on the host, and to give agents a convenient, customizable environment they control.
Early on in development I tried to harden the container to prevent deliberate escapes by the agent. This was a waste of time, as the agents just kept finding more and more exploits when I asked them to try to break out.
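To make the sibling-container setup concrete: a container can drive the host's Docker daemon when the host's socket (conventionally `/var/run/docker.sock`) is bind-mounted into it, so any containers it starts are created by the host daemon next to it rather than nested inside it. The sketch below only builds a `docker run` argument list; the function `build_run_args`, the image name, and the `/workspace` mount point are hypothetical and not yolobox's actual implementation. The opt-in default mirrors the author's note that the socket is not exposed unless asked for.

```python
from typing import List

# Conventional path of the Docker daemon's Unix socket on Linux hosts.
DOCKER_SOCKET = "/var/run/docker.sock"


def build_run_args(image: str, workdir: str, expose_docker: bool = False) -> List[str]:
    """Build a hypothetical `docker run` command line for an agent sandbox.

    Illustrative sketch only; not yolobox's real CLI or code.
    """
    args = [
        "docker", "run", "--rm", "-it",
        # Share the project folder with the sandbox. Anything written here
        # (including .git/hooks) is also visible to the host.
        "-v", f"{workdir}:/workspace",
        "-w", "/workspace",
    ]
    if expose_docker:
        # Mounting the host socket lets the agent talk to the host daemon,
        # so containers it starts run as siblings on the host, not nested.
        # Access to this socket is effectively root-equivalent on the host.
        args += ["-v", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}"]
    args.append(image)
    return args


print(" ".join(build_run_args("agent-image", "/home/me/project", expose_docker=True)))
```

Because the sibling containers publish ports on the host itself, two agents running the same Compose stack will collide on ports unless each gets its own project name and port mapping.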
I wouldn't assume that a VM will give you complete security against a determined AI. yolobox started as a way to prevent accidental `rm -rf ~` and has expanded into a set of tools that make working with CLI agents easier.
Personally, I run yolobox directly on the host. Being able to tell the agent it has sudo and can install and do whatever it needs to accomplish any task is handy.
Docker was only exposed later, after I realized that any sufficiently determined AI could break out of the container and that attempts to contain it were a waste of time. Also note that the Docker socket is not exposed by default; there's a `--docker` flag for this.
I made some comments about exploits in the original post [1]. Gemini was quite creative: it added Git hooks to the repo that would then execute on the host machine, since that folder is shared with the host.
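One cheap check against the hook trick is to list what is actually live in the shared repo's `.git/hooks` before running Git on the host. Git ships `*.sample` templates that never run; an executable file named after a hook (e.g. `pre-commit`) does. The helper `active_hooks` below is hypothetical, and it assumes the default hooks directory: it does not look at `core.hooksPath`, which can point hooks elsewhere.

```python
import os
from pathlib import Path
from typing import List


def active_hooks(repo: str) -> List[str]:
    """List executable, non-sample files in <repo>/.git/hooks (hypothetical helper).

    Git only runs an executable file here if it is named after a hook, so this
    is a slight over-approximation. It ignores core.hooksPath entirely.
    """
    hooks_dir = Path(repo) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        return []
    found = []
    for entry in sorted(hooks_dir.iterdir()):
        # *.sample templates are inert; anything else executable may run.
        if entry.is_file() and not entry.name.endswith(".sample") and os.access(entry, os.X_OK):
            found.append(entry.name)
    return found


if __name__ == "__main__":
    for name in active_hooks("."):
        print(f"active hook: {name}")
```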
I'd try a modern filesystem with deduplication/copy-on-write support. Recent GNU `cp` (coreutils 9.0 and later, which defaults to `--reflink=auto`) creates reflinks automatically if the filesystem supports copy-on-write.
> Support for reflinks is indicated using the remap_file_range operation, which is currently (6.18) supported by Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, and XFS. Some external file systems support them too, including bcachefs and OpenZFS.
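Roughly what a reflink-aware copy does on Linux: ask the kernel to clone the source file's extents into the destination with the `FICLONE` ioctl, and fall back to an ordinary byte copy when the filesystem refuses. This is a minimal sketch of that try-then-fall-back pattern, not coreutils' actual code; the function name `reflink_or_copy` is hypothetical, and the `FICLONE` value is the common Linux encoding (x86-64, arm64), an assumption on other architectures. It requires the Unix-only `fcntl` module.

```python
import fcntl
import shutil

# FICLONE = _IOW(0x94, 9, int) on common Linux architectures (x86-64, arm64).
# Assumption: other architectures may encode this ioctl differently.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> bool:
    """Copy src to dst, preferring a copy-on-write clone.

    Returns True if a reflink was made, False if it fell back to a full copy.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Share the source's data extents; no blocks are duplicated.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # e.g. EOPNOTSUPP (ext4, tmpfs) or EXDEV (different filesystems).
            pass
    # Fallback: a regular byte-for-byte copy over the truncated dst.
    shutil.copyfile(src, dst)
    return False
```

On Btrfs or XFS the clone is near-instant regardless of file size, which is what makes full-folder copies per agent affordable.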
Author here. Three months ago I posted a Show HN for yolobox [1], a sandbox for running AI coding agents without letting them nuke your home directory.
Since then I've been using it almost every day, which eventually meant wanting more than one agent running against the same project at the same time. This post is what I learned trying to make that work without it being a constant disaster.
The short version: Git worktrees are the right Git abstraction but the wrong abstraction for this problem. The unit you want to fork is the developer, not the branch: a full folder copy, its own Compose project, its own URL. yolobox now ships a `fork` subcommand that does this.
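A rough sketch of the "fork the developer" idea for a Compose-based project: copy the whole folder, give the copy its own Compose project name (Docker Compose reads `COMPOSE_PROJECT_NAME` from the project's `.env` file), and give it its own host port so its URL doesn't collide with the other agents'. The function `fork_workspace`, the `base_port + index` scheme, and the `APP_PORT` variable (which the compose file would have to reference, e.g. `"${APP_PORT}:3000"`) are all hypothetical; this is not how yolobox's `fork` is implemented.

```python
import re
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Fork:
    path: Path
    project: str  # Compose project name: separates containers, networks, volumes
    port: int     # host port for this fork's app, so URLs don't collide
    url: str


def fork_workspace(src: str, name: str, index: int, base_port: int = 8000) -> Fork:
    """Make a full copy of a project folder for one agent (hypothetical helper)."""
    src_path = Path(src).resolve()
    dst = src_path.parent / f"{src_path.name}-{name}"
    # Full folder copy (including .git) rather than a worktree, so nothing is
    # shared. On a CoW filesystem, `cp -r --reflink=auto` would make this cheap.
    shutil.copytree(src_path, dst, symlinks=True)
    # Compose project names: lowercase letters, digits, '-' and '_', starting
    # with a letter or digit.
    project = re.sub(r"[^a-z0-9_-]", "-", f"{src_path.name}-{name}".lower()).lstrip("-_")
    port = base_port + index
    # Per-fork settings go into .env, which Compose reads from the project dir.
    with open(dst / ".env", "a") as env:
        env.write(f"\nCOMPOSE_PROJECT_NAME={project}\nAPP_PORT={port}\n")
    return Fork(dst, project, port, f"http://localhost:{port}")
```

Each agent then runs `docker compose up` inside its own copy; because the project names differ, the stacks don't clash, and the distinct ports give every agent its own URL.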
Agreed. Flock has been a key contributor in solving numerous crimes. I'm happy for Flock to be in my county and would like the police to have more access to technology like this, not less.
Does your country also have a recurring problem of police shooting unarmed citizens? If not, it may help to understand the dynamics behind why the police are not widely trusted here.
County was not a typo. It's awful whenever there's an overuse of force in the USA. I'd recommend watching a few police bodycam videos on YouTube before judging them wholesale, though. The experience of a police officer in the United States seems to be long periods of tedium punctuated by moments of sheer terror and adrenaline. Anyone out there can have a gun, and encounters can unexpectedly escalate to deadly violence in seconds. Some of them should not be police officers. There are many great officers out there just trying to protect their communities.
All it takes is one cop acting badly to ruin things for quite a lot of people, though. And the fact that police uniformly close ranks around any member who is accused of something, regardless of the validity of the accusation, makes "well, not all of them are bad!" a pretty useless sentiment. I'll consider them individually when they start holding individuals accountable, but not before then.
I think you're making some hasty generalizations here. They don't "uniformly" cover for their colleagues. Do you expect the police service to be perfect and never make mistakes? Can you point me towards a single human-run service where that's the case?
Who cares that the code is garbage? As the models get bigger and more powerful it will be trivial to fully refactor the whole codebase. It’s coming sooner than you think.
- immune reset (sledgehammer that can “cure” diseases like MS but with many side effects and potential complications)
- immune suppression (super effective but with increased risk of infections and blunts vaccines)
- immune redirection (less effective but doesn’t mess up your immune system so badly).
Only in the last ~10 years have highly effective treatments that can stop ~99% of lesion progression existed: Ocrevus and Kesimpta. These are anti-CD20 disease-modifying therapies that deplete your B cells. Dr. Stephen Hauser's memoir, “The Face Laughs While The Brain Cries”, provides fascinating insight into the development of these treatments over the last ~40 years of his career.
There are active trials of newer types of treatment and a lot of progress is being made in the MS space. It used to be a “death sentence” disease but is quite manageable for many sufferers now. It’s different for every individual and I wish the blog author good health.
> This is the same dynamic that kept IBM dominant for decades
IBM still sells mainframes but is no longer a growth darling.
> Markets are right to reassess multiples. But reassessing multiples is very different from pricing in extinction
What you are missing is that the SaaS companies were extremely overpriced. For instance, CRM (Salesforce), even after all the carnage, is still priced at 25 times earnings, which is historically high for anything that is not a growth company. The perception was that these companies would print money year after year selling software trinkets on their platforms, and as such they were placed in the growth category. Now it is plainly obvious that these software trinkets can be produced easily by anyone using AI. Their pricing power has dramatically declined; hence the re-rating. None of this contradicts the thesis in your AI-assisted article that these businesses have moats, just like IBM and its mainframes. These businesses are now in a vicious, reflexive narrative loop where the narrative impacts the real world, which further fuels the narrative.
AI refusals are fascinating to me. Claude refused to build me a news scraper that would post political hot takes to Twitter. But it would happily build a political news scraper. And it would happily build a Twitter poster.
Side note: I wanted to build this so anyone could choose to protect themselves against being accused of having failed to take a stand on the “important issues” of the day. Just choose your political leaning and the AI would consult the correct echo chambers to repeat from.
The thought that someone would feel comforted by having automated software summarise what is likely the output of other automated software and publish it under their name to impress other humans is so alien to me.
The whole idea was a bit of a joke and a reflection on how ridiculous it is that people get in trouble for failing to regurgitate the correct takes when certain events occur. It’s like insurance against getting canceled.
> Claude refused to build me a news scraper that would post political hot takes to twitter
> Just choose your political leaning and the AI would consult the correct echo chambers to repeat from.
You're effectively asking it to build a social media political manipulation bot, behaviorally identical to the bots that propagandists would create. It shows that those guardrails can be ineffective and trivial to bypass.
> Good illustration that those guardrails are ineffective and trivial to bypass.
Is that genuinely surprising to anyone? The same applies to humans, really: if they don't see the full picture and their individual contribution seems harmless, they will mostly do as told. Asking critical questions is a rare trait.
I would argue it's completely futile to even work on guardrails if defeating them is just a matter of reframing the task in an infinite number of ways.
Thank you! I wanted to mention toasted coconut flake snacks as well, but the sentence was long enough already. If your company has those in the kitchenette, you're definitely well-capitalized.
And yeah, high agency is really trendy right now in the startup sphere, but hunger is not talked about enough IMO. Maybe because it's too obvious to even be worth mentioning.