
Finbarr · 2026-05-05T20:36:34 1778013394

VMs bring greater isolation, but they're a lot heavier and slower. The agents just use GitHub for synchronization here, though I've been considering building some kind of local todo-list overlay.

tracker1 · 2026-05-05T20:56:04 1778014564

Yes... but with full VMs, you can integrate docker (compose) into the application workflows without risking conflicts between separate agents on the same system/vm.

Finbarr · 2026-05-05T20:57:11 1778014631

Did you read the post? That's exactly the problem I just solved.

CodesInChaos · 2026-05-06T13:41:44 1778074904

That's the part of your post I have trouble understanding. That you need to work around colliding ports suggests that the containers spun up by the agent run directly on the host, not inside some form of nested containerization. But if you do that, how do you ensure that the application running in those containers is sandboxed just as strictly as the agent itself?

Finbarr · 2026-05-06T14:15:18 1778076918

The docker compose stack for the applications is spun up on the host. The agents have access to the docker socket, which means they can talk to docker from inside their sandbox and spin up new sibling containers on the host. yolobox isn’t designed for full isolation. It guards against accidental commands you wouldn’t want to run on the host, and it's a convenient way of giving agents a customizable environment they control.

Early on in development I tried to harden the container to prevent deliberate escapes by the agent. This was a waste of time, as the agents just kept finding more and more exploits when I asked them to try to break out.

CodesInChaos · 2026-05-06T14:56:06 1778079366

So the right way to use yolobox is to spin up one VM as a secure sandbox, and then use yolobox to separate individual agents within the VM?

Finbarr · 2026-05-06T15:30:35 1778081435

I wouldn't assume that a VM will give you complete security against a determined AI. yolobox started as a way to prevent accidental `rm -rf ~` and has expanded into a set of tools that make working with CLI agents easier.

Personally, I run yolobox directly on the host. Being able to tell the agent it has sudo and can install and do whatever it needs to accomplish any task is handy.

CodesInChaos · 2026-05-06T15:12:03 1778080323

Sounds interesting. What kind of exploits did they find, apart from docker being exposed?

Finbarr · 2026-05-06T15:28:33 1778081313

Docker was only exposed later, after I realized that any sufficiently determined AI could break out of the container, and attempts to contain it were a waste of time. Also note that the docker socket is not exposed by default. There's a `--docker` flag for this.

I made some comments about exploits in the original post [1]. Gemini was quite creative in adding git hooks to the repo that would execute on the host machine. That folder is shared.
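For anyone who wants to check a shared folder for this kind of thing, here is a minimal sketch in Python. It is not part of yolobox, and `active_hooks` is a made-up helper name. It only looks at the default `.git/hooks` directory and lists files that git would treat as live hooks (executable, no `.sample` suffix). It does not account for a repo that sets `core.hooksPath`, so treat an empty result as a hint rather than proof.

```python
import os
from pathlib import Path


def active_hooks(repo: Path) -> list[str]:
    """List hook files in repo/.git/hooks that git would actually run.

    Hypothetical helper: only the default hooks directory is scanned;
    a custom core.hooksPath is NOT checked.
    """
    hooks_dir = repo / ".git" / "hooks"
    if not hooks_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in hooks_dir.iterdir()
        # git ships inactive examples with a .sample suffix
        if p.is_file()
        and not p.name.endswith(".sample")
        and os.access(p, os.X_OK)
    )
```

Running this on the host before and after an agent session and comparing the two lists would show any hook that appeared in between.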

Finbarr · 2026-05-05T17:07:30 1778000850

Hard drives are cheap and I haven't approached the limit yet. So I left this as a future optimization.

CodesInChaos · 2026-05-06T13:47:39 1778075259

I'd try a modern file system with de-duplication/copy-on-write support. `cp` creates reflinks automatically if the file-system supports copy-on-write.

> Support for reflinks is indicated using the remap_file_range operation, which is currently (6.18) supported by bcachefs, Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, and XFS. Some external file systems support them too, including bcachefs and OpenZFS.

https://unix.stackexchange.com/questions/631237/in-linux-whi...
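
If you would rather do the clone yourself than rely on `cp`, a rough Python sketch is below. It assumes Linux, where the `FICLONE` ioctl asks the file system to share extents between two files. `reflink_or_copy` is just an illustrative name, and the fallback to a plain byte copy is my own choice for file systems that reject the request.

```python
import fcntl
import shutil

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> bool:
    """Clone src to dst as a reflink if possible, else copy the bytes.

    Returns True when the reflink succeeded, False when we fell back.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the file system to make dst share src's data blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Unsupported file system, cross-device copy, and so on.
            shutil.copyfileobj(fsrc, fdst)
            return False
```

Either way the destination ends up with the same contents. The return value only tells you whether the copy was free in terms of disk space.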

Finbarr · 2026-05-06T15:31:14 1778081474

Interesting suggestion, thank you!

Finbarr · 2026-05-05T16:47:46 1777999666

Author here. Three months ago I posted a Show HN for yolobox [1] - a sandbox for running AI coding agents without them being able to nuke your home directory.

Since then I've been using it almost every day, which eventually meant wanting more than one agent running against the same project at the same time. This post is what I learned trying to make that work without it being a constant disaster.

The short version: git worktrees are the right Git abstraction and the wrong abstraction for this problem. The unit you want to fork is the developer, not the branch - full folder copy, its own Compose project, its own URL. yolobox now ships a fork subcommand that does this.
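
To make the idea concrete, here is a rough Python sketch of what forking "the developer" could look like. This is not how yolobox's fork subcommand is implemented. `plan_fork`, `project_name`, `free_port` and the `APP_PORT` variable are all hypothetical, and the compose file would have to read `${APP_PORT}` for the port to take effect. The only real pieces are `docker compose -p` and `COMPOSE_PROJECT_NAME`, which set the Compose project name.

```python
import re
import shutil
import socket
from pathlib import Path


def project_name(base: str, fork: str) -> str:
    """Build a Compose-safe name: lowercase letters, digits, '-' and '_'."""
    raw = f"{base}-{fork}".lower()
    return re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-_")


def free_port() -> int:
    """Ask the OS for a TCP port that is unused right now on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def plan_fork(src: Path, fork: str) -> dict:
    """Copy the whole project folder and describe how to start its stack.

    Nothing is started here; the caller decides whether to run "cmd".
    """
    dest = src.with_name(f"{src.name}-{fork}")
    shutil.copytree(src, dest, symlinks=True)  # full folder copy
    name = project_name(src.name, fork)
    port = free_port()
    return {
        "dir": dest,
        # APP_PORT is a hypothetical variable the compose file must use.
        "env": {"COMPOSE_PROJECT_NAME": name, "APP_PORT": str(port)},
        "cmd": ["docker", "compose", "-p", name, "up", "-d"],
        "url": f"http://localhost:{port}",
    }
```

The port is only free at the moment it is probed, so a real tool would need to handle the case where something else grabs it before the stack comes up.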

Happy to answer questions.

[1] https://news.ycombinator.com/item?id=46592344

Finbarr · 2026-04-15T05:26:51 1776230811

Agreed. Flock has been a key contributor in solving numerous crimes. I'm happy for Flock to be in my county and would like the police to have more access to technology like this, not less.

saghm · 2026-04-15T06:09:03 1776233343

Does your country also have a recurring problem of police shooting unarmed citizens? If not, it probably helps to understand the dynamics of why the police are not widely trusted here.

Finbarr · 2026-04-15T15:48:26 1776268106

County was not a typo. It's awful whenever there's an overuse of force in the USA. I'd recommend watching a few police bodycam videos on youtube before judging them wholesale though. The experience of a police officer in the United States seems to be long periods of tedium punctuated by moments of sheer terror and adrenaline. Anyone out there can have a gun and encounters can unexpectedly escalate to deadly violence in seconds. Some of them should not be police officers. There are many great officers out there just trying to protect their communities.

saghm · 2026-04-15T21:09:01 1776287341

All it takes is one cop acting badly to ruin things for quite a lot of people though, and the fact that police uniformly close ranks around any of their members who is accused of something, regardless of the validity, makes "well, not all of them are bad!" a pretty useless sentiment. I'll consider them individually when they start holding individuals accountable, but not before then.

Finbarr · 2026-04-16T01:08:44 1776301724

I think you're making some hasty generalizations here. They don't "uniformly" cover for their colleagues. Do you expect the police service to be perfect and never make mistakes? Can you point me towards a single human-run service where that's the case?

saghm · 2026-04-17T02:09:25 1776391765

Having a monopoly on violence means the bar should be higher for them than for other "human-run services".

Finbarr · 2026-04-02T05:51:39 1775109099

Who cares that the code is garbage? As the models get bigger and more powerful it will be trivial to fully refactor the whole codebase. It’s coming sooner than you think.

Finbarr · 2026-03-31T18:52:23 1774983143

MS treatments tend to take 3 forms:

- immune reset (sledgehammer that can “cure” diseases like MS but with many side effects and potential complications)

- immune suppression (super effective but with increased risk of infections and blunts vaccines)

- immune redirection (less effective but doesn’t mess up your immune system so badly).

It’s only in the last ~10 years that super effective treatments that can stop ~99% of lesion progression have existed: Ocrevus and Kesimpta. These are anti-CD20 disease-modifying therapies that destroy all your B cells. The memoir of Dr. Stephen Hauser, “The Face Laughs While The Brain Cries”, provides a fascinating insight into the development of these treatments over the last ~40 years of his career.

There are active trials of newer types of treatment and a lot of progress is being made in the MS space. It used to be a “death sentence” disease but is quite manageable for many sufferers now. It’s different for every individual and I wish the blog author good health.

Finbarr · 2026-03-09T01:08:37 1773018517

Awesome to see a bash-only method of solving this problem. Also like that it alerts on attempts to read restricted stuff.

I built yolobox to solve this using docker/apple containers: https://github.com/finbarr/yolobox

Finbarr · 2026-02-15T15:36:56 1771169816

I wrote an article about this a few days ago that has been gaining a lot of traction: https://finbarr.site/2026/02/12/in-defense-of-saas.html

Point solutions are going to be free. Complex systems with support, integrations, switching costs, customer data, etc., are not going to be free.

reactordev · 2026-02-15T15:37:57 1771169877

Those platforms with a moat have some breathing room but it's only a matter of time. Remember Lotus Notes?

bwfan123 · 2026-02-15T16:33:38 1771173218

> This is the same dynamic that kept IBM dominant for decades

IBM still sells mainframes but is no longer a growth darling.

> Markets are right to reassess multiples. But reassessing multiples is very different from pricing in extinction

What you are missing is that the SaaS companies were extremely overpriced. For instance, crm after all the carnage is still priced at 25 times earnings which is historically high for anything that is not a growth company. The perception was that these companies would print money year after year selling software trinkets on their platforms and as such were placed in the growth category. Now, it is plainly obvious that these software trinkets can be produced easily by anyone using AI. Their pricing-power has dramatically declined. Hence the re-rating. None of this contradicts the thesis in your ai-assisted article that these businesses have moats just like IBM and its mainframes. These businesses are now in a vicious reflexive narrative loop where the narrative will impact the real-world which will further fuel the narrative.

Finbarr · 2026-02-10T06:38:09 1770705489

AI refusals are fascinating to me. Claude refused to build me a news scraper that would post political hot takes to twitter. But it would happily build a political news scraper. And it would happily build a twitter poster.

Side note: I wanted to build this so anyone could choose to protect themselves against being accused of having failed to take a stand on the “important issues” of the day. Just choose your political leaning and the AI would consult the correct echo chambers to repeat from.

tweetle_beetle · 2026-02-10T07:55:50 1770710150

The thought that someone would feel comforted by having automated software summarise the output of what is likely the output of automated software and publishing it under their name to impress other humans is so alien to me.

Finbarr · 2026-02-10T14:02:45 1770732165

The whole idea was a bit of a joke and a reflection on how ridiculous it is that people get in trouble for failing to regurgitate the correct takes when certain events occur. It’s like insurance against getting canceled.

concinds · 2026-02-10T07:49:33 1770709773

> Claude refused to build me a news scraper that would post political hot takes to twitter

> Just choose your political leaning and the AI would consult the correct echo chambers to repeat from.

You're effectively asking it to build a social media political manipulation bot, behaviorally identical to the bots that propagandists would create. Shows that those guardrails can be ineffective and trivial to bypass.

9dev · 2026-02-10T07:53:41 1770710021

> Good illustration that those guardrails are ineffective and trivial to bypass.

Is that genuinely surprising to anyone? The same applies to humans, really—if they don't see the full picture, and their individual contribution seems harmless, they will mostly do as told. Asking critical questions is a rare trait.

I would argue it's completely futile to even work on guardrails, if defeating them is just a matter of reframing the task in an infinite number of ways.

ajam1507 · 2026-02-10T12:23:50 1770726230

> I would argue it's completely futile to even work on guardrails

Maybe if humans were the only ones prompting AI models

groestl · 2026-02-10T06:55:07 1770706507

Sounds like your daily interactions with Legal. Each time a different take.

Finbarr · 2026-01-26T16:27:08 1769444828

"maybe even a high production value promo video showcasing happy employees, rare wood office counters and a shoes-off policy."

Don't forget surfboards!

This was a great post, Alex. Thanks for sharing! Hunger and high agency are such important traits in every startup hire.

akurilin · 2026-01-26T16:59:54 1769446794

Thank you! I wanted to mention toasted coconut flake snacks as well, but the sentence was long enough already. If your company has those in the kitchenette, you're definitely well-capitalized.

And yeah, high agency is really trendy at this moment in the startup sphere, but hunger is not talked about enough IMO. Maybe because it's too obvious to be even worth mentioning.
