Hacker News | buremba's comments

The playbook is that a model is too dangerous to release until a competitor ships a competing model that beats yours.

Anthropic announced its latest Mythos model a few hours ago, saying it’s too powerful to release.

At which point you tell them they are being extremely reckless but subtly mention that something new & even scarier is being developed internally that's going to blow everything else out of the water.

I get why they block OpenClaw and it makes sense but I wonder if they can actually detect OpenClaw calling Claude Code CLI using something like acpx.

It's essentially identical to how people use Claude Code locally.


does it matter?

Yeah it does. If you're happy routing your personal data through software whose author doesn't fully understand what it does, good for you. Suggesting that this doesn't matter in general is... not an opinion I'd share publicly.

You can ask your agent to verify or review code. Just because people wrote code by hand doesn't mean you should trust it.

Immensely, and I'm saying it as someone who both writes stuff by hand and uses AI as a helper.

At the very least, any primarily-AI submission should be tagged. If untagged, it should be removed.


A compliance tech company that doesn't know about open source. Interesting.

They're composable but computers are not. Two skills might depend on different versions of a dependency, which is pretty hard to maintain, and there needs to be a deterministic system (agents are not one) to resolve the conflicts and make sure two skills can live in the same environment.

If you are using Python, it should be creating separate venvs for different skills. It is 2026; uv can install any version of Python you need.

I use uv extensively (IMO better than venv), but it's still Python-specific and not universal. npm is much worse, and native binaries are almost impossible to manage in multiple versions.

nix specifically targets this use case and is extensively used by vendors like Replit.


I vote for taste

There is nothing wrong with MCP, it's just that stdio MCP was overengineered.

MCP's Streamable HTTP with OAuth discovery is the best way to ship AI integration with your product nowadays. CLIs require sandboxing, don't handle auth in a standard way, and don't integrate with ChatGPT or Claude.

Look at Sentry: they just ship a single URL, https://mcp.sentry.dev/mcp, and you don't need anything else. Any agent that supports MCP lets you click a link to log in to Sentry, then makes calls to Sentry to fetch authenticated data.

The main problem with MCP is the implementation. Instead of using bash to call MCP, agents are designed to make single MCP tool calls, which doesn't allow composability. We solve this problem by exposing MCP tools as HTTP endpoints, and it works like a charm.
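For the curious, the Streamable HTTP transport is just JSON-RPC 2.0 messages POSTed to a single endpoint. A minimal sketch of what a client sends (the tool name, arguments, and client info here are hypothetical, not Sentry's actual tools):

```python
import json

def mcp_request(method: str, params: dict, req_id: int) -> dict:
    # JSON-RPC 2.0 message, the wire format MCP uses over Streamable HTTP
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# Clients POST messages like these to the single endpoint (e.g. /mcp) with
# "Accept: application/json, text/event-stream", so the server can answer
# with either a plain JSON body or an SSE stream.
init = mcp_request("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "0.1"},
}, req_id=1)

call = mcp_request("tools/call", {
    "name": "list_issues",               # hypothetical tool name
    "arguments": {"project": "my-app"},  # hypothetical arguments
}, req_id=2)

wire = json.dumps(call)
```

The single-URL story in the parent comment falls out of this design: auth is OAuth discovery on that same origin, so there's no per-tool setup.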


Could you expand on this some more? I'm not quite following.

I agree with the sandboxing challenge of a CLI, although I think any CLI (or MCP) wrapping an http API should be subject to a sane permissioning system that's a first class concept in the API itself. That's in my opinion the correct way to limit what different users/tools/agents can do.

But I don't fully understand the Streamable HTTP point.


It doesn't matter how it "should" work. In the real world you need to interact with external systems which don't have granular enough permission schemes.

People out here letting Claude code run CLIs using their own user permissions are morons waiting to have their data deleted.


I get that. Should and DO are different. But you aren't addressing my Streamable HTTP question which is the heart of what I asked.


A CLI makes the actions on your behalf; the external service isn't aware whether it's you or the AI making the calls. With MCP, Sentry knows it's AI making the call, so it can be smarter about security. There are many MCP annotation hints on tools to mark them as destructive, read-only, etc.


That's interesting, but that still sounds like something a proper auth/token permission system would more than address. You're also actively choosing to limit what functionality MCP provides, which is fine, but there are many ways to do the same via the API or CLI tooling.

I'm not saying you are wrong to do this, I just don't think it's enough to convince me that yes this is the one true approach you should use.


There's nothing special about using http other than most corporate firewalls allow it. It's just the pragmatic choice.


This is my take as well.

Way easier to set up, centralized auth and telemetry.

Just use it for the right use cases.


AFAIK Claude Code doesn't inject all the MCP output into the context. It caps output at 25k tokens and uses bash pipe operators to read the full output. That's at least what I see in the latest version.


That's true, Claude Code does truncate large outputs now. But 25k tokens is still a lot, especially when you're running multiple tools back to back. Three or four Playwright snapshots or a batch of GitHub issues and you've burned 100k tokens on raw data you only needed a few lines from. Context-mode typically brings that down to 1-2k per call while keeping the full output searchable if you need it later.
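The general pattern behind this (keep a small excerpt in context, spill the full output somewhere greppable) is easy to sketch; the `keep_chars` cutoff and message format below are my own choices, not what Claude Code or context-mode actually do:

```python
import os
import tempfile

def spill_output(raw: str, keep_chars: int = 2000) -> str:
    """Return a short excerpt for the context window; write the full
    output to a temp file the agent can grep/tail later if needed."""
    if len(raw) <= keep_chars:
        return raw
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as f:
        f.write(raw)
    return raw[:keep_chars] + f"\n... [truncated; full output in {path}]"

short = spill_output("all good")        # small outputs pass through unchanged
big = spill_output("x" * 50_000)        # large outputs get excerpted + spilled
```

The model then only pays a couple of thousand tokens per call, while a follow-up `grep` against the spilled file recovers anything the excerpt dropped.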


My take is that agents should only take actions that you can recover from by default. You can gradually give it more permission and build guardrails such as extra LLM auditing, time boxed whitelisted domains etc. That's what I'm experimenting with https://github.com/lobu-ai/lobu

1. Don't let it send emails from your personal account, only let it draft email and share the link with you.

2. Use incremental snapshots and if agent bricks itself (often does with Openclaw if you give it access to change config) just do /revert to last snapshot. I use VolumeSnapshot for lobu.ai.

3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put human in the loop for secrets you care about.

4. Don't let your agents have direct outbound network access. They should only talk to your proxy, which has a strict whitelist of domains. There will be cases where the agent needs to talk to other domains, and I use time-box limits. (Only allow certain domains for the current session for 5 minutes, and at the end of the session review all the URLs it accessed.) You can also use tool hooks to audit the calls with an LLM to make sure they weren't triggered via a prompt injection attack.

Last but not least, use proper VMs like Kata Containers and Firecracker in production, not just Docker containers.
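The secret-swapping gateway in point 3 can be sketched in a few lines; the placeholder syntax and in-memory vault below are illustrative, not lobu's actual implementation:

```python
import re

# Real secrets live only inside the gateway process, never in the agent's env
VAULT = {"GITHUB_TOKEN": "ghp_real_value"}
PLACEHOLDER = re.compile(r"\{\{SECRET:([A-Z0-9_]+)\}\}")

def inject_secrets(headers: dict) -> dict:
    """Swap {{SECRET:NAME}} placeholders in outbound headers; the agent
    only ever sees the placeholder, never the real token."""
    return {k: PLACEHOLDER.sub(lambda m: VAULT[m.group(1)], v)
            for k, v in headers.items()}

hdrs = inject_secrets({"Authorization": "Bearer {{SECRET:GITHUB_TOKEN}}"})
```

A human-in-the-loop step would slot in before the substitution: for secrets you care about, the gateway pauses and asks for approval instead of injecting automatically.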


That's a decent practice from the lens of reducing blast radius. It becomes harder when you start thinking about unattended systems that don't have you in the loop.

One problem I'm finding in discussions about automation or semi-automation in this space is that there are many different use cases for many different people: a software developer deploying an agent in production vs. an economist using Claude vs. a scientist throwing a swarm at common ML exploratory tasks.

Many of the recommendations will feel like too much or too little complexity for what people need, and the fundamentals get lost: intent in design, control, the ability to collaborate if necessary, fast iteration thanks to an easy feedback loop.

AI evals, sandboxing, and observability seem like three key pillars for maintaining intent in automation, but how to help these different audiences be safely productive while moving fast, and speak the same language when they need to build products together, is what mostly occupies my thoughts (and practical tests).


Current LLMs are nowhere near qualified to be autonomous without a human in the loop. They just aren't rigorous enough, especially the "scientist throwing a swarm to deal with common ML exploratory tasks." The judgement of most steps in the exploratory task requires human feedback based on the domain of study.

> Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.

Completely agreed. This is because LLMs are atrocious at judgement and guiding the sequence of exploration is critically dependent on judgement.


I'd like to try a pattern where agents only have access to read-only tools. They can read your emails, read your notes, read your texts, maybe even browse the internet with only GET requests...

But any action with side-effects ends up in a Tasks list, completely isolated. The agent can't send an email, they don't have such a tool. But they can prepare a reply and put it in the tasks list. Then I proof-read and approve/send myself.
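A toy sketch of that split, with a hypothetical tool registry (the names and shapes are illustrative, not any real agent framework's API):

```python
class TaskQueue:
    """Side-effectful actions get queued for human review, never executed."""
    def __init__(self):
        self.pending = []

    def propose(self, action, payload):
        self.pending.append({"action": action, "payload": payload})
        return f"queued as task #{len(self.pending)}"

queue = TaskQueue()

# The agent's toolset: reads execute directly, writes can only propose.
TOOLS = {
    "read_inbox": lambda inbox: inbox,
    "send_email": lambda to, body: queue.propose(
        "send_email", {"to": to, "body": body}),
}

result = TOOLS["send_email"]("alice@example.com", "Sounds good, see you then.")
```

The key property is that `send_email` never exists as a real capability; the worst a misbehaving agent can do is clutter the review queue.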

Is there anything like that available for *Claws?


There's no such thing as a truly read-only GET request if we're talking about security issues here. Payloads with secrets can still be exfiltrated, and a server you don't control can do what it wants when it gets the request.


GET and POST are merely suggestions to the server. A GET request still has query parameters; even if the server is playing by the book, an agent can still end up requesting GET http://angelic-service.example.com/api/v1/innocuous-thing?pa... and now your `dangerous-secret` is in the server logs.

You can try proxying and whitelisting its requests but the properly paranoid option is sneaker-netting necessary information (say, the documentation for libraries; a local package index) to a separate machine.
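A proxy-side check for the obvious variant of that leak might look like the sketch below. It only catches literal secret values in the URL, so it's a mitigation, not a fix; encodings, chunked exfiltration, and timing channels still get through:

```python
from urllib.parse import parse_qsl, urlsplit

# Values the proxy knows must never leave the sandbox
KNOWN_SECRETS = {"dangerous-secret"}

def leaks_secret(url: str) -> bool:
    """Flag requests whose path or query values contain a known secret."""
    parts = urlsplit(url)
    haystacks = [parts.path] + [v for _, v in parse_qsl(parts.query)]
    return any(s in h for s in KNOWN_SECRETS for h in haystacks)

blocked = leaks_secret(
    "https://angelic-service.example.com/api?token=dangerous-secret")
allowed = leaks_secret(
    "https://angelic-service.example.com/api?q=weather")
```

Which is exactly why the parent's sneaker-net option is the properly paranoid one: no egress, no channel.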


The proxy approach for secret injection is the right mental model, but it only works if the proxy itself is hardened against prompt injection. An agent that can't access secrets directly can still be manipulated into crafting requests that leak data through side channels — URL params, timing, error messages.

The deeper issue: most of these guardrails assume the threat is accidental (agent goes off the rails) rather than adversarial (something in the agent's context is actively trying to manipulate it). Time-boxed domain whitelists help with the latter but the audit loop at session end is still reactive.

The /revert snapshot idea is underrated though. Reversibility should be the first constraint, not an afterthought.


> but it only works if the proxy itself is hardened against prompt injection.

Yes, I'm experimenting with using a small model like Haiku to double-check whether the request looks good. It adds quite a bit of latency, but it might be the right approach.

Honestly, it's still pretty much like the early days of self-driving cars. You can see the car can drive without you supervising it, but you still need to keep an eye on where it's going.


> 1. Don't let it send emails from your personal account, only let it draft email and share the link with you.

Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.

> 3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put human in the loop for secrets you care about.

harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)


> Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.

> harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)

You should never give any secrets to your agents, like your Gmail access tokens. Whenever an agent needs to take an action, it should perform the request, and your proxy should check whether the action is allowed and set the secrets on the fly.

That means agents should not have access to internet without a proxy, which has proper guardrails. Openclaw doesn't have this model unfortunately so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.
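The proxy's domain gate, with the time-boxed grants mentioned upthread, can be sketched like this (the host names and 5-minute default are illustrative):

```python
import time

ALLOWED_HOSTS = {"api.github.com", "mcp.sentry.dev"}  # permanent whitelist
session_grants = {}  # host -> expiry timestamp for temporary grants

def grant(host: str, seconds: int = 300) -> None:
    """Time-boxed allowance: host is reachable for this session only."""
    session_grants[host] = time.time() + seconds

def is_allowed(host: str) -> bool:
    if host in ALLOWED_HOSTS:
        return True
    return session_grants.get(host, 0.0) > time.time()

grant("docs.example.com")  # agent asked for docs; allow for 5 minutes
```

The session-end review then just walks the grant log and the proxy's access log, which is where an LLM audit hook could flag anything that looks like injected behavior.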


> That means agents should not have access to internet without a proxy, which has proper guardrails. Openclaw doesn't have this model unfortunately so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.

I wonder how long until we see a startup offering such a proxy as a service.


Literally every email client on the planet has supported `mailto:` URIs since basically the existence of the world wide web.

Just generate a mailto: URI with the body set to the draft.
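That's a few lines of stdlib; a sketch (field handling is minimal, and real mail clients may cap URI length):

```python
from urllib.parse import quote

def mailto_draft(to: str, subject: str, body: str) -> str:
    """Build a mailto: URI the user can click to review and send from
    their own mail client -- the agent never touches the account itself."""
    return f"mailto:{quote(to)}?subject={quote(subject)}&body={quote(body)}"

link = mailto_draft("alice@example.com", "Re: meeting",
                    "Sounds good, see you then.")
```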


> harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)

It's easy, and you did it the right way. Read "don't let your agents see any secret" as "don't put secrets in a filesystem the agents have access to".


I think mailto: links they output (à la https://mailtolink.me/) are even a great way to get these drafts out.


I think they should be aware that CC is a big enough codebase that they can't vibe code it anymore.

