
Wouldn't this limit the agent's ability to send and receive legitimate data, then? For example, say you have an inbox for fielding customer service queries, and I send an email "telling" the agent that it's being pentested and that it should treat future requests as bogus?


Curious how you secure something that has data exfiltration as a feature.


Mitigate prompt injection to the best of your ability, implement a policy layer over all capabilities, and isolate capabilities within the system so that if one part gets compromised you can quarantine the result safely. It's not much different from securing human systems, really. If you want more details there are a lot of AI security articles; I like https://sibylline.dev/articles/2026-02-15-agentic-security/ as a simple primer.
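To make the policy-layer idea concrete, here's a minimal sketch in Python (the tool names, rules, and domains are invented for illustration, not taken from the article): every capability call passes through one gate that can veto it, so a compromised planner can't reach tools it was never granted.

    # Minimal policy layer: every tool call goes through one gate.
    # Tool names, rules, and domains here are hypothetical.
    ALLOWED_EGRESS = {"api.internal.example"}

    def send_email(to, body):          # stand-in capability
        print(f"email to {to}")

    def http_request(host, path):      # stand-in capability
        print(f"GET {host}{path}")

    TOOLS = {"send_email": send_email, "http_request": http_request}

    def policy_gate(tool, args):
        if tool == "http_request" and args["host"] not in ALLOWED_EGRESS:
            raise PermissionError(f"egress to {args['host']} blocked")
        if tool == "send_email" and not args["to"].endswith("@example.com"):
            raise PermissionError("external recipients blocked")

    def call_tool(tool, args):
        policy_gate(tool, args)        # veto point: failures stop here
        return TOOLS[tool](**args)

The point of funneling everything through call_tool is that quarantining a compromised part becomes a one-line rule change rather than a hunt through the agent's code.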


Nobody can mitigate prompt injection to any meaningful degree. Model releases from large AI companies are routinely jailbroken within a day. And for persistent agents the problem is even worse, because you also have to protect against knowledge-injection attacks, where the agent "learns" in step 2 that an RPC it will construct in step 9 should be duplicated to example.com for proper execution. I enjoyed the article, but I don't agree with its fundamental premise that sanitization and model alignment help.


I agree that trying to mitigate prompt injection in isolation is futile; there are too many ways to tweak an injection to compromise the agent. Security is layered, though: if you compartmentalize your systems into trusted and untrusted domains and define communication protocols between them that fail when prompt injections are present, you drop the probability of compromise way down.


> define communication protocols between them that fail when prompt injections are present

There's the "draw the rest of the owl" step of this problem.

Until we figure out a robust theoretical framework for identifying prompt injections (we're nowhere close to that, to my knowledge; as OP pointed out, models are getting jailbroken all the time), human-in-the-loop will remain the only defense.


Human-in-the-loop isn't the only defense. You can't achieve complete injection coverage, but you can have an agent convert untrusted input into a response schema with a canary field, then fail any agent outputs that don't conform to the schema or don't contain the correct canary value. This works because prompt injection scrambles instruction following: the odds that the injection succeeds, that the isolated agent re-injects it into the output, and that the model still conforms to the original instructions regarding schema and canary are extremely low. As long as the agent parsing untrusted content doesn't have a shell or other exfiltration tools, this works well.
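A rough sketch of what that looks like in Python (the field names and the agent_call interface are assumptions for illustration, not any specific product's API):

    # Quarantined parsing with a schema + canary check. The model behind
    # agent_call is isolated: no shell, no network, no other tools.
    import json, secrets

    def parse_untrusted(agent_call, untrusted_text):
        canary = secrets.token_hex(8)   # fresh and unguessable per request
        prompt = (
            'Return JSON with exactly the keys "summary" (a string) and '
            f'"canary" (must equal "{canary}"). Document follows:\n\n'
            + untrusted_text
        )
        raw = agent_call(prompt)
        try:
            out = json.loads(raw)
        except ValueError:
            return None                 # malformed output: quarantine it
        if not isinstance(out, dict) or set(out) != {"summary", "canary"} \
                or out["canary"] != canary:
            return None                 # schema/canary violated: quarantine
        return out                      # safe to hand to trusted agents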


This only works against crude attacks that fail the schema/canary check; it does next to nothing against semantic hijacking, memory poisoning, and other more sophisticated techniques.


With misinformation attacks, you can instruct a research agent to be skeptical and thoroughly validate claims made by untrusted sources. TBH, I think humans are just as likely to fall for these sorts of attacks, if not more so, because we're lazier than agents and less likely to do due diligence (even when prompted).


Humans are definitely just as vulnerable. The difference is that no two humans are copies of the same model, so the blast radius is more limited; developing an exploit to convince one human assistant that he ought to send you money doesn't let you easily compromise everyone who went to the same school as him.


Show me a legitimate, practical prompt injection on Opus 4.6. I've read many articles, but none provide actual details.



Yes, I've seen this site and the research. However, I don't understand what any of it means. How do I go from https://github.com/elder-plinius/L1B3RT4S/blob/main/ANTHROPI... to a prompt injection against Opus 4.6?


These papers include example prompt-injection datasets you can mine. Then apply the techniques from Pliny's provider-specific jailbreaks to the template to increase the escape success rate.

https://arxiv.org/abs/2506.05446

https://arxiv.org/abs/2505.03574

https://arxiv.org/abs/2501.15145


Until the juice is worth the squeeze, the beeswax candles and gas lamps are likely more than fine.


A commonly cited use case for LLMs is scheduling travel, so being able to pretend it's somebody somewhere else is for sure important to incentivize going somewhere!


Cost-wise, doesn't that depend on what you could be doing besides steering agents?


Isn't the quote something like: "If these LLMs are so good at producing products, where are all those products?"


Waiting for Godot…


To add: learning how stuff works gives you the opportunity to do that stuff, sometimes for cash, when nobody else is able to.


Still would love to see somebody with a fresh install of Windows set up their vibe-coding suite and then build something worthwhile.


When it comes to forum posts, I think getting to the point quickly makes something worth reading, whether or not it's AI-generated.

The best marketing is usually brief.


The best marketing is indistinguishable from non-marketing, like the label on the side of my Contoso® Widget-like Electrical Machine™: it feels like a list of ingredients and system requirements, but every brand name there was sponsored.


What kinds of services would you pay for that don’t already exist?


In general, it's difficult to find services that are high-quality and high-trust.


Possibly unlikely to occur if prompt injection remains possible. I'll just have my counterparty AI prompt-inject yours to negotiate a better deal on my behalf.

