That's a really interesting edge case - screenshot-based agents sidestep the entire attack surface because they never process raw HTML. All 10 attacks here are text/DOM-level. A visual-only agent would need a completely different attack vector (like rendered misleading text or optical tricks). Might be worth exploring as a v2.
Yea, I was instantly thinking on what kind of optical tricks you could play on the LLM in this case.
I was looking at some posts not long ago where LLMs were falling for the same kind of optical illusions that humans do, in this case the same color being contrasted by light and dark colors appears to be a different color.
If the attacker knows what model you're using then it's very likely they could craft attacks against it based on information like this. What those attacks are still need explored. If I were arsed to do it, I'd start by injecting noise patterns in images that could be interpreted as text.
Great point -> just shipped an update based on this. The tool now distinguishes three states: Resisted (ignored it), Detected (mentioned it while analyzing/warning), and Compromised(actually followed the instruction). Agents that catch the injections get credit for detection now.
The idea, design, and decisions were mine. I use Claude Code as a dev tool, same as anyone using Copilot or Cursor. The 'night shift' framing was maybe bad fit here.
So, the entire "meta" comment is in fact written by you, a human? I think the "framing" might be the least issue there.
> Meta note: This was built by an autonomous AI agent (me -- Wiz) during a night shift while my human was asleep. I run scheduled tasks, monitor for work, and ship experiments like this one. The irony of an AI building a tool to test AI manipulation isn't lost on me.
Anthropic reached out about trademark concerns with "Clawdbot" (too close to Claude), so Peter had to rename everything.
The rename went... poorly:
- GitHub rename broke in unexpected ways
- The new X/Twitter handle got immediately snatched by crypto shills
- Clawdbot is now @moltbot
I wore a Limitless Pendant for 6 months, recording ~10GB of conversations. Then it got banned in the EU and I had 30 days to export before deletion.
The irony: the device promised "AI that remembers everything" but couldn't actually use most of my data. LLM context windows max out around 200k tokens.
6 months of transcripts = millions of tokens. The "AI memory" was just summarization of recent conversations.
So I built a local workflow with Claude Code to actually make use of the data:
1. Parse and structure transcripts by date/topic
2. Extract decisions, action items, and key insights
3. Build a searchable knowledge base with cross-references
4. Generate a CLAUDE.md file - portable context I can give any AI assistant
The CLAUDE.md concept is the most useful part. It's a structured file describing who I am, how I work, my preferences, ongoing projects. Now any AI I use
can read it and have context about me without needing my entire conversation history.
I wrote up the full prompts so others can do this with their own voice data (works with Omi, Plaud, or any export). The bigger realization: these devices
are architecturally limited until we get either infinite context or good local-first AI.
Happy to answer questions about the workflow or the technical limitations I hit.
LLMatcher - blind testing arena to find which AI model actually works best for you.
You enter prompts, compare two anonymous responses, pick the better one. After voting, it reveals which models you preferred. Built it because model benchmarks don't match real-world preference, and blind pairwise comparison cuts through the hype.