For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | cosinusalpha's commentsregister

Do you experience any context pollution with that approach?


Writing your own skill is actually a lot better for context efficiency.

Your skill will be tuned to your use case over time, so if there's something that you do a lot you can hide most of the back-and-forth behind the python script / cli tool.

You can even improve the skill by saying "I want to be more token efficient, please review our chat logs for usage of our skill and factor out common operations into new functions".

If anything context waste/rot comes from documentation of features that other people need but you don't. The skill should be a sharp knife, not a multi-tool.


Not really. less bad than the mcps i used.


Thanks!

I actually tried a raw HTML when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.

In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.

You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?


I actually think the CLI approach helps with those boundaries. Because webctl commands are discrete and pipeable (e.g. webctl snapshot | llm | webctl click), the "authority" is reset at every step of the pipeline. It feels easier to audit a text stream of commands than a socket connection that might be accumulating invisible context.


Thanks! To clarify: webctl allows you to manually interact with the browser window at any time. It even returns "manual interaction" breakpoints to stdout if it detects an SSO/login wall.

But I agree, attaching to the OS "daily driver" instance specifically would be a nice addition.


I am also not sure if MCP will eventually be fixed to allow more control over context, or if the CLI approach really is the future for Agentic AI.

Nevertheless, I prefer the CLI for other reasons: it is built for humans and is much easier to debug.



100% - sharing CLIs with the agent has felt like another channel to interact with them once I’ve done it enough, like a task manager the agent and I can both use using the same interface


A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.


I see, nice. Is there a way to run multiple sessions?


Yes, you can create isolated environments using the "--session NAME" flag.

It isolates cookies and local storage for that specific run. Since it's a V1 release, there might be some edge cases in the session isolation - if you hit any, please open an issue!


I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.

An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.


To be honest, I hadn't seen that one yet!

The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.

Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You