
simon_luv_pho · 2026-03-06T05:50:25 1772776225

WebMCP doesn’t seem to be available for use inside webpages or extensions.

simon_luv_pho · 2026-03-05T19:59:53 1772740793

I'm 2 years too late for that one...

simon_luv_pho · 2026-03-05T19:54:34 1772740474

Everything happens at runtime, on the HTML level.

It uses a similar process to `browser-use`, but everything runs inside the web page. A script parses the live HTML, strips it down to its semantic essentials (HTML dehydration), and indexes every interactive element. That snapshot goes to the LLM, which returns actions referencing elements by index. The agent then simulates mouse/keyboard events on those elements via JS.

This works best on pages with proper semantic HTML and accessibility markup. You can test it right now on any page using the bookmarklet on the homepage (unless that page's CSP blocks script injection, of course).
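
A minimal sketch of that loop, for illustration only. This is not page-agent's actual code: the `SimpleNode` shape, the `Action` format, the list of kept attributes, and every function name here are assumptions, and a real in-page script would walk the live DOM rather than a plain object tree.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// Hypothetical action format: the model refers to elements by index only.
interface Action {
  type: "click" | "input";
  index: number;
  value?: string;
}

const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
// Only semantic attributes survive dehydration; styling noise is dropped.
const KEPT_ATTRS = ["aria-label", "placeholder", "type", "role"];

function isInteractive(node: SimpleNode): boolean {
  return INTERACTIVE_TAGS.has(node.tag) || node.attrs?.role === "button";
}

// Walk the tree, keep only interactive elements, and give each one an index.
function dehydrate(root: SimpleNode): { snapshot: string; elements: SimpleNode[] } {
  const elements: SimpleNode[] = [];
  const lines: string[] = [];
  const visit = (node: SimpleNode): void => {
    if (isInteractive(node)) {
      const index = elements.length;
      elements.push(node);
      const attrs = KEPT_ATTRS
        .filter((name) => node.attrs?.[name] !== undefined)
        .map((name) => ` ${name}="${node.attrs![name]}"`)
        .join("");
      lines.push(`[${index}]<${node.tag}${attrs}>${node.text ?? ""}</${node.tag}>`);
    }
    (node.children ?? []).forEach(visit);
  };
  visit(root);
  return { snapshot: lines.join("\n"), elements };
}

// Map an index-based action from the model back to the element it refers to.
function resolveAction(action: Action, elements: SimpleNode[]): SimpleNode {
  const target = elements[action.index];
  if (target === undefined) {
    throw new Error(`No interactive element with index ${action.index}`);
  }
  return target;
}

const demoPage: SimpleNode = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "p", text: "Welcome back" }] },
    { tag: "input", attrs: { type: "search", placeholder: "Search", class: "x1 y2" } },
    { tag: "button", text: "Go" },
  ],
};

const demoResult = dehydrate(demoPage);
console.log(demoResult.snapshot); // this text is what would be sent to the LLM
const clickTarget = resolveAction({ type: "click", index: 1 }, demoResult.elements);
console.log(clickTarget.tag);
```

The point of the index is that the model never has to produce a CSS selector or coordinates; it only picks a number from the snapshot, and the page-side code keeps the mapping back to the real element.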

simon_luv_pho · 2026-03-05T19:42:21 1772739741

Not yet. Currently focused on the more common interaction patterns. PRs welcome though!

popalchemist · 2026-03-05T19:46:43 1772740003

Gotcha. Still very cool! Congrats on the release.

simon_luv_pho · 2026-03-05T20:28:43 1772742523

Thanks!

simon_luv_pho · 2026-03-05T19:38:05 1772739485

I added in the system prompt that it should skip CAPTCHAs and hand control back to the user. Currently working on a proper human-in-the-loop feature. That's actually one of the key advantages of running the agent inside your own browser.

Mnexium · 2026-03-05T21:28:30 1772746110

Makes sense.

For curiosity's sake, have you had it attempt CAPTCHAs?

If so, what were the results?

simon_luv_pho · 2026-03-05T21:43:35 1772747015

I haven’t. I don’t think it will work well.

I use a text-based approach. Captchas like “crossroad” usually need a screenshot, a visual model and coordinate-based mouse events.

simon_luv_pho · 2026-03-05T19:30:05 1772739005

Thanks!

It supports any OpenAI-compatible API out of the box, so AWS Bedrock, LiteLLM, Ollama, etc. should all work. The free testing LLM is just there for a quick demo. Please bring your own LLM for long-term use.
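
For readers unfamiliar with what "OpenAI-compatible" means in practice, here is a rough sketch of the request shape such endpoints accept. It is not page-agent's configuration API: `LlmConfig` and `buildChatRequest` are hypothetical names, and the model name is a placeholder for whatever you have pulled locally. The base URL is Ollama's OpenAI-compatible endpoint on its default port.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical config shape: these three values are all that change per provider.
interface LlmConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const ollamaConfig: LlmConfig = {
  baseUrl: "http://localhost:11434/v1", // Ollama's OpenAI-compatible base URL
  apiKey: "ollama", // placeholder; a local Ollama server does not check it
  model: "qwen3:8b", // placeholder model name
};

// Build a chat completions request without sending it.
function buildChatRequest(config: LlmConfig, messages: ChatMessage[]): ChatRequest {
  const base = config.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}

const chatRequest = buildChatRequest(ollamaConfig, [
  { role: "user", content: "Say hello" },
]);
console.log(chatRequest.url);
// To actually send it: await fetch(chatRequest.url, chatRequest.init)
```

Switching providers then comes down to swapping `baseUrl`, `apiKey`, and `model`; the request body stays the same.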

simon_luv_pho · 2026-03-05T19:25:57 1772738757

I'm looking into a European testing endpoint. The legal and compliance requirements are quite a hassle, and persuading my company to pay for that infrastructure is gonna be a tough sell.

simon_luv_pho · 2026-03-05T19:23:39 1772738619

Full transparency: I work at Alibaba and published this under Alibaba's open-source org. I sometimes maintain it during work hours, so yes, Alibaba technically pays me for it. That said, this is my project: it's MIT-licensed, includes no backend service, and is open for anyone to audit.

The free testing LLM endpoint is hosted on Alibaba Cloud because I happen to have some company quota to spend, but it's not part of the library. Bring your own LLM and there is zero data transmission to Alibaba or anywhere else you haven't configured yourself.

I highly recommend using it with a local Ollama setup.

Zetaphor · 2026-03-06T04:36:24 1772771784

Thank you for sharing this!

simon_luv_pho · 2026-03-05T19:08:01 1772737681

Please use your own LLM API instead!

The free testing LLM is Qwen hosted by Aliyun. Qwen and DeepSeek are the only ones I can afford to offer for free. It's just there to lower the try-out barrier; please DO NOT rely on it.

The library itself does NOT include any backend service. Your data only goes to the LLM API you configured.

I tested it with local Ollama models and it works fine.

darkvertex · 2026-03-06T04:53:58 1772772838

Or why not stay fully local with WebLLM... https://webllm.mlc.ai

simon_luv_pho · 2026-03-06T05:44:42 1772775882

That looks great! I also thought about calling the Gemini nano model embedded into Chrome (only extensions can do that). But after some testing on smaller models I found that anything smaller than 9b can’t really handle the complex tool call schema I use.

Qwen3.5 4b is quite good but still gives messy JSON quite often. But it’s very promising!

Maybe after one more model iteration or some fine-tuning we can go fully embedded?
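
To illustrate what "messy JSON" costs in practice, here is a small sketch of a tolerant tool-call parser that rejects anything it cannot validate, so the caller can retry. The `{ name, args }` shape and the `parseToolCall` name are assumptions for this example; they are not the schema page-agent actually uses.

```typescript
// Hypothetical tool-call shape, much simpler than a real agent schema.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Pull the outermost {...} span out of a model reply and validate it.
// Returns null instead of throwing so the caller can decide to retry.
function parseToolCall(raw: string): ToolCall | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // truncated or otherwise invalid JSON
  }

  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  if (typeof candidate.name !== "string") return null;

  const args = candidate.args;
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;

  return { name: candidate.name, args: args as Record<string, unknown> };
}

// A chatty reply that wraps the JSON in prose still parses.
const chatty = parseToolCall('Sure! {"name": "click", "args": {"index": 3}} Hope that helps.');
console.log(chatty);

// A truncated reply is rejected.
const truncated = parseToolCall('{"name": "click", "args": {');
console.log(truncated);
```

Extracting the outermost braces handles surrounding chatter, but it cannot repair structurally broken output, which is where smaller models tend to fall down.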

simon_luv_pho · 2026-03-05T18:57:18 1772737038

Darn. Pageant would've been a nice name though. Maybe `page-agent.js` is more relevant in the web dev community.

graypegg · 2026-03-05T21:01:08 1772744468

I think every successful Show HN post ends up with a "thought this was about X" or "didn't look up the name first?" comment. Consider it a win! I don't think anyone will mistake a tool for PuTTY with your tool, but you might share a Google search page with it.

mmarian · 2026-03-05T19:54:19 1772740459

I think page agent is good. I've never heard of PuTTY's Pageant. And I think it's better to distinguish it from the general meaning of pageant (as in beauty pageant).

simon_luv_pho · 2026-03-05T20:02:12 1772740932

Thanks!
