Hacker News | ayanb9440's comments

Looks like somebody forgot to update the gitignore lol


Yup, that's right, it's Robotic Process Automation.

Based on the feedback in this thread we're going to be releasing an updated version that focuses more around tooling for the browser agents themselves as opposed to scaling/scheduling, so stay tuned for that!


This should be fixed now


We're in the middle of putting this together right now but it's going to be a wrapper around Google Secret Manager for those that don't want to set up a secrets manager themselves.


Depends on the use case. Lots of hospitals and banks use RPA to automate routine processes on their EHRs and systems of record, because these kinds of software typically don't have APIs available. Or if they do, they're very limited.

Playwright and other browser automation scripts are a much more powerful version of RPA but they do require some knowledge of code. But there are more and more developers every year and code just gets more powerful every year. So I think it's a good bet to make that browser automation in code will replace RPA altogether some day.


We do support Sentry. Finic projects are Poetry scripts, so you can `poetry add` any observability library you need.


That's a great suggestion! Essentially a cron job to check for website changes before your automation runs and possibly breaks.

What does this check look like for you? Do you just diff the html to see if there are any changes?


The issue with diffing HTML is that selectors are autogenerated with any update to a website's code. Websites that combat scraping will often autogenerate different HTML. The first thing is to screen-capture the website for comparison. Second, it is possible to determine all the visible elements on a page. With Playwright, inject event listeners into all elements on a page and start automated clicking. If the agent fills out forms, then make sure that all fields are available to populate. There are a lot of heuristics.
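To make the "don't diff raw HTML" point concrete, here is a minimal sketch using only the Python standard library. It compares the sequence of tags while ignoring attribute values such as autogenerated class names, and checks that the form fields an automation needs are still present. The function names (`skeleton`, `structure_changed`, `missing_fields`) and the sample markup are hypothetical, and this is only one possible heuristic, not what any commenter here actually runs.

```python
from html.parser import HTMLParser


class _Skeleton(HTMLParser):
    """Collects tag names and form-field names; attribute values like class/id are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []       # start tags in document order
        self.fields = set()  # "name" attributes of form controls

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.fields.add(name)


def skeleton(html):
    """Return (list of tags, set of form-field names) for an HTML string."""
    parser = _Skeleton()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.fields


def structure_changed(old_html, new_html):
    """True if the tag sequence differs; class/id churn alone does not count."""
    return skeleton(old_html)[0] != skeleton(new_html)[0]


def missing_fields(html, required):
    """Return the required form-field names that are no longer on the page."""
    return set(required) - skeleton(html)[1]


# Hypothetical snapshots: same structure, different autogenerated class names.
old = '<form class="a1"><input name="user"><input name="pw"></form>'
new = '<form class="zz9"><input name="user"><input name="pw"></form>'
broken = '<form class="zz9"><input name="user"></form>'

print(structure_changed(old, new))             # class churn only
print(structure_changed(old, broken))          # a field was removed
print(missing_fields(broken, ["user", "pw"]))  # which field the script would fail on
```

A check like this could run on a schedule before the real automation, with the screenshot and click-through heuristics layered on top for pages where the tag sequence alone is too noisy.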


Are you doing screenshot comparison with Playwright? If so, how? Based on my research this looks to be a missing feature but I could be incorrect.


Playwright has screenshot comparison built in, including screenshotting a single element, blanking specific elements, and comparing the textual aspects of elements without a visual comparison. You can even apply a specific stylesheet for comparisons.

Everything I can see in this demo can be done with Playwright on its own or with some very basic infrastructure e.g. from Azure to run the tests (automations). I can't see what it is adding. Is it doing some bot-detection countermeasures?

Checking if the page behaviour has changed is pretty easy in Playwright because its primary purpose is testing, so just write some tests to assert the behaviour you expect before you use it.

We use Playwright to both automate and scrape the site of a public organisation we are obliged to use, as another public body. They do have some bot detection because we get an email when we run the scripts, asking us to confirm our account hasn't been compromised, but so far we have not been blocked. If they ever do block us we will need to hire someone to do manual data entry, but the automation has already paid for itself many times over in a couple of years.


Some ideas. First, are you saving the cookies and adding them when Playwright bootstraps? [0] Second, are you using the same IP address? Or better, use a server running from your office or someone's house. Those are the big ones. The first prevents you from having to log in every time.

It is a game of cat and mouse. It is impossible to stop someone determined to circumvent bot protections.

[0] https://playwright.dev/docs/api/class-browsercontext#browser...
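A rough sketch of the cookie-saving idea, assuming Playwright for Python and its `storage_state` option (`browser.new_context(storage_state=...)` to load, `context.storage_state(path=...)` to save). The file name `auth_state.json`, the helper `context_options`, and the `run` function are hypothetical names for illustration; the login step itself is site-specific and left as a comment.

```python
from pathlib import Path

STATE_FILE = Path("auth_state.json")  # hypothetical location for saved cookies/localStorage


def context_options(state_file):
    """Kwargs for browser.new_context(): reuse the saved state only if the file exists."""
    return {"storage_state": str(state_file)} if state_file.exists() else {}


def run(start_url, state_file=STATE_FILE):
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # First run: fresh context. Later runs: cookies are restored from disk.
        context = browser.new_context(**context_options(state_file))
        page = context.new_page()
        page.goto(start_url)
        # Site-specific: log in here if the page shows a login form.
        # Persist cookies and localStorage for the next run.
        context.storage_state(path=str(state_file))
        browser.close()
```

Calling `run("https://example.com/")` twice should result in the second run starting with the first run's cookies, which is what avoids the repeated login.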


There are quite a few open source YC startups at this point. Our understanding is that:

1. Developer tooling should be open source by default
2. Open source doesn't meaningfully affect revenue/scaling because developers that would use your self-hosted version would build in-house anyway.


I know there are quite a few open source by default companies. But the ethos of open source is sharing / building something by the community and getting paid in a way which does not scale the way VC funding expectations work.

So, to show some respect for the open source way on top of which you are building all this, please stop advertising it as "open source infrastructure" in bold and sell it like a normal software product with "source available" in the footer.

If you do plan to go open source and actually follow its ethos, remove the funded by VC label and have self hosting front and center in the docs with the hosted bit somewhere in the footer.


Again, if you are not sure what open source means, this is open source: https://appimage.org/

Hope it is abundantly clear with this example. Docker tried its best to do the whole "open source but business first" thing and it led to disastrous results.

At best this will make your company suffer and second guess itself and at worst this is moral fraud.

Talk to your group partner about this and explain to them as well.


If you want to use an agent for scraping/automation, you would need to supply it with auth credentials. So permission is required by default.


Looking at their docs, it seems that with Browserbase you would still have to deploy your Playwright script to a long-running job and manage the infra around that yourself.

Our approach is a bit different. With Finic you just write the script. We handle the entire job deployment and scaling on our end.

