
ayanb9440 · on Sept 19, 2024

Looks like somebody forgot to update the gitignore lol

ayanb9440 · on Sept 19, 2024

Yup, that's right, it's Robotic Process Automation.

Based on the feedback in this thread we're going to be releasing an updated version that focuses more on tooling for the browser agents themselves as opposed to scaling/scheduling, so stay tuned for that!

ayanb9440 · on Sept 17, 2024

This should be fixed now

ayanb9440 · on Sept 17, 2024

We're in the middle of putting this together right now but it's going to be a wrapper around Google Secret Manager for those that don't want to set up a secrets manager themselves.

ayanb9440 · on Sept 17, 2024

Depends on the use case. Lots of hospitals and banks use RPA to automate routine processes on their EHRs and systems of record, because these kinds of software typically don't have APIs available. Or if they do, they're very limited.

Playwright and other browser automation scripts are a much more powerful version of RPA but they do require some knowledge of code. But there are more and more developers every year and code just gets more powerful every year. So I think it's a good bet to make that browser automation in code will replace RPA altogether some day.

ayanb9440 · on Sept 17, 2024

We do support Sentry. Finic projects are Poetry scripts so you can `poetry add` any observability library you need.

ayanb9440 · on Sept 17, 2024

That's a great suggestion! Essentially a cron job to check for website changes before your automation runs and possibly breaks.

What does this check look like for you? Do you just diff the html to see if there are any changes?

dataviz1000 · on Sept 17, 2024

The issue with diffing HTML is that selectors are autogenerated with any update to a website's code. Websites that combat scraping will often autogenerate different HTML. The first thing is to take a screen capture of the website for comparison. Second, it is possible to determine all the visible elements on a page. With Playwright, inject event listeners into all elements on a page and start automated clicking. If the agent fills out forms, then make sure that all fields are available to populate. There are a lot of heuristics.
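One of those heuristics can be sketched with the Python standard library alone: compare only the sequence of tags and ignore every attribute, so regenerated class names and ids do not register as a change. This is a rough illustration, not a complete change detector; the names `TagSkeleton`, `skeleton`, and `structure_similarity` are made up for this example.

```python
import difflib
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Collects the sequence of opening tags and drops all attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Attributes (class, id, data-*) are ignored on purpose, because
        # those are the parts that tend to be autogenerated.
        self.tags.append(tag)


def skeleton(html: str) -> list[str]:
    """Return the page's tag sequence, e.g. ['div', 'form', 'input']."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    return parser.tags


def structure_similarity(old_html: str, new_html: str) -> float:
    """Similarity of the two tag sequences, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, skeleton(old_html), skeleton(new_html))
    return matcher.ratio()
```

A scheduled check could store the last known HTML and flag the automation for review when the similarity drops below some threshold. The right threshold depends on the site and is an assumption you would have to tune.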

thestepafter · on Sept 17, 2024

Are you doing screenshot comparison with Playwright? If so, how? Based on my research this looks to be a missing feature but I could be incorrect.

sahmeepee · on Sept 17, 2024

Playwright has screenshot comparison built in, including screenshotting a single element, blanking specific elements, and comparing the textual aspects of elements without a visual comparison. You can even apply a specific stylesheet for comparisons.

Everything I can see in this demo can be done with Playwright on its own or with some very basic infrastructure e.g. from Azure to run the tests (automations). I can't see what it is adding. Is it doing some bot-detection countermeasures?

Checking if the page behaviour has changed is pretty easy in Playwright because its primary purpose is testing, so just write some tests to assert the behaviour you expect before you use it.
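A minimal sketch of that "assert before you automate" idea, using Playwright's Python sync API: before the real job runs, check that every selector the script depends on still matches something. The helper names `missing_selectors` and `preflight` are hypothetical, as are any URL and selectors you pass in.

```python
def missing_selectors(page, selectors):
    """Return the selectors that match no element on the current page.

    `page` is expected to behave like a Playwright Page: page.locator(s)
    returns an object with a count() method.
    """
    return [s for s in selectors if page.locator(s).count() == 0]


def preflight(url, selectors):
    """Open `url` in headless Chromium and report which selectors are gone."""
    # Imported lazily so missing_selectors can be used without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        missing = missing_selectors(page, selectors)
        browser.close()
    return missing
```

If `preflight(...)` returns a non-empty list, the automation can be skipped and someone alerted instead of letting it fail halfway through a form.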

We use Playwright to both automate and scrape the site of a public organisation we are obliged to use, as another public body. They do have some bot detection because we get an email when we run the scripts, asking us to confirm our account hasn't been compromised, but so far we have not been blocked. If they ever do block us we will need to hire someone to do manual data entry, but the automation has already paid for itself many times over in a couple of years.

dataviz1000 · on Sept 18, 2024

Some ideas. First, are you saving the cookies and adding them when Playwright bootstraps? [0] Second, are you using the same IP address? Or better, use a server running from your office or someone's house. Those are the big ones. The first prevents you from having to continuously log in.

It is a game of cat and mouse. It is impossible to stop someone determined to circumvent bot protections.

[0] https://playwright.dev/docs/api/class-browsercontext#browser...
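One way to do the cookie part in Playwright's Python API is `storage_state`, which saves cookies and local storage to a file and loads them into a new context. The sketch below assumes a local `state.json` file; the helper names and `run_logged_in_job` are made up for illustration.

```python
from pathlib import Path


def new_context_with_saved_state(browser, state_file: Path):
    """Create a browser context, reusing saved cookies if the file exists."""
    if state_file.exists():
        return browser.new_context(storage_state=str(state_file))
    return browser.new_context()


def save_state(context, state_file: Path) -> None:
    """Write the context's cookies and local storage to state_file."""
    context.storage_state(path=str(state_file))


def run_logged_in_job(url: str, state_file: Path = Path("state.json")) -> None:
    """Hypothetical job: open url with the saved session, then save it again."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = new_context_with_saved_state(browser, state_file)
        page = context.new_page()
        page.goto(url)
        # Log in here on the first run, then do the actual automation.
        save_state(context, state_file)
        browser.close()
```

The state file holds live session cookies, so it should be treated like a credential and kept out of version control.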

ayanb9440 · on Sept 17, 2024

There are quite a few open source YC startups at this point. Our understanding is that:

1. Developer tooling should be open source by default

2. Open source doesn't meaningfully affect revenue/scaling because developers that would use your self-hosted version would build in-house anyway.

ilrwbwrkhv · on Sept 17, 2024

I know there are quite a few open source by default companies. But the ethos of open source is sharing / building something by the community and getting paid in a way which does not scale the way VC funding expectations work.

So to have some respect for the open source way on top of which you are building all this please stop advertising it as "open source infrastructure" in bold and sell it like a normal software product with "source available" on the footer.

If you do plan to go open source and actually follow its ethos, remove the funded by VC label and have self hosting front and center in the docs with the hosted bit somewhere in the footer.

ilrwbwrkhv · on Sept 17, 2024

Like again, if you are not sure what open source means, this is open source: https://appimage.org/

Hope it is abundantly clear with this example. Docker tried its best to do the whole "open source but business first" thing and it led to disastrous results.

At best this will make your company suffer and second guess itself and at worst this is moral fraud.

Talk to your group partner about this and explain to them as well.

ayanb9440 · on Sept 17, 2024

If you want to use an agent for scraping/automation, you would need to supply it with auth credentials. So permission is required by default.

ayanb9440 · on Sept 17, 2024

Looking at their docs, it seems that with Browserbase you would still have to deploy your Playwright script to a long-running job and manage the infra around that yourself.

Our approach is a bit different. With Finic you just write the script. We handle the entire job deployment and scaling on our end.
