Hacker News | mcstempel's comments

There are options beyond auth walls for detecting/enforcing behavior as well since these scrapers have very recognizable device signatures: https://stytch.com/blog/detecting-ai-agent-use-abuse/


thanks for sharing!


wow, this is wonderfully made


Ah, this is great feedback -- I don't think we do enough to articulate how much we're doing beyond that simplified explanation of device fingerprinting on those docs. I'll get that page updated, but 2 main things worth mentioning:

1. We have a few proprietary fingerprinting methods that we don't publicly list (but do share with our customers under NDA). These feed into our ML-based browser detection, which checks those fingerprint data points against a historical archive of every browser version ever released, letting us discern subtle deception indicators. Even sophisticated attackers find it difficult to figure out what we're fingerprinting on, which is one reason we don't document it publicly.

2. For a manual attacker running attacks from a legitimate browser, our Intelligent Rate Limiting (IntRL) tracks and rate-limits at the device level, making it effective against attackers using a real browser on their own machine. Unlike traditional rate limiting that relies on blunt signals like IP address, IntRL uses a combination of browser, hardware, and network fingerprints to detect repeat offenders, even if they clear cookies or switch networks. That way, even human-operated, low-frequency attacks get flagged over time, without blocking legitimate users on shared networks.
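To make the idea concrete (this is not Stytch's actual implementation, just a minimal sketch; the fingerprint inputs and class name are hypothetical), a sliding-window limiter keyed on browser + hardware traits instead of IP might look roughly like:

```python
import hashlib
import time
from collections import defaultdict, deque

class DeviceRateLimiter:
    """Toy sliding-window rate limiter keyed on a composite device
    fingerprint rather than IP or cookies."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # device key -> hit timestamps

    @staticmethod
    def _device_key(browser_fp, hardware_fp):
        # Keying on browser + hardware traits (not network/IP) means
        # clearing cookies or switching networks doesn't reset the count.
        return hashlib.sha256(f"{browser_fp}|{hardware_fp}".encode()).hexdigest()

    def allow(self, browser_fp, hardware_fp, now=None):
        now = time.time() if now is None else now
        q = self.hits[self._device_key(browser_fp, hardware_fp)]
        while q and now - q[0] > self.window:  # evict hits outside the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Because the key deliberately excludes IP and cookies, the counter follows the same device across network switches; a real system would weight and fuzzy-match the fingerprint layers rather than hash them exactly.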


Thanks for the clarification -- the second point is really smart and something that hadn't occurred to me! You can slow down a scraper and add realistic mouse movements, but at the end of the day, if it can't collect data for longer stretches than a human could, what's the point?

And of course the Swiss cheese model applies here, as always. Thanks for fighting the good fight! I'm a big hater of IP laws, but this cultural move toward "scraping is never immoral" seems like a big step too far in the other direction.


CAPTCHAs have been ineffective as a true "bot detection" technique for a while, since tools like anti-captcha.com allow outsourcing them to real humans. BUT they have been successful on the economic side, raising the cost of programmatic traffic to your site (which is good enough for some use cases)

As the author of this agent detection post: we agree that CAPTCHAs and vanilla browser/device fingerprinting are quickly losing value in isolation, but we still see a lot of value in advanced network/device/browser fingerprinting

The main reason is that the underlying corpus and specificity of browser/device/network data points you get from fingerprinting make it much easier to build robust systems on top of than a binary CAPTCHA challenge does. We've found it very useful to keep all of the foundational fingerprinting data as a primitive: it let us build a comprehensive historical database of genuine browser signatures to train our ML models to detect subtle emulations, which can reliably distinguish authentic browsers from agent-driven imitations
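As a toy illustration of why the raw data points matter more than a pass/fail challenge (the signature archive, field names, and values here are all hypothetical; a real system uses far more data points and an ML model rather than a lookup table):

```python
# Toy consistency check: compare a client's claimed browser version
# against a (hypothetical, tiny) archive of known-genuine signatures.
KNOWN_SIGNATURES = {
    ("Chrome", 120): {"webgl_vendor": "Google Inc.", "has_chrome_object": True,
                      "max_touch_points_default": 0},
    ("Firefox", 121): {"webgl_vendor": "Mozilla", "has_chrome_object": False,
                       "max_touch_points_default": 0},
}

def deception_indicators(claimed, observed):
    """Return the fingerprint fields that contradict the claimed browser."""
    expected = KNOWN_SIGNATURES.get(claimed)
    if expected is None:
        return ["unknown_browser_version"]
    return [field for field, value in expected.items()
            if observed.get(field) != value]

# A framework claiming Chrome 120 but leaking a Firefox-like trait:
suspect = {"webgl_vendor": "Mozilla", "has_chrome_object": True,
           "max_touch_points_default": 0}
print(deception_indicators(("Chrome", 120), suspect))  # ['webgl_vendor']
```

A CAPTCHA only yields pass/fail; a fingerprint corpus tells you *which* field contradicted the claimed browser, which is exactly the kind of labeled signal you can build richer systems on.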

That works really well for the OpenAI/BrowserBase-style models. Where it gets tricky is computer-use agents that are literally putting hands on your keyboard and driving your real browser. Still, it's valuable to have the underlying fingerprinting data points, because you can create intelligent rate limits on particular device characteristics and increase the cost of an attack by forcing the actor to buy additional hardware to run it


I don't think tracking everything is the way to go; the info would get outdated very quickly, and tracking compromises user privacy. A simpler solution could be to throw up a challenge that humans can easily solve but agents absolutely cannot, now or in the future (think something other than audio/visual/text).


LinkedIn always hits me with those frustrating custom CAPTCHAs where you have to rotate the shape 65 degrees -- they've taken a pretty blunt, high-friction approach to bot detection

I think most apps should start by just monitoring agentic traffic so they can better understand the emergent behaviors it's performing (it might tell folks where they actually need real APIs, for example), and then go from there


Ironic that the orgs using everyone's content (fairly or not) and stuffing AI down our throats are the ones aggressively against their users using AI on their services.


Hey there, I'm the author of the post. I'm actually pretty sympathetic to your viewpoint, and I wanted to clarify my stance.

I actually spent years working at a "good bot" company (Plaid), which focused on making users' financial data portable. The main reason Plaid existed was that banks made it hard for users to permission their data to other apps -- typically not solely out of security concerns, but also to actively limit competition. So I know how the "bot detection" argument can be weaponized in unideal ways.

That said, I think it’s reasonable for app developers to decide how their services are consumed (there are real cost drivers many have to think about) -- which includes the ability to have monitoring & guardrails in place for riskier traffic. If an app couldn't detect good bots, that app also can't do things like 1) support necessary revocation mechanisms for end users if they want to claw back agent permissions or 2) require human-in-the-loop authorization for sensitive actions. Main thing I care about is that AI agent use remains safe and aligned with user intent. For your example of an anonymous read-only site (e.g. blog), I'm less worried about that than an AI agent with read-write access on behalf of a real human's account.

My idealistic long-term view though is that supporting AI agent use cases will eventually become table stakes. Users will gravitate toward services that let them automate tedious tasks and integrate AI assistants into their workflows. Companies that resist this trend may find themselves at a competitive disadvantage. Ultimately, this has started to happen with banking & OAuth, though pretty slowly.


It seems like cases (1) and (2) would both be better handled by letting the user give their user agent a separate security context if they choose, instead of trying to detect/guess what kind of browser made the HTTP request. I'm thinking about things like OAuth permissions, GitHub's sudo mode, etc. Otherwise your magic detection code will inevitably end up telling an ELinks user "sorry, you need to download Chrome to view your payment info".
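A rough sketch of that separate-security-context idea (all names here are made up; OAuth scopes and GitHub's sudo mode are the inspiration, not an actual API):

```python
from dataclasses import dataclass, field

@dataclass
class SessionToken:
    """Hypothetical session token carrying an explicit security context
    granted by the user, rather than inferred from the User-Agent."""
    subject: str
    scopes: set = field(default_factory=set)
    sudo_mode: bool = False  # elevated context, a la GitHub's sudo mode

def can_view_payment_info(token):
    # The decision rests on what the user granted, so an ELinks user
    # with a properly scoped token is treated the same as Chrome.
    return "payments:read" in token.scopes

def can_change_password(token):
    # Sensitive writes additionally require the elevated context.
    return "account:write" in token.scopes and token.sudo_mode

agent = SessionToken("alice", scopes={"payments:read"})
print(can_view_payment_info(agent))  # True
print(can_change_password(agent))    # False
```

No browser detection anywhere: a delegated agent simply gets a token with a narrower scope set than the human's own session.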


You read our mind! https://stytch.com/blog/the-age-of-agent-experience/

Very much agreed that's the long-term goal, but I think we'll live in a world where most apps don't support OAuth for a while longer (though I'd love for all of them to -- we're actually announcing something next week that makes this easy for any app to do)

But we're also envisioning an interim period where users delegate to unsanctioned external agents (e.g. OpenAI Operator, Anthropic Computer Use API, etc.) before apps catch up and offer proper OAuth


What came up in this interview [0] was that

1) Because of "AI" we're moving to a more API-like model in which the end user gets more say in how they want to consume content.

2) That is in tension with (ahem) intention. We can't direct the user "experience" and have a "positive model" (not based on denylists). We can present data, but we can't enforce our intentions (informally defined ideas about how it may be used).

3) That means we must move to behavioural security/access models in place of identity-based ones (including categorical identity like ASN, user-agent, device type...)

[0] https://cybershow.uk/episodes.php?id=39


Why can't users revoke permissions if the service can't detect good bots? Those seem wholly unrelated.


> That said, I think it’s reasonable for app developers to decide how their services are consumed

But it's a big step from that to (attempting to) control the user agent, or to only allowing blessed clients/devices.

Of course the site operator is concerned with limiting and preventing abuse by malicious users and agents, and an app developer should build for enabling that.

> Main thing I care about is that AI agent use remains safe and aligned with user intent

Nice and all. Keep a level perspective though: at scale, you can't prevent all of your users from getting scammed/phished/hacked, or from plain doing destructive, uninformed things of their own accord. Similarly here: if you aim for zero, that will be to the detriment of (at best, I believe) your growth.

I believe the kinds of patterns you describe in the article are in fact anti-patterns. Look at the kind of web and internet they lead to. Look at what they do to individual agency in society. Across the board, abuse is increasing alongside negative side-effects from false positives of these kinds of counter-measures, which will invariably end up abused (through ignorance or intentionally) to exclude an increasing number of "undesireds". Systematic discrimination is an apt term for the emergent, consistent blocking of certain groups and individuals, even if "it's just the stats playing out that way".

Consider accessibility, and the diversity of humans. It is folly to believe you can craft a singular user experience that works satisfactorily for everyone, or even catalogue and "officially support" what's needed by your entire target audience. By blocking screen readers and other accessibility agents, you limit or prevent use by those who rely on these tools.

> My idealistic long-term view though is that supporting AI agent use cases will eventually become table stakes.

My optimistic long-term view is that accessing content on my own terms with an agent I compiled myself is still an option (without any need for dystopian centralized signing services a la apple/mozilla), and that companies are still legally allowed to offer that option.


Plaid is not a "good bot" company. Despite posturing from leadership, it is fundamentally unethical to build a pervasive banking middle-man service which requires users to surrender their private account credentials in order to operate. What if every business operated this way? It's disgusting that companies like Plaid have considerably set back public discourse on acceptable privacy tradeoffs.


I'd assume they had to work with what was offered. As long as banks required usernames and passwords with no OAuth possible, what's Plaid to do? Their users wanted the service, but the banks used username/password credentials.

In any case, "good bot" doesn't refer to best practices such as rejecting suppliers with antiquated auth and guiding users elsewhere; it refers to not being intentionally malicious and acting as the user's agent.


You write as if someone held a gun to your head and forced you to sign up for Plaid. Plaid doesn't require anyone to use it.

Your bank is the entity you're ultimately upset with; don't malign a company that generated a _very good solution_ to a _huge problem_ and THEN worked with their industry peers to cajole these huge banks into letting you access your data how you want to use it. Before Yodlee and Plaid came around, there wasn't a snowball's chance in hell I could ever hope to get at my banking transactions in an API, and now I can -- and in many cases I never have to supply my banking credentials to anyone but my bank.


> You write as if someone held a gun to your head and force you to sign up for Plaid. Plaid doesn't require anyone to use it.

There is not a physical gun pointed at my head, but an increasing amount of digital online interactions are solely gated by Plaid. I've run into plenty of cases where I simply had no choice, for example when dealing with landlords.

And you already know how long it takes for financial systems to evolve once in place, as evidenced by your own frustration for them not embracing APIs and digital sovereignty. So once a solution like Plaid is in place, we're normalizing this kind of man-in-the-middle security nightmare for generations to come. Even if Plaid's founders did not have malicious intent, the company will eventually change hands to someone less ethical, and the door is open for other companies to seek the same kind of relationships with end users. If not malicious, Plaid is brazenly reckless and short-sighted.

And regardless... I as a consumer do not want to hand over my passwords to a man in the middle, I'm already angry enough at the security and password restrictions I encounter now with financial institutions. If I am in a position where I cannot rent a home or make an important purchase without interacting with a company like Plaid, where is my digital sovereignty?


I think this anger with Plaid is unwarranted. Without them, or before them, you had zero API access because the banks (including yours) don't give a rat's ass about your fancy access needs. Now Plaid has managed to gather together some kind of access. Are they to blame because they managed that? Do you have any alternative with the bank? I think no, and no. You can get back to the "standard" situation of no API, no guns involved, or you can use them as middlemen. Or you can create your own middleman service if you like, and everybody will appreciate your Plaid alternative (except Plaid, I suppose).


I think it's warranted if you don't look closely, unwarranted if you look deeper, and once again warranted if you look even deeper...

Before Plaid, the barrier to requiring a bank account for something in your SaaS was high to impossible.

Now the floor is low, and we got a bunch of applications that take advantage of that, so good right?

The problem is most of those applications are not in your best interests as a person.

-

It mostly just enables a bunch of junk BNPL debt and modern-day payday loan schemes.

It lets offerings that are too risky to be good ideas patch things up by just peeking into an account and making sure they'll be able to take their $X before some other rent-seeker drains the account for the month.

It also normalizes so much more access and visibility than is actually needed, so even in cases where the risk was acceptable before, now why not just peek really quickly and improve your bottom line at the expense of having yet another service with access to your financial data.

Overall, Plaid has probably not been a net positive for the average person. Other countries have open banking platforms, but they're also much stronger on regulation and oversight than the US, so you don't see it becoming quite as much of a negative.


> Are they to blame because they managed that?

Do you understand that the ends don't always justify the means? Do you understand that not trading security and privacy for convenience means putting up with inconvenience? My complaints are warranted because yes, they are to blame, no, I do not want to be forced to use their service.

And when the company is eventually sold and financial transaction data harvested (whether against the wishes of the founders or not, loopholes exist), apologists will turn around and blame the new company instead of Plaid, who opened the door for them.

> Or you can create your own middleman service if you like and everybody will appreciate your Plaid alternative.

I think the financial tech market is rapidly evolving, and I'll just wait. If I need financial automation and a service like Stripe is not available, I can always use a cryptocurrency which respects my autonomy and privacy.


Well then banks should offer a proper API with tokens and permissions.

What's that? They don't? Guess I'll just have to give Plaid my password then. Stupid banks.

btw this is the exact same way Facebook got people to migrate off MySpace.


So do you also expect the bank to back you up if you get hacked, given your exposure of the password to Plaid or other services?

Not sure banks are the best example for this discussion, though, since banks have legitimate reasons to secure and promote security of their accounts that is beyond simple IT resource usage.


I remember Facebook's shady user acquisition tactics, and I also do not use Facebook and similarly think their business model is morally bankrupt.

> Guess I'll just have to give Plaid my password then.

Learned helplessness, trading digital sovereignty for convenience. There is a larger war being fought here that is bigger than you or me. Had Plaid not been forced upon me, I would never have used it willingly.


You think digital sovereignty is when you are not allowed to do what you like with your account, but must follow someone else's terms and conditions?


It's a complex topic which requires the balancing of some things that may seem at odds.

Yes, digital sovereignty means owning your data and the means to transfer and activate it.

It also covers things like not having to relinquish a personal key or passphrase in order to do so, as that severely diminishes your personal security, erodes privacy and trust, and enables a future society where corporate participation is mandatory and the dissolution of security and privacy boundaries is considered essential and unavoidable.

Such a system is horribly anti-consumer, even if it seems nice while the lollipop is still in your mouth.


How would you transfer your data without authenticating to it? They could provide you an executable to run on your computer with your password?


Encryption and public keys. That problem has been solved for a long time; it just needs to be adapted for data granularity, so that each service is exposed only to specific bits of data, and to the actions that modify them, within constraints.

The data lives on your machine, or in a pod controlled by you. This data stays "live" for as long as you like, by continually updating values encrypted to each service's public key, so only that service can decrypt them. If you want to cut off access to the data, turn off the hose. From there, you'll need to rely on your local government if you require the service to purge existing data, but that's nothing new. I've described in great depth on this website before what such a system might look like. Only public keys and encrypted data are passed around.
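Ignoring the actual cryptography, the granularity-plus-revocation idea might be sketched like this (all names hypothetical; the sealing of values to each service's public key is elided):

```python
# Toy sketch of a user-controlled pod: each service is granted only
# specific fields, and revocation "turns off the hose". Real systems
# would encrypt each value to the service's public key; here access
# control stands in for the crypto.
class Pod:
    def __init__(self, data):
        self._data = data
        self._grants = {}  # service -> set of granted field names

    def grant(self, service, fields):
        self._grants[service] = set(fields)

    def revoke(self, service):
        # Cut off the hose: the service gets nothing from now on.
        self._grants.pop(service, None)

    def read(self, service):
        allowed = self._grants.get(service, set())
        return {k: v for k, v in self._data.items() if k in allowed}

pod = Pod({"balance": 1200, "transactions": ["..."], "ssn": "REDACTED"})
pod.grant("landlord-check", {"balance"})
print(pod.read("landlord-check"))  # {'balance': 1200}
pod.revoke("landlord-check")
print(pod.read("landlord-check"))  # {}
```

The point is the shape of the trust relationship: the service sees only the fields it was granted, and the user can stop the flow unilaterally, which is the opposite of handing over a master password.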

Tim Berners-Lee is also tackling this problem with Solid.


We built Stytch's B2B SaaS solution with this specific shortcoming in mind -- most other solutions aren't actually built with an organization-first data model (they're user-first like Auth0 but support the general concept of orgs), which makes it difficult to offer those per organization controls in an ergonomic manner.

There's some more info on our multi-tenancy data model here (https://stytch.com/docs/b2b/guides/multi-tenancy), and here's the PUT request you'd use to manage any of those org configurations: https://stytch.com/docs/b2b/api/update-organization


That looks really close to what I’m after! Will give it a spin.


You can now set up passkeys on your personal gmail, which I've found to be particularly nice for times when I'm trying to log in via webview


Yeah, +1. Even vanilla puppeteer is pretty successful against Cloudflare

