taeric · 2026-06-05T18:49:19 1780685359

An odd example, as fireflies are still pretty big in the places they have always been, aren't they? I know when I get to visit my childhood states, they are still there. Similar for cicadas and other bugs of my youth that I didn't realize were far more local than I expected.

MSFT_Edging · 2026-06-05T18:56:03 1780685763

It was just a recently notable example. Even as of 2-3 years ago I used to see them a decent amount. They're a highly visible marker of an insect population that is dropping like a rock.

They're also a beautiful creature that I could imagine wishing a child of mine could experience the same way I did, which better illustrates the tragedy of the damage we're doing to the planet.

taeric · 2026-06-05T19:41:47 1780688507

I'm assuming you still live in the same place? My understanding the last time I took a dive on this is that the numbers are going down, but not in any way that is going to see them gone. You will need to go to where they are, though. And, alas, the PNW is not a place to find them.

taeric · 2026-06-05T18:06:24 1780682784

I confess a sad assumption that bot traffic is far higher than we have admitted for a long time. Though, maybe we would see different stats specific to social media sites, where bots astroturf like counts? It certainly feels like we have known for a long time that bots were larger in ad viewing than ad companies wanted to admit.

reconnecting · 2026-06-05T18:48:56 1780685336

I don't understand what difference bots make. For me, a website (the public part) is a storefront. People walk down the street and see what's inside — that's the purpose. If something should not be available immediately, that's the private part of the store.

I've been monitoring bot traffic on digital platforms for over 10 years. Sure, the crawler share is growing, some even with malicious intentions, and those I detect and block.

I disagree that this pain is worth the cost of making real people spend their life on verification.

taeric · 2026-06-05T18:55:57 1780685757

For ad views, the concern is specifically that people pay for clicks and views. That that can be so heavily influenced by bot traffic greatly undermines their value.

Same general idea goes for any of the algorithmically driven platforms. The algorithms are ostensibly intended to surface organically discovered things by watching how people interact with things. That they are so susceptible to distortion through bot farms should be a lot more acknowledged than it is. People trust them far more than they should.

There is also a general cost of running things concern. It isn't like it is completely free to execute on bot traffic.

reconnecting · 2026-06-05T19:11:33 1780686693

For ads, I believe this must be a problem for ad platform owners.

If the digital platform's storefront is their business, they could afford to spend some budget on bot detection. Bots still come from data center networks, sometimes render pages incompletely, request resources in bulk, and show enough patterns to be flagged internally.

If we look at a medium website, most random crawlers will come from Amazon, Microsoft, DigitalOcean, Hetzner, OVH, and a few other DC networks — these can be blocked easily without harming real users. The rest can be detected and cleaned up, even manually.

The math is simple: 20,000 visits a day at 15 seconds each = ~83 hours a day lost watching a Cloudflare logo, just because someone doesn't want to dig into the logs. I don't buy it.
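
The arithmetic above can be checked in a few lines. This is only a sketch: the function name is hypothetical, and the inputs are the figures quoted in the comment (20,000 visits a day, 15 seconds per challenge page).

```python
def hours_lost_per_day(visits_per_day: int, seconds_per_visit: float) -> float:
    """Total visitor time spent waiting on an interstitial per day, in hours."""
    total_seconds = visits_per_day * seconds_per_visit
    return total_seconds / 3600  # 3600 seconds in an hour


# Figures taken from the comment above.
hours = hours_lost_per_day(20_000, 15)
print(f"{hours:.1f} hours per day")
```

20,000 x 15 s is 300,000 s, which is about 83.3 hours, so the "~83 hours a day" figure holds.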

taeric · 2026-06-05T19:39:54 1780688394

Largely agreed, though I think you are likely underestimating how hard this is to detect. In particular, it is true that many bots can be hosted in data centers, but it is somewhat trivial to launder that traffic through other sources. Malware, in particular, is what I have in mind. Maybe I'm wrong and that has largely gone away?

There is also a bit of mixed incentives. Yes, it is the ad platform that is getting abused. But it is also the ad platform that is charging people based on abused practices.

And it isn't like this is completely made up. Just look at how Facebook killed a ton of people during the "pivot to video" programs. I don't know all of the details, as I was thankfully not in any of the involved industries, but my understanding is it is fairly well documented.

Edit: I changed an "isn't" to "is." I think I was trying to reword at one point, but left it in a way that is opposite what I meant.

Groxx · 2026-06-05T20:36:51 1780691811

For efficiently-hosted sites with little media it's not too bad. E.g. hosting a static site just doesn't cost much, even if you're hammered occasionally.

That's extremely far from all sites though. It's probably safe to say it's a severe minority, particularly when you ignore personal / non-profit-bringing sites. Tons of small and large sites run stuff like poorly-written wordpress or ruby on rails or thousands of microservices doing god knows what. A major increase in request volume on those can easily mean significant increases in hosting charges (e.g. small-% on big, many multiples on small) or significant effort in optimizing (which is expensive too).

reconnecting · 2026-06-05T21:43:31 1780695811

The website I mentioned has over 15k webpages and ~200 GB of media, and yet we monitor bots manually and only block them if they're pulling 5k requests in a row. Malicious URLs and multiple 404s are blocked by default. HEAD requests are rejected.
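
A rough sketch of what rules like these could look like when applied to a batch of parsed log entries. Everything here is an assumption made for illustration: the `Hit` record, the `flag_clients` name, reading "5k requests in a row" as a per-IP count within one batch, and the 404 threshold of 20 (the comment gives no number). It is not the commenter's actual setup.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed access-log entry (hypothetical shape)."""
    ip: str
    method: str
    status: int


def flag_clients(hits: Iterable[Hit],
                 max_requests: int = 5000,
                 max_404s: int = 20) -> set[str]:
    """Return IPs that reach either threshold within this batch of entries."""
    requests: Counter = Counter()
    not_found: Counter = Counter()
    flagged: set[str] = set()
    for hit in hits:
        requests[hit.ip] += 1
        if hit.status == 404:
            not_found[hit.ip] += 1
        # Flag on sheer volume or on repeated misses.
        if requests[hit.ip] >= max_requests or not_found[hit.ip] >= max_404s:
            flagged.add(hit.ip)
    return flagged


# Tiny demo with lowered thresholds and documentation-range addresses.
demo = [Hit("203.0.113.5", "GET", 200)] * 4 + [Hit("192.0.2.9", "GET", 404)] * 2
print(flag_clients(demo, max_requests=4, max_404s=3))
```

Flagged addresses would still go to a human for review, which matches the "monitor manually, block rarely" approach described above.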

Even on a very bad day, the server's page load time doesn't go over 1s.

However, it seems like I'm indeed looking at the problem through the wrong prism, as what I've seen from the comments suggests that the initial issue is performance, and the bots are what uncover it.

Groxx · 2026-06-05T21:53:23 1780696403

I think a good chunk of it is bot-induced performance problems, yea. Whether that's compute or transfer. And advertisement costs.

Optimization is very very much not a solved problem though, just look at basically all software ever written - it's written for an optimization priority and to a price point (whether commercial $$ or via personal time), and that target's value to its users has shifted rather dramatically.

reconnecting · 2026-06-05T22:08:32 1780697312

This is really interesting. I indeed looked at this problem from the wrong perspective.

I'm working on an open-source tool that could be useful for bot detection, but I'm still not confident that anyone would deploy it on-prem and make the setup/maintenance instead of just routing traffic through the cloud.

Perhaps performance as a KPI could work. Thanks!

Groxx · 2026-06-05T22:16:16 1780697776

I think you'd definitely find some interest, e.g. anyone that intentionally avoids "the cloud" will want something local. Honestly I assume there are some of these already, monitoring apache/nginx/etc logs. Anubis is arguably similar and has been exploding lately, for example, though I'm not sure if it auto-updates its rules at all: https://github.com/TecharoHQ/anubis

As to if it'd get enough interest: yea no idea at all. I wish you luck tho! Clearly there's a need for this kind of thing.

reconnecting · 2026-06-06T07:30:03 1780731003

Our team develops a risk-based analytics system that we also use for bot detection. From our perspective, bots shouldn't be blindly blocked, but rather properly monitored and blocked only when necessary. Here is a live demo (1) to give you a general idea.

1. https://play.tirreno.com (admin/tirreno)

LorenPechtel · 2026-06-05T19:31:46 1780687906

When most of your server capacity is going to answering the scrapers it matters. It's not that the stuff is hidden, it's that the storefront is being flooded with 10x as many customers as the fire code allows. And some of them go around asking your employees mindless questions. (Small forum I help moderate: we were getting hammered with what was probably some sort of AI that was taking search queries and feeding them into the forum search. Search is now registered users only.)

reconnecting · 2026-06-05T21:36:09 1780695369

> When most of your server capacity is going to answering the scrapers it matters

I've been dealing with the web since the previous century and still haven't managed to build a website that could be hurt by scrapers visiting it.

If you went through the logs, you'd probably see that these bots are on a single IP or subnet, which can be easily detected and blocked instead of closing off search to non-registered users.
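
Going through the logs this way can be sketched with the standard library. This assumes the client IPs have already been extracted from the access log, that they are IPv4, and that a /24 is a reasonable grouping; the function name and the sample addresses (documentation ranges) are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def requests_per_subnet(ips: Iterable[str], prefix: int = 24) -> Counter:
    """Count requests per IPv4 subnet of the given prefix length."""
    counts: Counter = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its containing network.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts


sample = ["203.0.113.10", "203.0.113.11", "203.0.113.200", "198.51.100.4"]
for subnet, count in requests_per_subnet(sample).most_common():
    print(subnet, count)
```

A subnet that dominates the counts is a candidate for a block rule. As noted elsewhere in the thread, traffic laundered through many residential addresses would not show up this cleanly.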

mikey_p · 2026-06-05T18:47:59 1780685279

Well, the fun thing is that no one knows how much traffic of what kind they are getting when they use Cloudflare.

You get the numbers that Cloudflare tells you, but who knows if you can trust their stats after their CEO is apparently cherry-picking data to shape their product narrative?

thewebguyd · 2026-06-05T19:25:17 1780687517

That's the same CEO, too, who just went on a wild tone-deaf layoff justification, classifying human employees into roles of either a builder, seller, or measurer and saying he wants to get rid of everyone that "measures" the business...

I wouldn't trust a single thing coming out of his mouth.

taeric · 2026-06-05T14:40:36 1780670436

If it helps, I have found the attitude that writing is mostly for the writer to be healthy in continuing the practice. And it largely tracks with how I feel I have had better understanding of things I have documented than those I have not.

taeric · 2026-06-05T14:38:59 1780670339

To be fair, it isn't that different from why we have imaginary numbers. Or why the reals are called reals.

Which. Yeah, has been a pretty bad thing for people in understanding those. :(

taeric · 2026-06-04T17:14:15 1780593255

I confess I laughed harder at the Grok comment than I wish I had. Sad to remember that some strawmen are given life and promoted by people. Actively.

whatshisface · 2026-06-04T17:25:34 1780593934

I had a good laugh when Haiku's thinking summarization referred to mayor Mamdani as a, quote, "known anti-Zionist." :-) Probably a good thing to remember is that the value added in RLHF is not partly biased, or biased, but itself bias.

(Context: I asked it to write fake Reddit comments, because I was curious about how realistic they could be. The colorful phrase occurred during its reasoning about the requested subjects.)

baggy_trough · 2026-06-04T18:55:54 1780599354

Is there something strange or funny about that?

whatshisface · 2026-06-04T21:26:19 1780608379

In English, the word "known" is generally placed in sentences like, "known sympathizer," more often than in "known Democrat." Compare, "suspected," contrast the more neutral, "is an."

taeric · 2026-06-04T16:52:52 1780591972

More than not being entirely sure what the impact is, I don't see any suggestion of what to do about it?

thisisthenewme · 2026-06-04T17:06:16 1780592776

When a researcher discovers that smoking is damaging to the lungs, do they need to provide a solution that allows people to smoke without damaging their lungs? Would their inability to provide a solution take anything away from the research?

taeric · 2026-06-04T17:12:46 1780593166

To conflate AI with smoking is just not helpful. At all.

Or are you saying that there are acute harms from AI that are being ignored?

PaulDavisThe1st · 2026-06-04T17:16:07 1780593367

Acute, chronic - why would it matter?

Why is it unhelpful to conflate AI with smoking?

And yes, lots of people are saying "there are harms from AI that are being ignored".

taeric · 2026-06-04T17:26:50 1780594010

Acute would imply that we should flat out stop. Chronic would imply looking for plans to work on it. Acute and chronic would imply that we should both stop and take action to address damages.

What harms from AI are people ignoring?

camphy · 2026-06-04T17:13:21 1780593201

If you’re referring to a solution to large datasets not being auditable, she actually did provide a solution. Something to do with data sheets for these training data sets, similar to those provided for hardware components. At least, if my memory serves me.

taeric · 2026-06-04T17:18:57 1780593537

I was more irked by the concern over the diversity of the teams developing these. Which feels like a benign enough concern, but not one where you can just stop progress.

Worse, I think it is a ridiculously safe bet that the US was home to the most diverse teams you could get for this sort of work. Asking the good faith participants to stop participating would have decreased the stated goal.

wesleywt · 2026-06-04T16:58:27 1780592307

Why should the person identifying the problem provide a solution? This doesn't make sense.

taeric · 2026-06-04T17:11:09 1780593069

If the criticism can't distill up from "bad things could happen", it just isn't useful to keep paying people to come up with that kind of critique.

And it isn't like we stopped paying attention to these concerns, is it? Nor were they completely blindsiding us at the time. The question was largely of what to do about them.

PaulDavisThe1st · 2026-06-04T17:15:22 1780593322

The question was also whether large-scale utilization of LLMs (and also the prerequisite increased training processes) should proceed before these issues were addressed. Clearly, we collectively answered "yes" without any actual reasoning (and arguably, without any collective decision making either).

taeric · 2026-06-04T17:25:19 1780593919

This feels incoherent. I'm game to agree that there were and are poor decisions being made. But are you proposing that we could have stopped all progress until these vague concerns were addressed?

For some of the concerns, like language understanding, I can't bring myself to think that many of the experts out there were doing any better than these models can do today. Quite the contrary.

And do you think that that would not have been counter to the concern over diversity of teams working on it?

Or concerns over bias going away by having the US attempt to abstain? Good luck with that. It sucks, but China and Russia should stand as stark examples that it turns out you can take strong control over the internet.

Enginerrrd · 2026-06-04T17:38:38 1780594718

It’s pretty common in the security world to have a red team and a blue team. There is overlap in the skillset for both, but there are good reasons to have separate people develop each team, and we wouldn’t expect people to have a talent for both.

Ideally, we like it if the red team can suggest solutions, but that’s not always their job or expertise. I’ve rarely, if ever, heard someone within that context express the sentiment you are expressing: that a really good red team person isn’t useful if they can’t fix the holes they find.

taeric · 2026-06-04T17:42:43 1780594963

Right, but if one of my teams, red or blue, was just saying "the other teams could be flawed", I would probably push for a new makeup for that team?

tptacek · 2026-06-04T17:42:46 1780594966

This is true but it's worth pointing out that the currency of red teaming is the POC, and the authors of the Stochastic Parrots paper don't have one.

taeric · 2026-06-03T14:58:23 1780498703

This is borderline silly, though. It is clumsy to start. But so is walking. As is running. Have you seen people start out on bicycles? What about writing? Talking?

That is to say, all things start out clumsy. And people that are good at it, no longer feel that it is clumsy. Which is why a lot of people that have been working with this for any time just don't think of this much.

Sharlin · 2026-06-03T16:46:51 1780505211

Such a strange attitude.

If a tool is clumsy, we try to improve it; that has been the case since the first stone artifacts created a million years ago.

Do you think that the (sort of) tree-based affordances that most modern code editors do support, like autoindentation and brace pairing/enclosing, are silly too? What about some slightly more advanced features, like the AST-based "extend selection" and "move statement up/down" features in JetBrains IDEs?

Or do you think that the status quo just somehow happens to be exactly right and going any further would be silly?

taeric · 2026-06-03T17:05:58 1780506358

It would be silly strictly for how strongly worded it is. I should also say that there is nothing wrong with being silly. Someone may actually come up with something some day that meaningfully changes us here.

That is, I am not disagreeing that it can be a little bit clunky. But a lot of the power that experienced users have in reading code is specifically that they have built a bit of automaticity in reading it. That is, the clunky aspects of fixing it are something you pretty much have to do. You just build speed at automatically doing it rapidly.

So, the status quo is to use the helper functions that you want to use. But usually after you get the experience in the clunky phase.

taeric · 2026-06-02T19:02:01 1780426921

Holy crap is that an amusing/depressing video. Assuming the financial shenanigans outlined in it are even partially accurate, how the heck is this getting allowed?

Eisenstein · 2026-06-02T19:09:43 1780427383

It's allowed because the people in charge are making a ton of money and the people who aren't have been convinced that regulating the market is bad.

kibwen · 2026-06-02T19:05:20 1780427120

> how the heck is this getting allowed

When it comes to dealing with the abuse of power by those who hold power, the question is not "who's allowing them to do this?", it's "who's going to stop them?".

taeric · 2026-06-01T14:00:26 1780322426

Adding to this, I highly recommend looking up images of wet owls. They look hilarious.

KineticLensman · 2026-06-01T15:48:55 1780328935

Yes. Especially for owls like the Great Grey, it shows how small the bird is inside all of those feathers.

taeric · 2026-05-26T21:35:42 1779831342

I'm torn. On the one hand, this is not too uncommon of a problem to run into. On the other, poor practices from coworkers are not going to go away thanks to a language filter.

So, the question will come down to which causes more grief, people abusing this convention, or people that overly use the language features that combat it? It is the standard optimization question between poor practices and enforcement that you have in any question of enforcement.

I would be delighted if we could get some empirical data on this.
