Speaking from the scraper’s perspective, I like proof of work; a ten-year-old 96-core server will cost a couple of quid to run for a few hours and will grab an absurd number of pages thanks to the access granted by repeatedly solving proofs of work. Small slick codebases too!
There's also the Anubis idea where your PoW is persistent until your IP address or session cookie changes, so you get to skip PoW in exchange for making yourself identifiable, which means the PoW can then be ramped up to take a couple of minutes.
I don't use Anubis though. I just make sure my site doesn't take five seconds to render a page, which is what lets bots overload it so easily. It's not actually that hard?
Exactly. I’m constantly amazed at how little you actually need to bypass CF, Amazon, Azure WAFs and so on (Incapsula springs to mind too). When you look at the code you’ve come up with, it’s actually quite small and compact.
More to the point, these systems actually help scraping because proof of work unlocks essentially unlimited scraping, in my experience.
That said - from my experience on the other side, sure you can’t stop people like me or you, but you can stop 99% of the others. That’s more than worth it operationally.
I suspect that introducing the calibration concept might be a case of too much too soon for some people.
As far as I understand it, the various probability matrices boil down to: what token has the highest likelihood of coming next, given this set of input tokens. Which then all gets chucked away and rebuilt when the most likely token is appended to the input set.
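To make that loop concrete, here is a toy sketch of greedy next-token selection. Everything in it is made up for illustration: `TOY_LOGITS` and `toy_model` stand in for a real model, and unlike a real model this one only looks at the last token. Real systems also often sample from the distribution rather than always taking the top token.

```python
import math

# Hypothetical scores: for each last token, a score per candidate next token.
TOY_LOGITS = {
    "the": {"cat": 2.0, "mat": 1.0, "sat": 0.1},
    "cat": {"sat": 3.0, "the": 0.5, "mat": 0.2},
    "sat": {"on": 2.5, "the": 0.3},
    "on": {"the": 2.0, "mat": 0.4},
}


def toy_model(tokens):
    # Stand-in for a model call: returns scores for the next token.
    return TOY_LOGITS.get(tokens[-1], {"<end>": 1.0})


def softmax(logits):
    # Turn raw scores into probabilities that sum to 1.
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_decode(prompt, max_new_tokens=3):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_model(tokens))      # distribution over next tokens
        next_token = max(probs, key=probs.get)  # keep only the most likely one
        if next_token == "<end>":
            break
        tokens.append(next_token)               # append, then go round again
    return tokens


if __name__ == "__main__":
    print(greedy_decode(["the"]))
```

Note that the distribution is recomputed from the extended token list on every pass, and nothing in the loop exposes a separate "how sure am I" value beyond the probabilities themselves.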
Objective assessment of internal state - again, to my non-expert eye - doesn’t appear to have any way to surface to me.
Big if, but if my rough working understanding is more or less correct, your calibration point makes a lot of sense to me. I’m not sure that it would make sense to someone who, e.g., pictures some form of active thinking process that is intellectualising about whether to output this or that token.
The irony for me being that when I was first learning Polish and looking for any and all mnemonics - “ah, that word is the number nine, and that one is ten because it has an s in the middle and that’s next to t for ten in the alphabet”-levels of desperate - the false etymology helped me set word, słowo, in my head, and the rather delightful dosłownie, literally / to the word, has remained ever since.
(tho while on the subject, it’s hard to beat wieloryb as a wonder that I don’t want to know the true etymology of ever because if there’s even a chance that the word for whale derived from the words great as-in-size + fish, I want to hang on to it forever)
If you can find a way to combine this with local population to end up at pence per litre per thousand population, I bet you’d uncover some fun trends. Bet it’d also get interesting if combined with population within an X min drive too.
Tho really need some car population per road segment stats to drive the most out of it IMO.
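A rough sketch of the normalisation suggested above, in case it helps anyone try it. The function name and the sample figures are hypothetical; real inputs would come from whatever price feed and population dataset you have to hand.

```python
def pence_per_litre_per_thousand(price_ppl: float, population: int) -> float:
    """Divide a pump price (pence per litre) by the local population in thousands."""
    if population <= 0:
        raise ValueError("population must be positive")
    return price_ppl / (population / 1000)


if __name__ == "__main__":
    # Made-up example areas: (name, pence per litre, local population)
    areas = [("Town A", 150.0, 50_000), ("Village B", 156.0, 2_000)]
    for name, price, pop in areas:
        print(name, round(pence_per_litre_per_thousand(price, pop), 2))
```

Swapping `population` for the population within an X min drive, or for a car count per road segment, would use the same division with a different denominator.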
Hah TIL. So it's the river Welsh river on the English side of the Bristol Channel.
I often feel like I would understand a lot more names if I bothered learning Welsh. It's pretty popular for made-up climbing route names too, because Wales is so good for it I guess. Allegedly some of the classics in the Avon Gorge are Welsh-derived but I could never figure them out to be sure.
It’s lovely isn’t it? There’re a good few of these things around: notably Torpenhow Hill (which killjoys dispute); and ones like Pendle Hill (which they don’t).
Makes sense given Welsh’s evolution from Brittonic. Much to my shame, I only started visiting Wales in later life, and there’s really something in the language that grabs me quite deeply. Once I’ve got my Polish down pat, I tell myself.
> With MapLibre or any modern map SDK this is standard…
In practice, this doesn’t work out as visually pleasing as you’d like; labels repeat, or render partially or not at all, or collide with other labels, or only work well at one given zoom. It’s easy to end up in a visually dissatisfying place that takes an unfathomable number of magic rules to reach.
The secret sauce to fixing this is creating separate label layers of perfect point locations or lines for labels to follow in advance. Added bonus is faster render and interaction times due to fewer rules.
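As a minimal sketch of precomputing label points, assuming simple polygons and using a plain centroid: this is a stand-in technique, not necessarily how anyone does it in production, and for awkward concave shapes something like Mapbox's polylabel (pole of inaccessibility) tends to give better placements. The function names and the sample ring are hypothetical.

```python
import json


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...] via the shoelace formula."""
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def label_feature(name, ring):
    """Build a GeoJSON Point feature to load as a dedicated label source."""
    x, y = polygon_centroid(ring)
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Point", "coordinates": [x, y]},
    }


if __name__ == "__main__":
    # Made-up rectangle in lon/lat order.
    ring = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
    collection = {"type": "FeatureCollection", "features": [label_feature("Example", ring)]}
    print(json.dumps(collection))
```

The resulting point layer can then carry the labels on its own, so the polygon layer needs no label placement rules at all.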
For what it’s worth, I cancelled my ChatGPT subscription, and every time I try debugging a Linux system issue, I feel sad that Claude is sooooooo confidently bad at it.
Claude is noticeably poor for my use case on this particular issue. That said, I imagine I’m not alone in refusing to continue paying OpenAI. We’re in for a wild ride.
A lot of folks here will be startup types though, and while there is the idea that you'll make it big, I think day to day people work at startups for the satisfaction.
Completely by accident, I have a setup that sends a PDF invoice to customers a couple of days after the sale. I’m pretty sure it’s a Stripe option I must’ve misclicked.
Anyway, it turns out that on the rare occasion someone’s had an issue, this gives them a really easy mechanism to write to me and tell me about it. They let off their steam in the email and then we make things good together. (Yet another reason why I always oppose noreply email addresses)
I still don’t know what or where the setting is, mind.
That's a great idea, thanks! I've found and enabled a few emails, though I think the actual invoice email is a checkout parameter. This should help, thanks!