Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The golden standard for those tasks is just "use a library" so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.
My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.
There's where even frontier models struggle, which makes comparisons meaningful.
It’s making guesses not decisions, framing as decisions will lead you astray to wasted time and tokens.
It’s vaguely productive to tell them a ton of relevant info upfront attempting to minimise their need for load bearing guesses. I say vaguely because obedience is generally only around the level where it's good enough to lull you into a false sense of security, not to actually be obedient.
It’s a bit more productive to use the various loop mechanisms (hooks, /goal etc) to evaluate each end of turn against guard rails and reject with clear instruction on whats unacceptable. Obviously if you only do this without the front load of info then you’re likely to spend more tokens to reach a satisfactory end of iteration.
While you are correct that something like Antigravity 2 + Opus 4.6 can handle large scale software engineering tasks, I would argue that it is usually (but not always) better "coding agent hygiene" to work on smaller code modules and as the human in the loop be a partner, not someone who prompts and then disengages.
Breaking code up into composable chunks has worked well for me over 50+ years as a professional software developer, and I can't get away from the idea that it is still usually the way to go using agentic coding tools.
> Whenever a central position is formed with power over something, even if it’s only a steering power, it will be sought out by power-hungry people and manipulated
> nobody is sitting their waiting / watching the LLM code anyway
My personal experience is that for production-grade code you need to steer the agent more often than not... so yes, at least some of us are watching the LLM code.
How can you tell from the server if that's a retry (think e.g. some reverse proxy crashed and the first request timed out, but the payment already went through to the user's CC)... or if the user just trying to purchase another item 123 because they forgot they needed 2?
There is simply no way to make the requests idempotent without an idempotency key. The only way to tell both situations apart is to key the requests by some UID. The HTTP verb is irrelevant.
In the case in the article, the request is being rebuilt again by the client, and may be slightly different. Typically, the server doesn't have to care about any of that if it's just "did we get something for this ID?" and either it did and errors (could be a 4xx or a 5xx depending on what it now has), or it didn't, and processes the request.
So what you propose is first you create the request payload and POST it, which generates a request-id-bound URL (but it does nothing stateful yet) and then you actually request to perform it? Because otherwise I don't see any difference.
If you must use POST with a idempotency-key, then my suggestion would be to use it like you'd use a PUT: you generate a guaranteed unique URI on the client, according to a spec agreed upon with the server, along with some idempotency-key value (UUID or whatever per the recommendation of the RFC), and then POST the request to the generated URI. If you get back 200 or 201, great! If you get an error that indicates it might not have worked or you get nothing because of a partition, send the request again. If the server had already processed the first request, the second one should 400 or 409 or something, regardless of any differences between the first and second request. If there was some sort of partial processing, then some other permanent error for that particular ID or URI should convey that.
My original point, though, was that these semantics are well-understood for PUT, so just use PUT, or use POST (with the idempotency-key header) exactly as you would PUT.
Well, that’s a good question. I think the best answer for now is “we’ll see”?
I use it in place of Tailscale for some homelab applications. I’ve started to deploy other experiments on a “prod” cluster. The demo I showed shows how Pollen responds to a multi-step pipeline type application; two WASM seeds and a single egress communicating over the provided RPC mechanism (`pln://seed…` etc) whilst handling routing, back pressure and the like.
Right now, the workloads need to be stateless. I’m coming up with a story for state at the moment, which’ll likely start as some WAL-like convergent structure with thin (KV store etc) abstractions layered over it. Probably not dissimilar from the pattern underpinning the current CRDT gossip state.
Let's see if I got this right: so it's something like a private Yggdrasil Network (minus the IPv6 overlay?) meets self-distributing WASM-powered serverless functions? Plus some built-in functions for proxying/serving.
Ha, at a hand-wavey level, yes? Like you say, there's no IPv6 overlay, each node just exposes it's own primary UDP port which talks Pollen's mesh protocol. It uses a single QUIC transport, one QUIC connection per peer, and a combination of streams and datagrams for different bits serving both the control/data layers.
I'd say "WASM-powered serverless functions" is a reasonable analogy, if your serverless functions maintained a minimal number of live replicas at any one point Also, of course, you're tied to the physical ceiling of the explicit hosts that are underpinning your cluster (N machines which are not dynamic like, say, lambdas are when they auto-provision to match demand).
And yeah, you can also `pln serve` arbitrary services which are then exposed to the cluster, but it's worth mentioning that these will of course not benefit from the inherent, organic autoscaling and locality mechanisms that come with the WASM blobs. I only added it in as a feature so I could retire my (basic) Tailscale usage.
Also, you can `pln seed` arbitrary blobs which can be `pln fetch`ed from other nodes. You can also `pln seed ./public my-site` a static webpage which you can reach from any node with `curl -H "Host: my-site" http://<node-addr>:8080/` (8080 being a configurable port).
> And what's the public API/stdlib/bindings inside the WASM workers?
Wazero (via Extism) carries the load here. As it stands, the runtime lifts three basic host functions into guest code which enable the RPC-like behaviour, injection of caller-context and basic logging[1], which are in turn referenced in the guest code like in the example[2].
In the reverse direction, guest code exposes it's public APIs via build directives[3], which are handled by the runtime code[4].
Figured that concrete examples might be more helpful here (I hope the formatting works).
Failed to mention in my other reply: a "seed" because I envisioned, perhaps too poetically, "seeding" some generic computational unit into the cluster only for it to organically spread to other nodes in the cluster... sort of like pollen? Maybe.
Uh, how? This might be a country thing but you don't have to join any union in my country. You do, if they represent your interests. Big companies have multiple, competing unions, and the anarchists (which refuse state subsidies and are fully self-funded) are pretty good at what they do.
If you have to join a union isn't that essentially a racket?
Of course, and that's why some people are opposed to a union, but then others say you're just falling for propaganda or some such nonsense if you deign to have any, even small, criticism of unions.
> If you have to join a union isn't that essentially a racket?
Yes, this is a big part of my critique.
I'm no lawyer and can't provide a useful explanation of the "why", but literally every educator I know is in the Educator's union. Same with Cops and Nurses. I don't know any airline pilots but I understand it's the same way.
Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The golden standard for those tasks is just "use a library" so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.
My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.
There's where even frontier models struggle, which makes comparisons meaningful.