
gcr · 2026-06-05T11:20:47 1780658447

It certainly can be!

gcr · 2026-06-04T10:56:56 1780570616

For context, Google initially refused to merge JPEG XL as a strategic play to promote AVIF, which was in use by other teams (I think Photos?). Internally, Chrome engineers were supportive of JXL but were overridden by leadership.

I guess today’s post represents a change.

I don’t have any public evidence to support my claim, sorry. Take it or leave it.

ksec · 2026-06-04T14:42:45 1780584165

I think it is quite the opposite. There was some support for JXL, but the leadership of Google, Android, and Chrome all wanted AVIF.

It was a perfect opportunity to announce AVIF with AV2, maybe taking the chance to fix the issues where JXL beats AV1. But that didn't happen.

gcr · 2026-06-01T01:27:44 1780277264

Between this, Iron Lung, and The Amazing Digital Circus finale getting a cinema release, I think this is shaping up to be a great year for small movie productions

krapp · 2026-06-04T17:04:53 1780592693

The Amazing Digital Circus finale got completely shat on by fans after they leaked the ending and harassed Gooseworx because they didn't like it.

Is it worth being involved in indie production when success just guarantees a hate campaign against you? It will probably happen over Backrooms soon enough.

gcr · 2026-06-04T19:35:52 1780601752

That’s an amazingly pessimistic take, and a shortsighted one. Glitch and the TADC team are more than just one person. Gooseworx is burned out, but her work helped kickstart many early careers among her staff. I think the only people who characterize TADC as a failure are folks who consume too much social media.

Like most projects, some indie productions (like TADC) end in creator burnout, others (like Iron Lung) end in moderate success, and most probably end somewhere in between. We saw this with Undertale, Lights Out, Hazbin Hotel, Primer, El Mariachi…

gcr · 2026-05-30T12:13:41 1780143221

Yes, but in ways whose solutions admit some level of creativity or ingenuity.

gcr · 2026-05-30T10:51:03 1780138263

On the website, their joke has some formatting: “we recommend you upgrade <strikethrough>to play OpenRCT2</strikethrough> for security reasons!”

M95D · 2026-06-01T08:31:55 1780302715

If that is supposed to be sarcasm, I didn't get it.

gcr · 2026-06-02T13:10:54 1780405854

I think it’s throwing shade on Microsoft, which is urging people to upgrade to W11 for “security reasons.”

Since Microsoft owns GitHub, their decision to shut down Windows 7/8 action runners is a sneaky way to help drive customers toward Windows 11.

gcr · 2026-05-28T13:54:30 1779976470

Shouldn't that be part of the test?

Real-world systems need to be able to say "I don't know." This is a test about misinformation after all, and overconfident responses contribute to that.

Teasing out the difference between "avoid" and "unknown" could be a different research question

gcr · 2026-05-28T11:21:43 1779967303

For an alternate, albeit somewhat contrarian, view, also see Ed Zitron’s piece that adds context to Anthropic’s profitability: https://www.wheresyoured.at/anthropics-profitability-swindle...

TL;DR Ed argues that the deal between Anthropic and xAI could have been negotiated in such a way as to make Anthropic only appear profitable during its “ramp-up” period in June, which incidentally is also the month that Anthropic is making tons of other pricing changes.

gcr · 2026-05-28T01:32:00 1779931920

Nifty! What pi extensions provide the batch web fetch?

z3ugma · 2026-05-28T20:42:50 1780000970

I had pi write its own, and it uses Tavily under the hood.

gcr · 2026-05-26T13:35:02 1779802502

Speculative decoding shouldn't actually change the accuracy of the response. The draft model drafts a couple tokens, and the inference framework verifies that the larger model would have picked them.

However, I've found that speculative decoders don't help much if you're running a model locally on limited hardware (for instance, my 32GB VRAM M1 Max from 2021). For one, you have to fit both the large and the small drafter model in memory. For another, if you're running a quantized model, the activation distribution is different enough that the draft model has a hard time guessing what's coming next.

My take is that speculative decoding is most useful on _very expensive_ prosumer/hobbyist setups where you have 128GB of VRAM and are running your local models with full fidelity. It's also helpful for inference providers where they can send output tokens at a computational cost slightly higher than their input token cost.
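To make the "shouldn't change the accuracy" point concrete, here is a minimal sketch of the draft-then-verify loop for greedy decoding. Everything in it is an illustrative assumption: `toy_target` and `toy_draft` are made-up stand-ins for real models, and a real inference framework would verify all drafted tokens in one batched forward pass rather than in a Python loop. With sampling instead of greedy decoding, frameworks use a rejection-sampling scheme to preserve the output distribution, which this sketch does not cover.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def greedy_decode(target: Model, prompt: List[Token], n: int) -> List[Token]:
    """Plain decoding: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       n: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes k tokens, the target checks them."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target verifies each proposed token in order.
        #    (Conceptually one batched pass; written as a loop for clarity.)
        ctx = list(seq)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)       # always keep what the target would pick
            if tok != expected:
                break                  # first mismatch: discard the rest of the draft
        else:
            ctx.append(target(ctx))    # all k accepted: the target adds a bonus token
        seq = ctx
    return seq[len(prompt):][:n]


# Hypothetical toy models, only for demonstration.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + len(ctx)) % 7


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except at every third position.
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 7


prompt = [1, 2]
assert speculative_decode(toy_target, toy_draft, prompt, 20) == greedy_decode(toy_target, prompt, 20)
```

Because every emitted token is the one the target model would have picked for that context, a bad drafter only costs speed, never correctness.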

NitpickLawyer · 2026-05-26T14:02:14 1779804134

Your experience might be a bit dated, depending on when you last tried it. MTP (which is a flavor of spec decoding) is showing really solid improvements on local models, even on consumer hardware.

In fact, as the article mentions, you get the biggest gains at low concurrency (so local should apply), with diminishing returns for higher concurrency (if you think in terms of unit of compute, it's probably better to serve more requests in parallel and get more throughput that way).

Eagle3 was great at low context tho, and this seems to improve things at high context. That's really cool, and hopefully it'll turn out to be useful at those lengths. Eagle3 is also training dependent, so you could try training your own if your use cases diverge enough that 3rd party "generalist" models don't suit your needs. (In general nvda, redhat, etc. have provided general eagle3 models for popular families.)

samhoss93 · 2026-06-02T04:52:07 1780375927

Agree. At high concurrency, you are better off spending the compute budget on parallel requests rather than draft prediction. The challenging part is that most deployments don't have static traffic profiles. A configuration that was right at launch may no longer be correct months later, and there is no signal that tells you when you have crossed the threshold.

tssge · 2026-05-26T17:17:35 1779815855

The reason speculative decoding shows diminishing returns in batched workloads is that both rely on the same principle.

Speculative decoding predicts a group of tokens and verifies this group using the main model in one pass instead of decoding each token separately. That is, the weights are loaded from RAM once per group instead of once per token: roughly the same computation is performed, but not the same memory movement (or other overhead like kernel launches).

Batching utilizes the same mechanism, so speculative decoding is essentially an attempt to batch a single stream using prediction. An attempt, because the verification may reject some tokens if the prediction was inaccurate.
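A rough back-of-the-envelope sketch of that trade-off, under loudly stated assumptions: each drafted token is accepted independently with the same probability `accept_rate`, the drafter's own cost is ignored, and every verification pass yields at least one token from the main model. Real acceptance rates vary with position and context, so treat the numbers as illustrative only. The function name and parameters are hypothetical.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per main-model pass (i.e. per weight load).

    Assumes each of the draft_len drafted tokens is accepted independently
    with probability accept_rate, and that verification stops at the first
    rejection. The main model always contributes one token itself (the
    correction after a rejection, or a bonus token if all are accepted), so
    the result is the geometric sum 1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Compare a few hypothetical acceptance rates for a 4-token draft.
for rate in (0.3, 0.6, 0.9):
    print(f"accept_rate={rate}: {expected_tokens_per_pass(rate, 4):.2f} tokens per pass")
```

Plain decoding is the `draft_len = 0` case, which always gives exactly one token per weight load. The lower the acceptance rate, the closer speculative decoding falls back to that baseline while still paying for the drafter.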

gcr · 2026-05-26T21:44:56 1779831896

Thanks, appreciate the info. For whatever it’s worth regarding recency, I’m testing the main llama-cpp branch, pulled and built on 2026-05-25, running unsloth/Qwen3.6-35B-A3B-MTP-GGUF:Q4_K_M. My hardware platform is an M1 Max with 32GB VRAM. Is there a different fork or quant I should be using?

gcr · 2026-05-26T11:22:35 1779794555

Arguing with the agent is an anti-pattern IMO.

When Claude acts up, my strategy is to rewind the conversation to the point where the misunderstanding started, revise what I said just before, and then continue from there. I’ve found that letting one mistake enter the conversation seems to make further mistakes more likely.

The user already has omnipotent power over the agent’s sense of time and memory. We can rewrite what Claude sees and hears. We can’t do that with humans, but rewinding is such an important function in Claude that it has a top-level keystroke.

Why spend time, tokens, and cortisol arguing and demanding the pet rock step through an apology protocol?
