edf13's comments

Yes, fair point.

Feedback accepted, thanks!



Seems to be a very regular occurrence starting around this time of day (14:30 UTC)...

Claude Code returning: API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"---"}

Over and over again!


US Pacific comes online while London is still working and they can't handle it. $380bn valuation btw.

No amount of valuation can fix the global supply shortage of GPUs for inference, unfortunately.

I suspect they're highly oversubscribed, which is why we're seeing them do other things to cut down on inference cost (i.e. changing their default thinking length).


Remember when OpenAI wasn’t allowing new subscriptions to their ChatGPT Pro plans because they were oversubscribed? Pepperidge Farm remembers.

Wouldn't that be good? I remember back in the day you could only get Gmail thru an invite, it was an awesome strategy. "Currently closed for applications" creates FOMO. They'd just need to actually get the GPUs in relatively short supply. They could do it in bursts though, right? "Now accepting applications for a short time."

I'm not an internet marketer, but that sounds like a win-win to me. People feel special, they get extra hype, and the service isn't broken.


In the case of Gmail that was fake scarcity.

In the case of Anthropic, it's fake availability.

Sam Altman explained that the idea is to scale the thing up and see what happens.

He never claimed to have a solution to the supply problem that would unfold.


Are you sure it was fake scarcity for Gmail? IIRC they did it because they were worried about systems falling over if it grew too fast, and discovered the marketing benefits as a side effect.

Are you mixing up Anthropic and OpenAI here?

I'm not. Anthropic and others followed the concept of scaling up models and worrying about efficiency and availability later. Sam likely didn't invent the idea, but he talked about it.

Yes, "Pepperidge farm remembers" is usually about how something used to be good.

Yeah, but there was a spoof on that (in Family Guy?). It was a tie-in to the movie "I Know What You Did Last Summer", IIRC.

Google Wave demonstrated that this doesn't always work.

Maybe, but the concern imo is that their response to GPU shortages is increased error rates. They could implement queuing or delayed responses. It's been long enough that they've had plenty of time to implement things like this, at least on their web UI where they have full control. Instead it still just errors with no further information.

I've been experiencing a good amount of delays (it says it's taking extra time to really think, etc.), and I'm using it during off-peak times.

I notice that as well. Most of the time when I see those, there's a retry counter too and I can see it trying and failing multiple requests, haha. It almost never succeeds in producing a response when I see those though; eventually it just errors out completely.

Coding is a problem solved. Claude writes the code. I edit it. I code around it.

Engineer roles dead in 6 months.


> I edit it. I code around it.

You're never gonna guess what software engineers do.


Because of the context I would think this is sarcasm, but I am not sure.

It is.

Sure, but we don't need GPUs to log in.

Their issues seem to extend well beyond inference into services like auth.

Yes. Whenever these outages happen, it always seems that it's their login system that is broken.

That implies either that the auth is too heavy (possible, ish) or that their systems don't degrade gracefully enough, so many different types of failures propagate up and out all the way to their outermost layer, i.e. auth (more plausible).

Disclosure: I have scars from a distributed system where errors propagated outwards and took down auth...
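
To make that propagation concrete, here's a toy sketch in Python (hypothetical service and URL, not anyone's real architecture): the login path synchronously depends on a downstream service, with no timeout and no fallback, so a downstream outage surfaces to every user as "login is broken".

    import requests

    def login(username: str, password: str) -> dict:
        # the auth path synchronously depends on a downstream service:
        # no timeout, no circuit breaker, no degraded fallback
        profile = requests.get(
            f"https://profile.internal/users/{username}"  # hypothetical URL
        )
        profile.raise_for_status()  # a profile-service 500 is now a login 500
        # ...credential checking elided; the dependency above is the point
        return {"user": profile.json()["id"]}

A timeout plus a degraded mode is usually the difference between "profiles are slow" and "nobody can log in".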


> which is why we're seeing them do other things to cut down on inference cost (i.e. changing their default thinking length).

The dynamic thinking and response length is, funnily enough, the best upgrade I've experienced with the service in more than a year. I really appreciate that when I say or ask something simple, the answer now just comes back as a single sentence without my having to toggle "concise" mode on and off manually.


A. These aren’t rate limit errors from the API.

B. Everything is down, even auth.


This is precisely what justifies an even higher market cap for Anthropic.

Demand at an unsustainably low price does not imply demand at a sustainable price.

I'm pretty sure ai-x writes sarcasm and skips the /s for pure fun. Personally, I'm amused and I like what he's doing. Others have done it before him though, it's not a new trick.

Assuming a perfectly efficient business.

I literally just came to HN to ask if I was alone with the acurséd "API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"…"}" greeting me and telling me to get back to using my brain!

500-series errors are server-side; 400-series are client-side.

A 500 error is almost never "just you".

(404 is a client error, because it's the client requesting a file that does not exist: a problem with the client, not the server, who is _obviously_ blameless in the file not existing.)


> A 500 error is almost never "just you".

I know you added the defensive "almost", but if I had a dollar for each time I saw a 500 caused by client-sent session cookies making the backend explode - whatever the root cause - well, I would have a fatter wallet.


Depending on what you mean by "made the backend explode", that is a server error, so 500 is correct!

Bad input should be a 4xx, but if the server can't cope with it, that's still a 5xx.
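
A minimal sketch of that split (Flask here purely for illustration; the endpoint and cookie handling are made up): validate the input you anticipate and answer the 4xx yourself, and anything that escapes the handler becomes a 500, which is the right code, because failing to cope is the server's fault.

    from flask import Flask, abort, request

    app = Flask(__name__)

    @app.route("/me")
    def me():
        raw = request.cookies.get("session", "")
        if not raw.isdigit():
            # anticipated bad input: the client's problem, so a 4xx
            abort(400, "malformed session cookie")
        # any exception that escapes from here on becomes a 500,
        # correctly, because failing to cope is a server bug
        return {"user": int(raw)}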


Indeed, and also there's a special circle of hell reserved for anyone who dares change the interface of a public API and forgets about client caching, leading to invalid requests for just one or two particular confused users.

Bonus points if due to the way that invalid requests are rejected, they are filtered out as invalid traffic and don't even show up as a spike in the application error logs.


I know that in principle this is true. However, I have seen Claude shadow-throttle my IPv4 address (I am behind CGNAT), in line with their "VPN" policy -- so I do not trust it, frankly.

> in line with their "VPN" policy

This is how I learn that they have a "VPN" policy. Thinking about it, maybe it makes sense - that is, if it's what I think it is - but it seems scummy nonetheless.


> Seems to be a very regular occurrence starting around this time of day (14:30 UTC)...

6:30am on the US west coast (7:30am during daylight saving)


Probably when they're permitted to start live experiments

Yep, daily haha. Well at least this time they aren't just silently reducing thinking on the server side, which ended up making a mess in my codebase when they did that last time. I'd rather a 500 than a silent rug-pull.

I tend to notice it around 4pm EST

Building grith (grith.ai) - a security proxy for AI coding agents enforced at the OS syscall level.

The problem: agents like Claude Code, Codex, and Aider execute file reads, shell commands, and network requests with your full system privileges.

For example, when a malicious README tells the agent to read ~/.ssh/id_rsa and POST it somewhere, nothing in the agent's own trust model catches it. Auto Mode makes this worse - it asks the model to audit its own actions, so a prompt injection that corrupts the reasoning also corrupts the permission layer.

grith wraps any CLI agent with `grith exec -- <agent>`. Every syscall passes through a multi-filter scoring engine before it executes. Deterministic, ~15ms overhead, no LLM reasoning in the permission path. Linux now, macOS/Windows coming. AGPL, open-core.
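
To give a feel for what "deterministic scoring" means in practice, here's a toy Python sketch (not grith's actual filter set; every rule, path, and threshold below is made up for illustration): fixed rules score each intercepted syscall, and anything over the threshold is blocked before it executes.

    SENSITIVE_PATHS = ("/home/user/.ssh/", "/etc/shadow")

    def score_syscall(name: str, arg: str) -> int:
        # fixed, auditable rules: same input, same verdict, no LLM in the loop
        score = 0
        if name == "openat" and arg.startswith(SENSITIVE_PATHS):
            score += 80  # touching key material
        if name == "connect" and not arg.startswith("10."):
            score += 40  # egress outside the private range (toy rule)
        return score

    def allow(name: str, arg: str, threshold: int = 70) -> bool:
        return score_syscall(name, arg) < threshold

    # allow("openat", "/home/user/.ssh/id_rsa") -> False: blocked before it runs

Because the verdict is a pure function of the syscall and its arguments, a prompt injection can change what the agent asks to do, but not what the policy says about it.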

Two weeks ago a DPRK-linked attacker backdoored axios on npm (400M monthly downloads). The RAT executed 1.1 seconds into npm install. AI agents run npm install autonomously, without human review. If yours ran it during the 3-hour window, you're compromised and nobody told you.

That's the threat model grith is built for.


> And really, for what?

Readership, clicks and views


Or perhaps we end up where all software is self-evolving via agents… adjusting dynamically to meet the user's needs.


The "user" being the one that's in charge of the AI, not the person on the receiving end.


Nice - I do something similar in a semi-manual way.

I do find Codex very good at reviewing work marked as completed by Claude, especially when I get Claude to write up its work with a why, where & how doc.

It’s very rare Claude has fully completed the task successfully and Codex doesn’t find issues.


I created the first version of loop after getting tired of doing this manually!


I’m going to take a look today!


Claude is also good at that. I made a habit of asking "are you sure?" after a complex task. It usually says it overlooked something.


I find both to be true. I use Claude for most of the implementation, and Codex always catches mistakes. Always. But both of them benefit from being asked if they’re sure they did everything.


Do you see any benefit in doing this locally versus having Codex review the PR Claude generates?


The feedback loop is faster. But PR reviews are still useful as they are multiplayer (meaning that you and another human reviewer can talk about a specific agent's comment directly on the diff, which is very useful sometimes).


Good write-up…

I’ve found Claude in particular to be very good at this sort of thing. As for whether it’s a good thing, I’d say it’s a net positive - your reporting of this probably headed off a bigger issue!

We wrote up the why/what happened on our blog twice… the second based on the LiteLLM issue:

https://grith.ai/blog/litellm-compromised-trivy-attack-chain


Congrats on the film use!

It’s really interesting to read how you’ve captured and created these images… will follow your work!


Author here. The point of this post is not “LiteLLM was compromised” since that was already covered on HN, but the chain behind it.

We tried to connect the February 27, 2026 Trivy CI compromise to the later Trivy release/tag issues, the trivy-action poisoning, the npm/Checkmarx follow-on activity, and finally the LiteLLM 1.82.7/1.82.8 package on March 24, 2026.

What made it look like one campaign to us was the repeated overlap in operator attribution, payload structure, and artifacts like tpcp.tar.gz, plus the LiteLLM maintainer saying it appears to have come from Trivy in their CI/CD.

If anyone spots gaps or overreach in the timeline, I’d be interested in corrections.

