Yes, because outside Starlink and government contracts, there isn't that much demand growth in the sector. There's a limit to how many satellites can be in orbit at a time, and land-based telecom infrastructure means that satellite-based infra isn't necessary unless you're in remote areas.
I’m not clear on it either. Was the Context.ai OAuth application compromised? So the threat actor essentially had the same visibility into every Context.ai customer’s workspace that Context.ai has? And why is a single employee being blamed? Did this Vercel employee authorize Context.ai to read the whole Vercel workspace?
Next.js renders configuration that’s shared by client and server into a JSON blob in the HTML page. These config variables often come from environment variables. It’s a very common mistake for people to not realize this, and accidentally put what should be a server-only secret into this config. I’ve seen API secrets in HTML source code because of this. The client app doesn’t even use it, but it’s part of the next config so it renders into the page.
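As a sketch of that footgun (the key names and env vars here are hypothetical, not from any real incident): anything placed under `publicRuntimeConfig` in `next.config.js` gets serialized into the `__NEXT_DATA__` JSON blob embedded in every HTML page, so it's visible to anyone who views source.

```javascript
// next.config.js — illustrative only; the variable names are made up.
module.exports = {
  serverRuntimeConfig: {
    // Server-only: never serialized into the page.
    DB_PASSWORD: process.env.DB_PASSWORD,
  },
  publicRuntimeConfig: {
    // Fine: meant to be public, ships in the __NEXT_DATA__ blob.
    apiBaseUrl: process.env.API_BASE_URL,
    // The mistake: a server-only secret placed in the shared config.
    // The client code may never read it, but it still renders into
    // every page's HTML source.
    PAYMENT_API_SECRET: process.env.PAYMENT_API_SECRET,
  },
};
```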
IIRC, React (via Create React App) had this issue, so they required env vars visible to the client to be prefixed with REACT_APP_, the hope being that SECRET is not prefixed and so is not exposed. Of course, that requires you to know why they are prefixed and not name it REACT_APP_SECRET.
They don’t serialize process.env, but devs will take config values from environment variables. Obviously you’re not supposed to do this but it’s a footgun.
Yes, they innovated with Apple silicon, but I would say it only shines in a macOS environment. On iOS/iPadOS it's completely untapped, like having a Ferrari with only gravel roads around.
The level of power in the iPad, and the level of underutilization of that power due to it being handicapped by the OS, is mind-boggling to me. Although to some extent it makes sense: with Apple owning the whole supply chain, it probably wouldn't save them much money to make a less powerful chip just to put in it, and they need selling points for the top-end models.
And yet it is the best tablet you can buy on the planet, top to bottom, software and hardware. Is it perfect? No. What is this phantom alternative to an iPad M4 Pro? Note I already have a desktop computer; I don't need two of the same thing. In short, I don't need macOS on two devices.
I find Chutes very intriguing… has anyone used it? I found it when I started wondering what sort of $/performance I could get by simply renting GPU machines by the hour and running my own inference.
Is this going to be an open weights model or not? The post doesn’t make it clear. It seems the weights are not available today, but maybe that’s because it’s in preview?
In principle, yes, that’s the idea! However, I will say we have focused mainly on the grammar, and on using DuckDB for reading from local files, for this alpha release, so I expect there may still be some bugs to iron out around connecting to remote databases!
Dunno, I expect if DuckDB works as advertised it might just work! That's the beauty of how they've separated the frontend syntax parsing from the rest of the engine.
The "Picking delaySeconds" section is quite enlightening.
I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.
> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:
> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.
> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.
> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.
> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.
> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.
> The runtime clamps to [60, 3600], so you don't need to clamp yourself.
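The quoted heuristic can be sketched as a small function. To be clear, this is my own illustration, not a real runtime API: the function name and the `checkable` flag are invented; only the constants (5-minute TTL, the [60, 3600] clamp, the 270s/1200s breakpoints) come from the text above.

```javascript
// Constants from the quoted guidance.
const CACHE_TTL = 300;                   // Anthropic prompt-cache TTL, seconds
const MIN_DELAY = 60, MAX_DELAY = 3600;  // the runtime's clamp range

// Hypothetical helper: given how long we expect to wait, pick a sleep length.
function pickDelaySeconds(expectedWaitSeconds, { checkable = true } = {}) {
  const clamp = (s) => Math.min(MAX_DELAY, Math.max(MIN_DELAY, s));
  if (checkable) {
    // Something we can poll (a build, a process): chunk the wait into
    // warm-cache sleeps of at most 270s, so every wake-up reads a cached
    // context. An 8-minute build becomes two ~270s sleeps, not eight 60s ones.
    return clamp(Math.min(expectedWaitSeconds, CACHE_TTL - 30));
  }
  // Genuinely idle, no signal to watch: commit to one cache miss and
  // amortize it over a long sleep (the 1200–1800s default from the text).
  return clamp(Math.max(expectedWaitSeconds, 1200));
}
```

Note the function can never return exactly 300: a pollable wait caps at 270 (inside the cache window), and an idle wait floors at 1200 (one amortized miss), which is exactly the "don't pick 300s" rule.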
Definitely not clear, if you're only used to the subscription plan, that every single interaction triggers a full context load. It's all one session to most people. As long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that you wouldn't incur that much context-loading cost. But this suggests that's not true at all.
They really should have just set the cache window to 5:30 or some other slightly odd number, instead of using all those tokens to tell Claude not to pick one of the most common timeout values.
This is somewhat obvious if you realize that HTTP is a stateless protocol and Anthropic also needs to re-load the entire context every time a new request arrives.
The part that does get cached - attention KVs - is significantly cheaper.
If you read documentation on this, they (and all other LLM providers) make this fairly clear.
For people who spend a significant amount of time understanding how LLMs and the associated harnesses work, sure. For the majority of people who just want to use it, it's not quite so obvious.
The interface strongly suggests that you're having a running conversation. Tool calls are a non-interactive part of that conversation; the agent is still just crunching away to give you an answer. From the user's perspective, the conversation feels less like stateless HTTP where the next paragraph comes from a random server, and more like a stateful websocket where you're still interacting with the original server that retains your conversation in memory as it's working.
Unloading the conversation after 5 minutes of idling makes sense to most users, which is why the current complaints in HN threads tend to focus on the change from a 1-hour to a 5-minute timeout. But I suspect a significant amount of what's going on is with people who:
* don't realize that tool calls really add up, especially when context windows are larger.
* had things take more than 5 minutes within a single conversation, such as a large context spinning up subagents that each do things and then return a response after 5+ minutes. With the more recent Claude Code changes, you're conditioned to think of it as 5 minutes of human idle time for the session. They don't warn you that the same 5-minute rule applies to tool calls, and, I'd suspect, to longer-running delegations to subagents.
Unless I'm parsing your reply very badly, I see no world in which anything dealing with HTTP would be more expensive than dealing with kv cache (loading from "cold" storage, deciding which compute unit to load it into, doing the actual computations for the next call, etc).
No, that’s not the issue. What people fail to understand is that every request (e.g. every message you send, but also every tool-call response) requires the entire conversation history to be sent, and the LLM providers need to reprocess it.
The attention part of LLMs (that is, for every token, how much their attention is to all other tokens) is cached in a KV cache.
You can imagine that with large context windows the overhead becomes enormous (attention has quadratic complexity in the context length).
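A toy, scalar, single-head version of that cache, purely to illustrate what gets stored (real attention uses vectors, learned projections, and many heads; every name here is my own):

```javascript
// Softmax over an array of scores, shifted by the max for stability.
function softmax(xs) {
  const m = Math.max(...xs);
  const es = xs.map((x) => Math.exp(x - m));
  const sum = es.reduce((a, b) => a + b, 0);
  return es.map((e) => e / sum);
}

// Without a cache, each new token would recompute keys/values for the whole
// prefix; with the cache, old tokens' k/v are stored once and only reused.
class KVCache {
  constructor() { this.keys = []; this.values = []; }
  // Append the new token's key/value, then attend its query over everything.
  step(q, k, v) {
    this.keys.push(k);
    this.values.push(v);
    const weights = softmax(this.keys.map((ki) => q * ki));
    return weights.reduce((acc, w, i) => acc + w * this.values[i], 0);
  }
}
```

When the provider evicts this cache (e.g. after the 5-minute TTL), the next request has to rebuild all of `keys`/`values` from the raw conversation, which is the expensive uncached prefill everyone is noticing.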
Is it in vogue with enterprise devs?