Hacker News | Terretta's comments

Two new security features for uv:

- uv audit is a new command that scans your dependencies for known vulnerabilities and "adverse" project statuses (such as being deprecated)

- uv add, uv sync, etc. can now perform a lightweight OSV-based lookup for previously-resolved malware on every sync operation; try it by setting UV_MALWARE_CHECK=1

Both are in preview, considered unstable, and there may be breaking changes…

• • •

Meanwhile, don't forget uv's exclude-newer cooldown: https://docs.astral.sh/uv/reference/settings/#exclude-newer

  # pyproject.toml
  [tool.uv]
  exclude-newer = "P3D"  # "3 days" as an ISO 8601 duration

Or use it with uv pip compile to generate pinned requirements with cooldown:

  $ uv pip compile --exclude-newer "3 days" requirements.in -o requirements.txt

Destroying competition by removing the consumer's choice of vertical integration in service of strong security, privacy, reliability, etc. is mistaken.

It's competing at the wrong level.

The iPhone is a toaster. Nobody's up in arms about whether the toaster takes another manufacturer's crumb tray. It's a television, and nobody's demanding QLED and OLED be swappable. It's a console. Xbox doesn't play PS5 games. It's fine.

There's no real line between hardware / firmware / software / malware ... For what Apple offers consumers, every layer of whateverware should be trusted.

Drawing imaginary lines based on the embodiment or substrates for logic gates is mistaken.

There are lots of phones. Lots of different philosophies. Stop taking away the consumer's right to pick a philosophy and design for an end to end experience. It's fine.


Nothing about allowing others equal access to the OS means that someone can’t still choose Apple’s first party services and products.

It’s not an either/or thing, it’s about preventing so called gatekeepers from anticompetitive behavior via favoring their own accessories and services while simultaneously preventing any others from possibly competing.

There’s no valid reason at all a third party smartwatch shouldn’t be able to integrate to the same level as an Apple Watch. No reason third party Bluetooth earbuds shouldn’t be able to use AWDL for automatic device switching, etc.

Want to still use only Apple? Great, nothing says you can’t. But at least it would be user choice and there would be actual competition, which would lead to better products for all.

Can’t believe I lived to see the day that people on HN start defending vendor lock-in and closed platforms as a good thing. Have all the hackers retired?


> Want to still use only Apple? Great, nothing says you can’t. But at least it would be user choice...

It's already user choice. The problem is too many users like the lineup. And too many who aren't going to use it, don't.


VisionPro was/is a dev platform, priced to ensure it wasn't yet a mainstream device.

99% of "apps" for it were confused garbage, totally misunderstanding what it's for or how to use it.

The percentage of apps that "get it" is rising. Not sure if the disillusioned left, or if more are figuring it out.

Either way, when Apple releases something consumer facing, or for consumers' faces, this means there's a prayer of being more than a deluge of Oculus content.

Or at least I'd like to imagine that's what they're doing. :-)


> What's the difference now?

Well, which ones are on my Mac locally?

Which ones are in my iPhone locally?


On the plus side in macOS 27, once it is playing, dragging the playhead makes it grumpy and to fix it you get to quit or go full screen:

- Media Playback Known Issues: In apps like TV, Podcasts, and Music, the window controls may become unresponsive after dragging the playhead to adjust the playback position. (177984877)

- Workaround: Use keyboard shortcuts or the menu bar to close, minimize, or enter full screen mode.

• • •

Super clunky compared to the eminently more practical workaround for wrong-size gifs in Messages, STOP LOOKING AT IT:

- Messages Known Issues: GIFs and pasted images might render as the incorrect size. (177657977)

- Workaround: Scroll until that message is offscreen


> the estimates

It doesn't estimate.

It generates tokens that read like estimates associated with the context in its training material.

What would you expect the generator to output instead?


It generates tokens by estimating what the next token is going to be.

Sure, it cannot think like a human, but given its input, it should give a good statistical answer (approximating not how long it actually takes, but how long a human would say it takes).


The funny thing about this comment is that neural networks are universal function approximators.

The most fundamental essence of what they do is exactly what you say they don't: estimate.


Funny and ironic in a way, but the point still stands that they do not actually estimate the time it will take.

> they do not actually estimate the time it will take

You can't prove that )))


Right, but extraordinary claims require...

Obviously there isn't a hidden corpus of logs of coding chatbot assistants that has been accumulating over the years, but these coding chatbot assistants output tokens that resemble how we all imagined a coding chatbot assistant would have operated had it existed in the first place to end up in a corpus. "Training material" includes supervised fine-tuning, preference training, RLHF, and so on, so that certain outputs (like these timeline estimates) may really have been decided (at some level of conscious awareness) by product teams.

you might like the stuff in my work of oh my pi, it's a test bed for my ideas around making these tools more reliable. hoping to maybe have a native ui iter of the real thing that this is a test bed for this summer.

https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...


Therein lies the rub, no? To accurately predict the next token produced by a process, it’s necessary to model that process. If the process is a human attempting to estimate the duration of a task, then in some sense the LLM is modeling the estimation process. We’re well past the point where it’s credible to claim that LLMs just regurgitate their training data.

This is so 2023. The thought process.

At that time the predominant view was that LLMs were nothing but stochastic parrots, that they would plateau, and that hallucinations couldn't be fixed.

At this point I doubt there are any AI sceptics left. That ship has long sailed. The only thing that matters is whether the estimates are accurate, and AI can improve on that too.

Even humans only estimate based on neurons firing in prior patterns.


I think people are continuing to view these systems as pure LLMs - when that ship sailed 6+ months ago. Between being able to review memory, using agent harnesses and sub agents and skills to go out and discover information - modern systems (Codex, Claude Code, Cursor) - use LLMs - but the LLM is only a small component of it. Compare what you get from sending a request to a chatbot like ChatGPT - to what you can get from a modern harness. The output is influenced by the LLM, but it's no longer a "model making a token prediction based on training material and RLHF" - that's a very 2025 way of looking at these systems.

Even Gary Marcus is starting to come around and realize that his priors are no longer as relevant as they once were.


No one is bitter lesson pilled anymore. Everyone is pivoting to neurosymbolic systems. It looks like Gary Marcus was right.

> No one is bitter lesson pilled anymore.

Will the 10T parameter Mythos model be released this month or next month?

They'd better do it soon, because it is generally accepted that one of the reasons GPT 5.5 is better at hard tasks than Opus is its parameter size, and that Opus 4.8 remains competitive only by scaling test-time compute (see how many more tokens it uses than GPT 5.5)

https://www.reddit.com/r/LLM/comments/1sz8bjz/parameter_esti...


Why ask me? Anyway, Mythos is not 10T. Anthropic confirmed the training run was under 10^26 FLOPs. You can't train 10T to Chinchilla and stay under 10^26.

Anthropic also confirmed they will not release Mythos, only a "Mythos-class" model, whatever that means.
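
The arithmetic behind the "can't train 10T to Chinchilla under 10^26" claim can be sanity-checked with a rough sketch. It assumes a dense model and two common rules of thumb that are not stated in the comment itself: training compute of about 6 FLOPs per parameter per token, and a Chinchilla-style ratio of about 20 training tokens per parameter. The function names are made up for illustration.

```python
import math

# Assumed rules of thumb (not from the thread):
TOKENS_PER_PARAM = 20       # Chinchilla-style data ratio, D ~ 20 * N
FLOPS_PER_PARAM_TOKEN = 6   # training cost, C ~ 6 * N * D


def chinchilla_flops(n_params: float) -> float:
    """Approximate training FLOPs for a dense model trained at the assumed ratio."""
    tokens = TOKENS_PER_PARAM * n_params
    return FLOPS_PER_PARAM_TOKEN * n_params * tokens


def max_params_under(budget_flops: float) -> float:
    """Largest dense parameter count that fits the budget at the assumed ratio."""
    return math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))


needed = chinchilla_flops(10e12)   # 10T parameters
largest = max_params_under(1e26)   # budget of 10^26 FLOPs

print(f"10T at the assumed ratio needs about {needed:.1e} FLOPs")
print(f"a 1e26 FLOP budget fits about {largest:.2e} parameters")
```

Under these assumptions a 10T dense model lands around two orders of magnitude over the budget, and the budget tops out a bit under 1T parameters. A sparse (mixture-of-experts) model or a smaller tokens-per-parameter ratio would change the numbers.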


How is neurosymbolic not aligned with the bitter lesson? The bitter lesson is completely agnostic to architecture.

I should have stressed the symbolic part. Everyone has pivoted to symbolic systems like Claude Code and Codex. They would not invest so heavily in such systems if they thought LLMs would deliver AGI soon.

You think someone is, or even should, special case things like estimates? What else deserves that level of intervention so they look less dumb?

Logistics for getting to the car wash next door?

In the meantime, alas, no, we can see from actual prompts sent directly or through sub-agents, and actual replies, estimates remain LLM generated.

Though, this discussion here could change that, because indeed there is a lot of special casing and context stuffing going on, one of the oldest being today's date for example.

• • •

I did read the Claude Code leak, and use pi, etc. So I disagree with your premise rather strongly. Today's "systems" remain, roughly, piles of markdown and context engineering wrapped in UI affordances, and behave very similarly today to how they did in 2024 for those already engineering context and delegating.


I do a lot of code bisecting with Claude Code - and it spends hours running experiments - looking at experiment results, making guesses as to what to try next for an experiment - until it eventually comes around to a working code pattern. I mean - maybe this is as much a reflection on me as anything else - but its pattern of logic isn't that much different from what I would do. It knows, in general, what tools and APIs it can call - it tries something - observes the result, and then comes back and tries different experiments based on success/failure - mostly efficiently bisecting to a solution.

I'm still lower down on the capability scale - as I'm still manually directing agents to do these wiggins loops - obviously the next step up is to direct the code-loops which control the agents. I just haven't got my tooling nailed in place to the point where I find that's more productive.

I actually might agree with you that this is mostly just "next token prediction" - if I can concede that's really all I do as well.


> I actually might agree with you that this is mostly just "next token prediction" - if I can concede that's really all I do as well.

Yep. Pretty sure I've got an LLM inside too.

The other replies complaining that my thinking is so 2023 -- on the contrary, what's evolved is my own apprehension of how LLM-like most "responses" from humans prove as well.

To be sure, there are other mechanisms at play as well, significant differentiation in our... Volume of training material? Quantizations/compression? Model architecture? Just-ahead-of-time forward branching with back propagation? Double loop adaptive learning? You know, harnessing the LLM. :-) Dare we call it executive function?

LLM mode becomes particularly apparent when conversing with Alzheimer's patients in the stage where short term memories do not form but they retain access to long term memory up to, say, 5 years ago or so. Fifty years of who they are, and one can trigger nearly identical responses with nearly identical prompts.

But that same person may be able to debate 1950s politics while being unable to complete making a sandwich.

If they didn't know of new shortcuts for a task, they would almost certainly not "estimate" but "intuit", or "instinctively" respond (apply heuristics), largely based on their "priors" aka training material.

If you sit with them and chat a while, you'll even get the kind of looping you get from Qwen trying to think when context is too full.

And if we believe this at all, then ... we should stop scrolling TikTok. Time to read a book. Have an experience. Fine tune. :-)


rather than special casing, make real data based on chat logs for how long things took both in calendar and chat time

Actually in this case they possibly are estimates.

It's been known for some years[1] that LLMs do regression in-context. Frontier models have been trained against many, many issue texts that include task breakdowns and estimates.

[1] https://arxiv.org/html/2409.04318v1


Interesting. So it may have learned how to estimate as a human but doesn’t understand that it doesn’t operate at that speed :D

I wonder if there’s a reasonable way to give an llm parameters that give it a concept of its own execution speed. Seems that could be useful for multiple purposes


Yes, it's entirely possible to do that via RL. It'd be a fun little project you could do for less than $100 on a small LLM actually.

> you have an icon ... simply quit

FTA:

Uh.. how do I quit this app?

The app has no Dock icon and no menubar icon so to quit it you'd need to do one of the following:

Launch Activity Monitor, find Music Decoy and press the button at the top

Run the following command in the Terminal:

  killall 'Music Decoy'

“TokenStream – Server-sent events (SSE) were added to the HTML5 spec in 2008 but never used until 2025.”

I remember chunked transfer encoding shipped in 1997. It's been possible since then to readily and easily stream bytes of text or chunks of html the way everyone sees LLMs do today.

I used this to write a web based telnet client in 1997, and later a text moo / chat for the web. In both cases used a frameset so your line to send was at bottom of screen, the incoming lines were server-sent as things happened server side, and scrolled the client as new lines came in.

There were other things you could abuse before that, but less reliable.

But yeah, talk about things nobody used....
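
For anyone who hasn't looked at it, the chunked transfer encoding mentioned above is simple enough to frame by hand. A minimal sketch of the HTTP/1.1 wire format, assuming a response that has already sent a "Transfer-Encoding: chunked" header; the helper names are made up for illustration and this is not the code from the 1997 telnet client:

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: payload length in hex, CRLF, payload, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


# A zero-length chunk followed by an empty trailer section ends the body.
LAST_CHUNK = b"0\r\n\r\n"


def chunked_body(parts):
    """Yield wire-format chunks for each part, then the terminator."""
    for part in parts:
        if part:  # skip empty parts: a zero-length chunk would end the body early
            yield encode_chunk(part)
    yield LAST_CHUNK


# Each line can be written to the socket as soon as it exists server side.
lines = [b"hello\n", b"streamed world\n"]
wire = b"".join(chunked_body(lines))
print(wire)
```

A server would write each yielded chunk to the connection and flush as events happen, instead of joining them up front as this demo does.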


COMET was so far ahead of its time. Sierra Online used it for their webchat in 1995 and it was absolutely the best webchat out there for years

Well, according to the first paragraph of the section titled "One tarball, served in place":

The whole site is a single tar file. zeroserve indexes it on load - building a path -> byte-range map - and then serves files by issuing byte-range reads against the tarball itself. Nothing is ever unpacked to disk. The site lives entirely in that one file, so there's no document root for a stray location rule to expose, and a deploy is a single atomic file swap.
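
The path -> byte-range idea quoted above can be sketched with Python's standard tarfile module. This is only an illustration of the technique, not zeroserve's actual code; build_index and read_file are hypothetical names, and it assumes an uncompressed tar containing regular (non-sparse) files.

```python
import tarfile


def build_index(tar_path: str) -> dict[str, tuple[int, int]]:
    """Map each regular file's path to (offset of its data, size in bytes)."""
    index = {}
    # "r:" opens uncompressed archives only, so offsets refer to the file on disk.
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index["/" + member.name] = (member.offset_data, member.size)
    return index


def read_file(tar_path: str, index: dict[str, tuple[int, int]], url_path: str) -> bytes:
    """Serve one file with a single byte-range read; nothing is unpacked."""
    offset, size = index[url_path]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Usage would be along the lines of index = build_index("site.tar") once at load, then read_file("site.tar", index, "/index.html") per request. Member names are used as stored in the archive, so a tar created with "./" prefixes would need its names normalized first.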

OTOH, that could be an LLM justification, since the copy is littered with -isms like "the right shape" or "the surface is broad".


Thanks, I missed that during my read of it

Put one for $10 on Apple App Store and more might impulse buy it.

For my part, I need to be very very sure when it's posted through gumroad. I've gotten burned too many times by short term (as in, within months) abandonware through the gumroad sales channel.

Dev gets bored, doesn't want to deal, download goes unavailable. So you buy it, get a new computer next month, and can't install it. Especially annoying when I "name my own price" typically around $20 to tip the dev, and then the dev won't even keep that build available.


For the App Store I would need to strip off some features.

I think your reasoning is valid, but the app is open source. Worst case one can compile it (e.g. just ask an AI agent even if you are not a technical user).

I also put it at a low price, for this version, as I would like wide adoption. I truly believe people are going to move heavier into local AI, and it is good to have low friction entries.

