For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | sgc's commentsregister

If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it is always a fleeting remark so I have no idea what the 'useful' looks like - at all.

To answer this and my sibling, it's DeepSeek V4 Flash at native FP4 quantization, on two Nvidia DGX Sparks. Which is a bit of kit but still paltry relative to the data centre. ~40 TPS generation, ~2000 TPS prompt processing, which makes it feel approximately as fast as typical APIs.

I primarily use it with my own harness for coding. I'm not going to say it will compete with Opus in the most challenging domains, because it won't, but I will say that there's a reasonable likelihood that Opus is used for tasks that a model like Flash could comfortably handle at 1/100th the cost.

So far I've only seen it struggle at tasks that I myself would struggle with. Tasks that I can describe the shape of the solution for, it has a high success rate at implementing.

Useful is going to be different for everyone. I'm not working on the hardest problems, I don't need the best models.


In my experience they require much more hand holding and more specific directions with less possibilities to interpret a command in several ways. You do the planning, keep on eye on that they're producing and they do the legwork. It's not that their knowledge of Java or PHP or what have you is lacking, it's the long horizon planning that you have to do yourself. Technically they're good. You just have to do more thinking and more reviewing yourself. YMMV.

Depending on quantization I figure they need at least a p4 and likely a p5 EC2 (or similar instance in another provider) for a model with that many parameters. Maybe they are hosting on bare metal but I imagine not. Those instance types (assuming not using spot) are quite expensive to run.

I thought the most interesting part of the post was that they have an mcp endpoint for bring-your-own agents, and they won't be force feeding ai on anybody. In the security context of the post, they mean that you are responsible if your ai is duped into falling a victim, or tricked to send malicious mail.

My first daughter I managed to flip the switch for reading through Tintin and other graphic novels. My younger daughter skipped that entirely. She started reading later than the first, but jumped right in to longer full length books that were captivating for her (they were series she had seen her sister read).

I completely agree that we can encourage but reading needs to come naturally to them. You can't force-feed curiosity and passion, which is what reading is all about for young people.


I was looking at their docs and Burr has agent cookbooks to get started with this, and it can handle multi-machine workflows. Is this not what you were looking for? I am not sure how it integrates and uses skills etc, but it seems like it should work to me.

https://burr.apache.org/docs/examples/agents/


Thank you. I want something like that, but that uses codex/claude code as agents (through their corresponding subscriptions), instead of having to create ad-hoc agents + api keys

You can absolutely do that by using subprocess.run, or use the codex sdk

https://github.com/openai/codex/tree/main/sdk/python


The ai legal situation is going to go through growing pains. I am abstracting from the specific laws of any one country, just thinking about the general context:

If ai output is not copyrightable, it should not be considered personal output. So nobody should be responsible for it. Or if it is considered personal output, it should be copyrightable. Or perhaps the ai companies will be liable for all output, and they will therefore all cease to exist in any useful form? This seems like another alternative, where the output legal value is not central, but there will be a thousand different fights about how it is presented to others.


That's trying to apply two completely separate legal frameworks, with different purposes, and force them to come to the same conclusion about one aspect of each of them. It's not that simple.

It's perfectly legally consistent to say that AI-generated content has no copyright (because it's the product of a computer, not a human), and also that the human or organization operating an AI is legally responsible for anything in its output that is legally actionable.

Someone needs to hold legal responsibility for any piece of content out there. You can't just wrap your decisions in AI and get to be free of all liability for it.

But copyright isn't like that. There's nothing lost to society by saying that content is not copyrightable, and particularly given how the major LLMs were trained, there's a lot lost to society by saying that they can take all of that from everyone without consent, and then everything it produces has copyright and can be used for, say, Google's profit in perpetuity.


Yes, I was not sufficiently thinking about copyright as an arbitrary legal construct that can be manipulated at will. I don't think output should have copyright, but I would presume the copyright should it ever exist would belong to the user and not the LLM creator, just like photoshop does not give adobe rights to user output. However much like there is no copyright, the uncertain output from an LLM should never directly create legal liability - the user prompt and intention should, and legal standards regarding recklessness and malice should apply. Otherwise it's a bit like blaming somebody for the output of a roulette wheel.

So I think I like the current decision which is more about presentation, dissemination, application, and claims than content, and there should of course be liability for LLM creators if they are not actively dealing with results like CP, violence, or many other illegal or dangerous things.


That was a good video, and I also liked the Munro video that does a nice job of explaining how these work: https://www.youtube.com/watch?v=m507ryWhc6c

Not according to the complete comment:

     More like 10%, but my search has not been systematic. I am mostly looking where I know I will find image issues based on image filenames and “Find Similar Images” searches.
They are clearly saying they think this is likely above average.

I thought they tend to pipe far out and discharge as far below the surface as possible, since there is a lot of surface life and it is less damaging this way.

Ships (with long submerged pipes) would be prone to weather events and generally less reliable than an installed pipe. Perforation would be prone to clogging from build up so a nonstarter I would expect. Adding flex tubing and a relocation robot would be a maintenance headache as well. Not sure there is an easy optimization.


Ships wouldn't need a long submerged pipe. It'd just need a small hole like a bilge drain or maybe a live well on a fishing boat. Just let the boat cruise around slowly draining back into the ocean.

As for surface life, I'm no oceanographer, but is that really the most vulnerable place? The surface is where fresh water rain meets the ocean, so that would dilute the salinity during storms. However, there's nothing to say that another pump couldn't be pulling from the ocean and mixing the brine into that so it's diluted before and not just pouring brine straight into the ocean


I think your sense of scale is off. 90% of sea life is on the surface. 0.029% of ocean water is replenished from rainfall annually. Desalination concentrates are absolutely toxic to life. The current daily volume of brine discharge would require more than half the tankers in the world to be filled and discharged every single day. They would of course not last long with such a routine.

Is that a total for all of the oceans? I understand that as a whole, rainfall is literally but a drop in the ocean. However, confined just to the local area where the rain is falling, the area’s salinity has to change. Just like adding the the desalinated brine is a minuscule amount compared to the whole ocean, it has large effect locally.

Regardless, it is totally possible to reintroduce the brine back to the ocean in a way to not be a shock to the local area. We have just chosen to make it harder on ourselves for some illogical reason.


In my opinion you are hand-waving away a difficult engineering problem and proposing a naive solution as if it would solve a problem that has already been partially solved, by rejecting all the work that has already been done on it. Don't dump on the surface, don't burn millions of tons of fuel a year to do it, study what has been done and improve on it instead.

So this is an alternative to using one coding agent with openrouter, changing the models between tasks? I am a neophite in these things, my ai use is more calling apis from scripts right now. Can somebody please explain the pros and cons (beyond to openrouter fees) of each?

Just reply with a quote from the article. They will understand they did not read carefully, and you can avoid the low-value 'read the article' snark (that might be false since often it is not actually in the article when somebody does that).

My question wasn't "how to handle that better". I hope it's okay to point it out :)

I would also argue it's not "often" the case someone asking the obvious question seemingly answered in the article had actually read it. It happens, surely, but it's not a rule of thumb.

That's too meta for a thread here anyways, I think.


It's an in-actionable "question" / comment. The rule does not claim one thing is better than the other. One is easily enforceable, the other is indemonstrable. If the point of this exchange is to better understand and use HN, the reason is because it is not hard to be constructive instead of throwing out non sequiturs.

And I didn't say it's '"often" the case someone asking the obvious question seemingly answered in the article had actually read it'. I said the person pointing it out while refusing to provide receipts or cordially engage is often wrong about what they think is obviously in the article. It's worthless noise regardless.


I'd rather read "it's in the article you didn't read" than pretty much anything else.

The ideal case of course is that there are only legit questions and discussion from people who actually read what they are talking about. If they miss something that's fine as long as it's the honest exception. But this is not a thing that exists or can exist, so it doesn't count. It's not actually available to be a "What I'd like the most."

The next-most ideal case is when someone talks about something they didn't read, that no one else responds at all. The noise is the minimum possible noise from the original source and it just gets ignored. This aslo is not a real thing, and so not up for consideration.

What's left is some flavore of "noise". This is not avoidable. it will exist and the only choices are what form and flavor it takes.

I think it is most conductive for everyone, the poster, the bystanders, everyone, including people who don't like "noise", is the obvious and natural response. That it's the obvious and natural response for a reason.

Low value and snark may be true but it's irrelevant. It's still the best most productive reaction. (Within reason, 500 of the same response to one comment isn't very interesting reading, but multiple of the same agreeing response does serve a purpose which serves us all.)

That's what I mean by "I'd rather read that than almost anything else."

There are are no better options that actually exist.

As for the hall monitor aspect, telling people they shouldn't say the obvious most applicable thing is also hall monitor.

All in all, I just find the argument sorta valid but weak.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You