Hacker News | adrian_b's comments

The competitor for this NVIDIA CPU will not be the now old AMD Strix Halo, but its successor (launched recently), which supports up to 192 GB of unified memory. Thus 128 GB is no longer SOTA.

While this NVIDIA system is inferior in memory capacity, its main advantage is that the top models will have a bigger GPU, i.e. with 6144 or 5120 FP32 execution units, compared to 2560 for the AMD GPU. On the other hand, compared to the NVIDIA CPU, the AMD CPU has better multi-threaded performance for legacy programs, and much better multi-threaded performance for applications that use AVX-512.

However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).


I don’t think there is much improvement in compute for the new Strix Halo revision. The next one supposedly adds RDNA4 cores or similar and more memory channels.

It may not have intentionally attacked the EU yet, but a week ago there was an incident in which a Russian drone apparently strayed away from whatever Ukrainian target it may have had and hit an apartment building in the city of Galati, in Romania, in the EU, injuring two people.

In the past there have been other incidents with Russian weapons reaching neighboring EU countries, like Poland and Romania, but this was the first time they hit a populated area and caused human injuries.


That was not a Russian drone in Romania. Ukraine has admitted it was theirs.

https://www.bbc.com/news/articles/c707098wkzpo


The linked article disproves your claim. It seems you're confusing the naval drone incident, which did not cause casualties, with the Russian drone strike.

> The country's defence ministry said the drone had self-detonated near an oil terminal without causing any casualties, although authorities have said it caused considerable damage to a ship and warehouses.

> Ukraine later confirmed one of its naval drones had been involved, saying it had been knocked off course by Russian electronic interference. Moscow has yet to comment.

> It also comes a week after two people were injured when a drone hit a Romanian apartment block in the eastern city of Galati - close to the border with Ukraine.

> Romanian officials said they had confirmed it was a Russian drone but Moscow said "accusations" of its involvement were "unsubstantiated".


You’re right. I confused the two. Thanks for the correction.

Branch prediction is indeed necessary for any pipelined CPU. It does not matter whether the CPU also has other more modern features, like being superscalar, having out-of-order execution, etc.

That is why the simplest kind of branch prediction, i.e. static branch prediction, had already been implemented in an early pipelined computer, the IBM Stretch, which was designed around 1959/1960, so it is hardly a "modern" computer.

When the result of a comparison is random, which happens frequently in certain kinds of sorting or searching applications, as you say, that defeats any kind of branch prediction.


I do not agree with calling the cases where branchless code is preferable "exceptions", because they can be quite frequent in certain application domains, like sorting and searching.

The difference between the cases when branches are worse and the cases when they are better is whether the tested condition is random (i.e. unpredictable) or not.

Whenever you compare a random number with a threshold (or two random numbers between themselves) and use the result for conditional execution, that is an example where using branches is worse.

In most cases, when writing a program it is easy to estimate whether branches will be predictable or not, and in the latter case branchless methods should be used.
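
As a rough illustration of that distinction, here is a minimal Python sketch that counts the values below a threshold in two ways. The function names are made up for this example, and in CPython the interpreter overhead hides any branch-prediction effect, so it only shows the shape of the transformation. In a compiled language the second form would typically become a compare followed by an add, with no conditional jump on the data.

```python
import random

def count_below_branchy(values, threshold):
    """Count with an explicit conditional: one data-dependent branch per element."""
    count = 0
    for v in values:
        if v < threshold:
            count += 1
    return count

def count_below_branchless(values, threshold):
    """Count by adding the comparison result (False == 0, True == 1) directly."""
    count = 0
    for v in values:
        count += v < threshold
    return count

# Random input: the comparison outcome is unpredictable, which is the case
# where the branchless form is expected to win in compiled code.
rng = random.Random(0)
data = [rng.random() for _ in range(1000)]
branchy = count_below_branchy(data, 0.5)
branchless = count_below_branchless(data, 0.5)
```

Both functions return the same count; only the way the comparison result is consumed differs.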


This is true, but TFA argued that the network services themselves should not use /etc/hosts or any similar translation database, so a not yet updated file should not cause a network outage.

TFA proposed that /etc/hosts or the like should be used only for the benefit of administrators, to allow manual connections by name instead of by address, and presumably to make the activity logs easy to interpret. This is a desirable feature, but the network should work fine even when the name-to-address translation is temporarily unavailable, because of not-yet-updated /etc/hosts files.

Actually, I have used for decades a system similar to what TFA proposes, avoiding DNS queries for the internal networks while using my own DNS caching resolver for the Internet, but only in relatively small networks, with a few hundred nodes at most, where the IP addresses changed infrequently. Thus I have no idea whether there would be scaling problems in a big network with frequently changing addresses.
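
To make the idea concrete, a minimal sketch of a static lookup for internal names could look like the following. It is not the setup described above, only an assumption of how such a table might be used: the sample addresses, the host names and the "first entry wins" rule are all made up for illustration. A miss returns None, so the caller can decide whether to hand the name to a DNS resolver.

```python
def parse_hosts(text):
    """Build a name -> address table from hosts-file formatted text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table

def resolve_internal(name, table):
    """Return the address for an internal name, or None if it is not listed."""
    return table.get(name.lower())

# Hypothetical sample in /etc/hosts format (address, then one or more names).
SAMPLE = """\
# internal hosts
10.0.0.10  db1.internal db1
10.0.0.11  web1.internal web1   # front end
"""
hosts = parse_hosts(SAMPLE)
```

For example, `resolve_internal("db1", hosts)` returns the listed address without any network query, while a name that is not in the table returns None.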


Yes, the description from TFA does not match the traditional Thunderbolt networking protocol, whose performance may be as low as that of a 10 Gb/s Ethernet interface.

The description from TFA matches what the poster above you said about a new Linux device driver that allows access to the raw Thunderbolt protocol for transferring data between computers. This appears to be an independent implementation of the same principle as in the device driver that will be merged into mainline Linux.

While the official Linux device driver makes the raw Thunderbolt appear like a file, which can be written and read to transfer data, this implementation emulates an Infiniband interface, which presumably was simpler to use for distributing work over multiple GPUs.

They actually mention that with traditional Thunderbolt networking on the same computers, they had obtained only 9 Gb/s, i.e. more than 5 times slower than what they obtained with raw Thunderbolt.


> traditional Thunderbolt networking protocol ... performance may be as low as that of a 10 Gb/s Ethernet interface.

Ouch. Why so much lower than the physical bandwidth (or what they've achieved here)?


A USB4 40Gbps cable consists of two 20G tx/rx pairs. The in-kernel networking implementation is single-stream and just uses one pair, and won't e.g. stripe across both pairs or across multiple cables, which was the main bandwidth unlock in TFA. Doing so would be a much more complicated undertaking, since now you've re-introduced out-of-order delivery which complicates re-assembly of large packets, retries, handling loss etc. The verbs interface is a lot simpler than that of a full IP stack, so although it was possible to get this working across rails, it may not be so simple for something pretending to be Ethernet.
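
A toy sketch of why striping brings back reordering, and what the receiver then has to do, might look like this. It is not the code from TFA or from the kernel driver: the chunk size, the rail count and all function names are assumptions made for the example, and loss and retries are left out entirely.

```python
import random

def stripe(payload, chunk_size, rails):
    """Split payload into (seq, chunk) pairs and deal them round-robin to rails."""
    queues = [[] for _ in range(rails)]
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        queues[seq % rails].append((seq, payload[start:start + chunk_size]))
    return queues

def interleave(queues, rng):
    """Simulate arrival: order is kept within a rail, but not across rails."""
    pending = [list(q) for q in queues]
    arrived = []
    while any(pending):
        q = rng.choice([p for p in pending if p])
        arrived.append(q.pop(0))
    return arrived

def reassemble(arrived):
    """Restore the original byte order using the sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(arrived, key=lambda p: p[0]))

payload = bytes(range(256)) * 4
queues = stripe(payload, 100, 2)             # two rails, as on a single cable
arrived = interleave(queues, random.Random(1))
result = reassemble(arrived)
```

Even in this simplified form the receiver has to buffer and sort by sequence number before it can hand the data on, which is the extra work a single-rail implementation avoids.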

> now you've re-introduced out-of-order delivery which complicates re-assembly of large packets, retries, handling loss etc.

Still confused though. For a standard TCP/IP networking stack, that support is all there anyway, as it's not meant for point-to-point links, and out-of-order delivery is a thing that happens on the Internet. I haven't tried thunderbolt-net, but it says it implements Apple's ThunderboltIP, so I'd expect it's IP-based networking on top, and so it'd all work? Is it that out-of-order delivery is far more common than usual, and this path is so much slower (by impairing LRO/GRO) that it's not worth aggregating at all?

I'd understand if each pair is logically represented as a separate networking device, and then you have to set up link aggregation on top of that. (And iirc at least with some forms of aggregation a particular flow is bound to one link, so you'd have to have a bunch of streams to actually get bandwidth benefits.) So caveats for sure but I'd expect something to be possible. But does it just not support using both pairs at all?

Even when using one pair I still don't understand why you'd only get about 10G rather than 20G on a pair. I do see chapter 4 of the (your?) article talks about the single DMA ring maybe imposing the 10 Gbps limit but I don't have any good intuition for why. I don't know, say, how large the rings are or what latencies to expect on their operations or what packet sizes are supported, which might help me understand.


Yeah, thunderbolt-net is IP on top and it does work as you say, with a few caveats:

- On a single cable with two rails available, thunderbolt-net grabs one and uses that. Without patching the kernel, there's no way to make it present a second interface using the remaining pair.

- If you had a second cable between the machines (for 4 total rails), thunderbolt-net will still only grab one rail, because the abstraction across which it's making the links sees an identical peer at the end of both links and so falls into the same trap as above. There is no LRO/GRO anyway (or it's buggy, I forget) on the Linux version.

- Why you only get 10G rather than 20G on a single pair: actually, this might be something specific to the Strix Halo SoC that I was testing on. On a different (still AMD) chipset and an Apple TB5 Mac I did see closer to 22G in one direction, but still 8 in the other. The Strix Halo NHI seems to be 'stripped down' (as expected, for mobile) in ways I don't really understand.

- Intuition on why- I can't point you to the line number, but I think it has to do with a fixed 4kb page size when communicating with the NHI that ends up becoming a bottleneck, perhaps 16kb pages on aarch64 apple help here?


Ugh, yeah, gross for `thunderbolt-net` to only support one link in total, though presumably fixable.

> - Intuition on why- I can't point you to the line number, but I think it has to do with a fixed 4kb page size when communicating with the NHI that ends up becoming a bottleneck, perhaps 16kb pages on aarch64 apple help here?

I'm used to page size making a difference (due to TLB pressure) but not a factor of 2. I'm not familiar with DMA, so maybe there's some reason it'd be that dramatic there, but I'm unsure.

If the total size vs the latency of draining is just so small that it frequently fills and stalls, or if the sender and receiver can't be accessing it at once (but I don't think that should be true?), it might make more sense. I think if I wanted to make this thing go more smoothly, I'd probably start by measuring the fractions of the time the tx/rx buffers are completely empty and completely full.

Actually, I'm not sure I'm understanding the text "we only have a single DMA ring for tx and rx" either. Does that mean one for tx and one for rx? Or really one ring in total? If the latter, does it have to, say, drain fully before switching modes? That would seem pretty crippling.


Anthropic themselves have explained that the harness for Mythos has a very important role in finding the vulnerabilities, because the model does not start from scratch, but the harness runs the model many times on each file of the code base, with different prompts, where the prompts evolve depending on the results of the previous runs.

First with more generic prompts, to determine whether it is worthwhile to do a detailed analysis of that file, then with more specific prompts to identify the bugs, and eventually with a prompt that requests a confirmation that a given bug/vulnerability exists.

For a proper comparison between some other model and Mythos, you also need such a complex harness. If you just tell an LLM "find the bugs", and it does not find a vulnerability known to have been found by Mythos, that is a totally invalid comparison.

The final results provided by Mythos, like a PoC exploit or a patch, are also generated with a prompt that points to the exact code that has the vulnerability (which is supposed to exist based on the results of the previous runs).
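
Only to illustrate the staged structure described above, here is a toy sketch with a stub in place of the model. It is not Anthropic's harness: the prompt strings, the three-stage split as coded here, the `stub_model` behavior and the sample files are all hypothetical.

```python
def run_harness(files, model):
    """Three-stage loop: triage each file, look for bugs, then confirm each one."""
    confirmed = []
    for path, source in files.items():
        # Stage 1: generic prompt, decide whether the file deserves a closer look.
        if model(f"TRIAGE {path}\n{source}") != "yes":
            continue
        # Stage 2: more specific prompt, one candidate bug description per line.
        candidates = model(f"FIND {path}\n{source}").splitlines()
        # Stage 3: the prompt now names one candidate and asks for confirmation.
        for bug in candidates:
            if model(f"CONFIRM {path}: {bug}\n{source}") == "yes":
                confirmed.append((path, bug))
    return confirmed

def stub_model(prompt):
    """Stand-in for an LLM call, so the control flow can be run offline."""
    header, _, body = prompt.partition("\n")
    if header.startswith("TRIAGE"):
        return "yes" if "strcpy" in body else "no"
    if header.startswith("FIND"):
        return "unbounded strcpy"
    return "yes"

# Hypothetical two-file code base.
files = {"a.c": "strcpy(dst, src);", "b.c": "return 0;"}
findings = run_harness(files, stub_model)
```

The point of the sketch is that each later prompt is built from the output of an earlier run, which a single "find the bugs" prompt cannot reproduce.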


My take from the SCW interview is that the Mythos harness isn't all that important and the author thought it would be even less important with future models. But maybe I misremember.

Anthropic has a vested interest in downplaying the harness's relevance. In my experience the harness really matters. More capable models are great, but current models are enough if you put some engineering effort into the harness.

The harness does not matter that much, it's getting leaner every cycle.

But a good harness lowers the model floor and accessibility and makes stronger models that much better.

In these cases of the TI parts, some of their most important specifications, like maximum supply voltage, noise and slew rate, have been changed, and not by a few percent, but even by a factor close to 2.

For such large changes, it is really not acceptable to use the same part number, especially when the part numbers have been in widespread use for many decades, so most users who are familiar with them will not bother to check their latest specifications again, where they could notice that they are no longer what they knew.


For now, the US not only exports a lot of oil, but it also imports a lot of oil, and those quantities cannot offset each other, because it is not the same quality of oil.

Thus there can be an oil shortage in the US while it simultaneously exports great quantities of oil.


Not really, because to make gasoline both sorts are necessary. When anyone in the world makes gasoline they need oil they import from the US, and they can't substitute it with the kinds of oil the US imports. So there cannot be a shortage. In addition, it is the US oil (lighter sorts) that is in shorter supply worldwide, because Gulf states produce similar ones and supply from there is choked. The US imports heavy oil, which is abundant and will be even more abundant now that Venezuela's supplies are returning online. It gives the US a bigger advantage than it sounds.

Some meteors do explode.

The meteors made only of iron alloy and/or silicate rocks do not explode, but they may fragment into many smaller bodies.

The meteors that contain great amounts of volatile substances (water, carbon compounds and sulfur compounds) may explode if the interior becomes hot enough to convert the volatiles into gases. When such a meteor rich in volatiles fragments, some of the fragments may explode, while others may reach the surface of the Earth intact.


This fragmentation is not violent and it's not what produces the shockwave; the term "break up" is more appropriate.
