More

adrian_b · 2026-04-16T22:17:17 1776377837

Running llama-server (it belongs to llama.cpp) starts a HTTP server on a specified port.

You can connect to that port with any browser, for chat.

Or you can connect to that port with any application that supports the OpenAI API, e.g. a coding assistant harness.

adrian_b · 2026-04-16T22:12:18 1776377538

The IEEE standard FP16 is an older 16-bit format, which has balanced exponent and significand sizes.

It has been initially supported by GPUs, where it is useful especially for storing the color components of pixels. For geometry data, FP32 is preferred.

In CPUs, some support has been first added in 2012, in Intel Ivy Bridge. Better support is provided in some server CPUs, and since next year also in the desktop AMD Zen 6 and Intel Nova Lake.

BF16 is a format introduced by Google, intended only for AI/ML applications, not for graphics, so initially it was implemented in some of the Intel server CPUs and only later in GPUs. Unlike FP16, which is balanced, BF16 has great dynamic range, but very low precision. This is fine for ML but inappropriate for any other applications.

Nowadays, most LLMs are trained preponderantly using BF16, with a small number of parameters using FP32, for higher precision.

Then from the biggest model that uses BF16, smaller quantized models are derived, which use 8 bits or less per parameter, trading off accuracy for speed.

adrian_b · 2026-04-16T21:54:48 1776376488

The 397B model can be run at home with the weights stored on an SSD (or on 2 SSDs, for double throughput).

Probably too slow for chat, but usable as a coding assistant.

xienze · 2026-04-16T22:05:12 1776377112

I think you have that backwards. Agentic coding is way more demanding than simple chat. The request/response loops (tool calling) are much tighter and more numerous, and the context is waaaaay bigger in general.

fragmede · 2026-04-17T07:25:29 1776410729

In processing power, but chat is interactive. Agentic coding, you come up with a plan and sign off on it, and then just let it go for a while. It's the difference between speed and latency.

adrian_b · 2026-04-16T20:13:48 1776370428

True, but that was only temporary.

Glemllksdf · 2026-04-17T09:26:51 1776418011

Tuapse Oil Refinery attack happened two days ago and why not just adding infos on my comment through a comment instead of a downvote?

adrian_b · 2026-04-16T20:08:14 1776370094

Some airlines have already started to cancel some of their flights, to save fuel, e.g. KLM and United.

adrian_b · 2026-04-16T18:31:27 1776364287

I assume that by "higher resolution images" you mean images with a bigger size in pixels.

I expect that for the model it does not matter which is the actual resolution in pixels per inch or pixels per meter of the images, but the model has limits for the maximum width and the maximum height of images, as expressed in pixels.

adrian_b · 2026-04-16T18:21:46 1776363706

I have not seen any comment from the early tests of 4.7 claiming that it does not work better than the previous version.

However, there have been some valuable warnings about problems that have been hit in the first minutes after switching to 4.7.

For instance that the new guardrails can block working at projects where the previous version could be used without problems and that if you are not careful the changed default settings can make you reach the subscription limits much faster than with the previous version.

adrian_b · 2026-04-16T18:00:55 1776362455

If the vendors of programs do not want bugs to be found in their programs, they should search for them themselves and ensure that there are no such bugs.

The "legit security firms" have no right to be considered more "legit" than any other human for the purpose of finding bugs or vulnerabilities in programs.

If I buy and use a program, I certainly do not want it to have any bug or vulnerability, so it is my right to search for them. If the program is not commercial, but free, then it is also my right to search for bugs and vulnerabilities in it.

I might find acceptable to not search for bugs or vulnerabilities in a program only if the authors of that program would assume full liability in perpetuity for any kind of damage that would ever be caused by their program, in any circumstances, which is the opposite of what almost any software company currently does, by disclaiming all liabilities.

There exists absolutely no scenario where Anthropic has any right to decide who deserves to search for bugs and vulnerabilities and who does not.

If someone uses tools or services provided by Anthropic to perform some illegal action, then such an action is punishable by the existing laws and that does not concern Anthropic any more than a vendor of screwdrivers should be concerned if someone used one as a tool during some illegal activity.

I am really astonished by how much younger people are willing to put up with the behaviors of modern companies that would have been considered absolutely unacceptable by anyone, a few decades ago.

atonse · 2026-04-16T19:44:04 1776368644

Not sure where the younger people thing came from, but I'm 45 and have been working in this industry since 1999. But even when I was in my 20s, I don't remember considering that I had a "right" to do something with a company's product before they've sold it to me.

In fact, I would say the idea of entitlement and use of words like "rights" when you're talking about a company's policies and terms of use (of which you are perfectly fine to not participate. rights have nothing to do with anything here. you're free to just not use these tools) feels more like a stereotypical "young" person's argument that sees everything through moralistic and "rights" based principles.

If you don't want to sign these documents, don't. This is true of pretty much every single private transaction, from employment, to anything else. It is your choice. If you don't want to give your ID to get a bank account, don't. Keep the cash in your mattress or bitcoin instead.

Regarding "legit" - there are absolutely "legit" actors and not so "legit" actors, we can apply common sense here. I'm sure we can both come up with edge cases (this is an internet argument after all), but common cases are a good place to start.

adrian_b · 2026-04-16T20:59:02 1776373142

You cannot search for bugs or vulnerabilities in "a company's product before they've sold it to you", because you cannot access it.

Obviously, I was not talking about using pirated copies, which I had classified as illegal activities in my comment, so what you said has nothing to do with what I said.

"A company's policies and terms of use" have become more and more frequently abusive and this is possible only because nowadays too many people have become willing to accept such terms, even when they are themselves hurt by these terms, which ensures that no alternative can appear to the abusive companies.

I am among those who continue to not accept mean and stupid terms forced by various companies, which is why I do not have an Anthropic subscription.

> "if you don't want to give your ID to get a bank account, don't"

I do not see any relevance of your example for our discussion, because there are good reasons for a bank to know the identity of a customer.

On the other hand there are abusive banks, whose behavior must not be accepted. For instance, a couple of decades ago I have closed all my accounts in one of the banks that I was using, because they had changed their online banking system and after the "upgrade" it worked only with Internet Explorer.

I do not accept that a bank may impose conditions on their customers about what kinds of products of any nature they must buy or use, e.g. that they must buy MS Windows in order to access the services of the bank.

More recently, I closed my accounts in another bank, because they discontinued their Web-based online banking and they have replaced that with a smartphone application. That would have been perfectly OK, except that they refused to provide the app for downloading, so that I could install it, but they provided the app only in the online Google store, which I cannot access because I do not have a Google account.

A bank does not have any right to condition their services on entering in a contractual relationship with a third party, like Google. Moreover, this is especially revolting when that third party is from a country that is neither that of the bank nor that of the customer, like Google.

These are examples of bad bank behavior, not that with demanding an ID.

atonse · 2026-04-17T01:42:50 1776390170

With the bank example, I thought your comment had some anti KYC language so I mixed it up with another response, sorry for the confusion.

I actually kind of agree with you in some principle, IF we had no choice. Like the only reason I can say “you can choose not to purchase this product” is because that is true today, thanks to competition from commercial and open source models.

But I’d be right there with you on “someone needs to force these companies to do ____” if they were quasi monopolies and citizens needed to use their technology in some form (we see this with certain patents around cell phone tech for example)

senko · 2026-04-16T18:38:57 1776364737

> If someone uses tools or services provided by Anthropic to perform some illegal action, then such an action is punishable by the existing laws and that does not concern Anthropic any more than a vendor of screwdrivers should be concerned if someone used one as a tool during some illegal activity.

In civilised parts of the world, if you want to buy a gun, or poison, or larger amount of chemicals which can be used for nefarious purposes, you need to provide your identity and the reason why you need it.

Heck, if you want to move a larger amount of money between your bank accounts, the bank will ask you why.

Why are those acceptable, yet the above isn't?

> I am really astonished by how much younger people are willing to put up with

Unsure where you got the "younger people" from.

adrian_b · 2026-04-16T21:10:28 1776373828

Your examples have nothing to do with Anthropic and the like.

A gun does not have other purposes than being used as a weapon, so it is normal for the use of such weapons to be regulated.

On the other hand it is not acceptable to regulate like weapons the tools that are required for other activities, for instance kitchen knives or many chemicals, like acids and alkalis, which are useful for various purposes and which in the past could be bought freely for centuries, without that ever causing any serious problems.

LLMs are not weapons, they are tools. Any tools can be used in a bad or dangerous way, including as weapons, but that is not a reason good enough to justify restrictions in their use, because such restrictions have much more bad consequences than good consequences.

> Unsure where you got the "younger people" from.

Like I have said, none of the people that I know from my generation have ever found acceptable the kinds of terms and conditions that are imposed nowadays by most big companies for using their products or their attempts to transition their customers from owning products to renting products.

The people who are now in their forties are a generation after me, so most of them are already much more compliant with these corporate demands, which affects me and the other people who still refuse to comply, because the companies can afford to not offer alternatives when they have enough docile customers.

adrian_b · 2026-04-16T17:29:32 1776360572

I agree with what you what you have written, which is why I would never pay a subscription to an external AI provider.

I prefer to run inference on my own HW, with a harness that I control, so I can choose myself what compromise between speed and the quality of the results is appropriate for my needs.

When I have complete control, resulting in predictable performance, I can work more efficiently, even with slower HW and with somewhat inferior models, than when I am at the mercy of an external provider.

brightball · 2026-04-16T20:43:38 1776372218

What’s your setup?

adrian_b · 2026-04-16T21:37:16 1776375436

For now, the most suitable computer that I have for running LLMs is an Epyc server with 128 GB DRAM and 2 AMD GPUs with 16 GB of HBM memory each.

I have a few other computers with 64 GB DRAM each and with NVIDIA, Intel or AMD GPUs. Fortunately all that memory has been bought long ago, because today I could not afford to buy extra memory.

However, a very short time ago, i.e. the previous week, I have started to work at modifying llama.cpp to allow an optimized execution with weights stored in SSDs, e.g. by using a couple of PCIe 5.0 SSDs, in order to be able to use bigger models than those that can fit inside 128 GB, which is the limit to what I have tested until now.

By coincidence, this week there have been a few threads on HN that have reported similar work for running locally big models with weights stored in SSDs, so I believe that this will become more common in the near future.

The speeds previously achieved for running from SSDs hover around values from a token at a few seconds to a few tokens per second. While such speeds would be low for a chat application, they can be adequate for a coding assistant, if the improved code that is generated compensates the lower speed.

brightball · 2026-04-16T21:41:48 1776375708

Thank you for that, it's very interesting. I keep wanting to find time to try out a local only setup with an NVIDIA 4090 and 64gb of RAM. It seems like it may be time try it out.

adrian_b · 2026-04-16T15:21:10 1776352870

This was the research paper which introduced the abbreviation "VLIW" and contrasted it with the concept of RISC, which had been introduced a few years earlier (the term RISC had been coined in October 1980, but the concept was older than that, coming from the IBM 801 project, a few years earlier).

At that time, the easiest way to understand the difference between RISC and VLIW CPUs and earlier CPUs was to compare them with the microprogrammed CPUs of the seventies, which used either "vertical microprogramming" or "horizontal microprogramming".

RISC CPUs could be viewed as modified vertically microprogrammed CPUs and VLIW CPUs could be viewed as modified horizontally microprogrammed CPUs.

In both cases the modification consisted in replacing the read-only microprogram memory with a read-write cache memory and eliminating the decoder that converted complex instructions into simple vertical microinstructions or horizontal microinstructions.

Thus what was previously the simpler instruction set used in microprograms became the programmer-visible ISA.

The term "vertical" had been applied to microinstructions that executed one simple operation per clock cycle, while "horizontal" was applied to microinstructions that executed in parallel multiple simple operations per clock cycle. Horizontal microinstructions differed from vector instructions a.k.a. SIMD instructions, because for each concurrent operation it was possible to specify in the encoding distinct source and destination registers from those used by the concurrent operations.

A few years later after this VLIW paper, IBM coined the word "superscalar" which was applied to a CPU structure that was improved over VLIW by adding a dynamic scheduler for the concurrently executed operations, replacing their static scheduling done by the programmer or the compiler.

While the word "superscalar" was coined only in March 1987, the concept of using a dynamic scheduler to enable the out-of-order execution of multiple operations per clock cycle was much older, being described in an internal confidential document of IBM from 1966-02-23 (inventor: Lynn Conway).

That document was not known outside IBM for many years, but John Cocke, one of the 2 proponents of the superscalar CPUs had been a member of the IBM ACS team, so he must have read the original document about dynamic scheduling, 20 years before the concept was revived when the CMOS technology had become ready for it.

HN For You