what bothers me is not that this issue will certainly disappear now that it has been identified, but that we have yet to identify the category of these "stupid" bugs ...
We already know exactly what causes these bugs. They are not a fundamental problem of LLMs, they are a problem of tokenizers. The actual model simply doesn't get to see the same text that you see. It can only infer this stuff from related info it was trained on. It's as if someone asked you how many 1s there are in the binary representation of this text. You'd also need to convert it first to think it through, or use some external tool, even though your computer never saw anything else.
> It's as if someone asked you how many 1s there are in the binary representation of this text.
I'm actually kinda pleased with how close I guessed! I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964.
Then I ran your message through a program to get the actual number, and turns out it has 1800 exactly.
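For the curious, the actual count is a one-liner. A minimal sketch, assuming "binary representation" means the UTF-8 bytes of the text:

```python
# Count set bits across the UTF-8 bytes of a message.
def popcount(text: str) -> int:
    return sum(bin(b).count("1") for b in text.encode("utf-8"))

# "A" is 0x41 = 0b1000001, so it contributes 2 set bits.
print(popcount("A"))  # 2
```

1800 bits over 491 characters works out to about 3.67 set bits per character, so the 4-per-character estimate was a reasonable heuristic.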
>I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964
And that's exactly the kind of reasoning an LLM does when you ask it about characters in a word. It doesn't come from the word, it comes from other heuristics it picked up during training.
Okay, genuinely not an expert on the latest with LLMs, but isn't tokenization an inherent part of LLM construction? Kind of like support vectors in SVMs, or nodes in neural networks? Once we remove tokenization from the equation, aren't we no longer talking about LLMs?
It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that can flawlessly count letters in words, they could train one with a naive tokenizer (like just ascii characters). But the resulting model would be very bad (for its size) at language or reasoning tasks.
Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.
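The tradeoff is easy to see with a toy comparison. The subword split below is made up for illustration, not a real BPE vocabulary:

```python
# A character-level "tokenizer" sees every letter, while a BPE-style
# tokenizer sees opaque chunks.
word = "strawberry"

char_tokens = list(word)          # ['s','t','r','a','w','b','e','r','r','y']
bpe_tokens = ["straw", "berry"]   # hypothetical subword split

# Counting letters is trivial at character level...
print(char_tokens.count("r"))     # 3

# ...but "r" never appears as its own token in the subword view, so a
# model working on token IDs must infer the count from what it
# memorized about the chunks during training.
print(bpe_tokens.count("r"))      # 0
```

The cost of the character-level view is far longer sequences for the same text, which is why practical tokenizers don't work that way.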
181.78.46.78.in-addr.arpa domain name pointer min2max.run.
The domain's authoritative nameserver (Infomaniak) points vivianvoss.net at 78.46.78.181 — a Hetzner box in Germany with rDNS min2max.run. That server redirects HTTP to SafeBrowse.io and responds to TLS handshakes with garbage. Not a local issue, not a DNS hijack — the A record itself is wrong.
And the logs show it is going to the same address:
* Established connection to vivianvoss.net (78.46.78.181 port 443) from 172.16.245.55 port 36208
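For anyone unfamiliar with the `in-addr.arpa` name in that `host` output: the reverse-DNS zone name is just the IPv4 octets reversed, which Python's standard library can compute directly:

```python
import ipaddress

# The rDNS pointer name for an IPv4 address is its octets reversed
# under the in-addr.arpa zone.
addr = ipaddress.ip_address("78.46.78.181")
print(addr.reverse_pointer)  # 181.78.46.78.in-addr.arpa
```

That matches the PTR lookup quoted above, confirming the log's 78.46.78.181 and the rDNS record refer to the same host.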
Any chance you're a Comcast Xfinity customer? Searching for safebrowse.io shows that Xfinity "Advanced Security" does this whole redirect to safebrowse.io.
--
Unrelated, but the site also returns an AAAA record for an IPv6 address that does not work. So they've misconfigured their server in that regard as well.
This looks very awesome. Can someone tell me why there is no chatter about this? Is there something else out there that blows this out of the water in terms of ease of use and access to sample many LLMs?
Ollama provides a web server with API that just works out of the box, which is great when you want to integrate multiple applications (potentially distributed on smaller edge devices) with LLMs that run on a single beefy machine.
In my home I have a large gaming rig that sometimes runs Ollama+Open WebUI, then I also have a bunch of other services running on a smaller server and a Raspberry Pi which reach out to Ollama for their LLM inference needs.
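That kind of setup only needs the standard library on the client side. A minimal sketch of calling Ollama's `/api/generate` endpoint from another machine, assuming the default port 11434; the host name and model below are placeholders:

```python
import json
import urllib.request

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for a single JSON response
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(host: str, model: str, prompt: str) -> str:
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# e.g. generate("gaming-rig.local", "llama2", "Why is the sky blue?")
```

Because it's plain HTTP, the same call works from a Raspberry Pi or any other small device on the network.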
Are you talking about the Hugging Face Python libraries, the Hugging Face hosted inference APIs, the Hugging Face web interfaces, the Hugging Face iPhone app, Hugging Face Spaces (hosted Docker environments with GPU access) or something else?
But it's likely to be much slower than what you'd get with a backend like llama.cpp on CPU (particularly if you're running on a Mac, but I think on Linux as well), and it doesn't support features like CPU offloading.
I think the biggest selling point of Ollama (llama.cpp) is quantization: for a slight hit in quality (with q8 or q4) you can get a significant performance boost.
Does ollama/llama.cpp provide low bit operations (avx or cuda kernels) to speed up inference? Or just model compression with inference still done in fp16?
My understanding is that modern quantization algorithms are typically implemented in PyTorch.
The only thing I know (from using it) is that with quantization I can fit models like Llama 2 13B in my 24 GB of VRAM when I use q8 (16 GB) instead of fp16 (26 GB). This means I get nearly the full quality of Llama 2 13B's output while still being able to use only my GPU, without the need for very slow inference on CPU+RAM alone.
And the models are quantized before inference, so I'd only download 16GB for the llama2 13b q8 instead of the full 26GB, which means it's not done on the fly.
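The sizes above follow from simple bytes-per-weight arithmetic. A rough sketch (real GGUF quants add per-block overhead, and the ~16 GB figure above also includes context and runtime buffers, all ignored here):

```python
# Back-of-the-envelope weight sizes for a 13B-parameter model.
# fp16 stores 2 bytes per weight; q8 roughly 1 byte per weight.
params = 13e9

fp16_gb = params * 2 / 1e9
q8_gb = params * 1 / 1e9

print(fp16_gb)  # 26.0
print(q8_gb)    # 13.0
```

The same arithmetic explains why q4 variants roughly halve the footprint again, at a larger quality cost.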
As an aside, even gpt4 level quality does not feel satisfactory to me lately. I can’t imagine willingly using models as dumb as llama2-13b. What do you do with it?
Yeah, I agree. Every time a new model releases I download the highest quantization (or fp16) that fits into my VRAM, test it out with a few prompts, and then realize that downloadable models are still not as good as the closed ones (except speed-wise).
I don't know why I still do it, but every time I read so many comments about how good model X is and how it outperforms everything else, I want to see it for myself.
Ollama is really well organized: it relies on llama.cpp, but the UX and organization it provides make it legit. We recently made a one-click wizard to run Open WebUI and Ollama together, self-hosted locally but remotely accessible [1]
LM Studio is a lot more user friendly, probably the easiest UI to use out there. No terminal nonsense, no manual to read. Just double click and chat. It even explains to you what the model names mean (e.g. the difference between Q4_1, Q4_K, Q4_K_M... for whatever reason all the other tools assume you know what that means).
Why do you think there is no chatter about this? There have been hundreds of posts about ollama on HN. This is a point release of an already well known project.
I use a mix of llama.cpp directly via my own Python bindings and via llama-cpp-python for function calling and full control over parameters and loading, but otherwise Ollama is just great for ease of use. There's really no reason not to use it if you just want to load GGUF models and don't have any intricate requirements.
> ... if a statement can be proved, it also has a zero-knowledge proof.
Mind blown.
>Feeding the pseudorandom bits (instead of the random ones) into a probabilistic algorithm will result in an efficient deterministic one for the same problem.
This is nuts. AI is a probabilistic computation ... so what they're saying, if I'm reading this right, is that we can reduce the complexity of our current models by orders of magnitude.
If I'm living in noobspace someone please pull me out.
I don't know exactly what it's saying but it definitely isn't that. AI already uses pseudorandom numbers and is deterministic. (Except some weird AI accelerator chips that use analogue computation to improve efficiency.)
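The point about AI already being deterministic comes down to seeded pseudorandomness, which is easy to demonstrate:

```python
import random

# A seeded PRNG is fully deterministic: the same seed reproduces the
# same "random" draws. This is why sampling from an LLM with a fixed
# seed (and fixed hardware/kernels) gives repeatable output.
random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)
second = [random.random() for _ in range(3)]

print(first == second)  # True
```

In practice LLM outputs can still vary run to run, but that comes from unseeded sampling or non-deterministic GPU kernels, not from any true randomness in the model.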
> AI is a probabilistic computation ... so what they're saying - if i'm reading this right - is that we can reduce the complexity of our current models by orders of magnitude.
Unfortunately, no. First, the result applies to decision problems, not search problems. Second, the resulting deterministic algorithm is much less efficient than the randomized algorithm, although it still belongs to the same complexity class (under some assumptions).
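For context, one classical formulation of this hardness-vs-randomness tradeoff (a hedged paraphrase of the Impagliazzo–Wigderson line of results, which may differ in detail from the specific paper being discussed):

```latex
\text{If some language in } E = \mathrm{DTIME}\!\left(2^{O(n)}\right)
\text{ requires Boolean circuits of size } 2^{\Omega(n)},
\text{ then } \mathrm{BPP} = \mathrm{P}.
```

The derandomized algorithm is still polynomial time, but with a large polynomial blow-up over the randomized one, which is why "same complexity class" does not translate into a practical speedup.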
Nature says eat or be eaten (or die). Now that we are on top of the food chain, it's useful, for mental health and societal reasons, not to want to rip everyone's throat out, even though every other species (including us) continues to do so. You tend to steer and veer towards what you look at. The good news is we get to choose what we look at... the bad news is we (statistically) choose wrong. Race car drivers don't look at the wall when they drive, because when they do, they tend to hit it. Stop looking at the wall, guys ...
Rather than having selections for multiple languages (for each task), it seems like language detection or a selection/setup screen would be best, with a fallback to English or whatever your default is. Maybe use online translation services?
edit: Oh it seems you do have a language drop down, but there are still multiple languages appearing in quests... this just means more quests I guess eh :)
This is exactly what I was looking for, but it seems very incomplete. Does anyone else know of a resource like this that isn't the first hit on Google?