Hacker News | matheist's comments

Why do you think the constrained percentages are 0/75/25 and not, e.g., 0/66/33? (i.e., the same relative likelihood for the valid outputs)


The constraint algorithm looks something like:

1. Choose the first token. If the model is well trained, you have a 75% chance of choosing "a" and a 25% chance of choosing "b". Both are valid under the grammar.

2. Choose the second token. Regardless of your first token there is exactly one choice of grammar-adhering completion. You're now at a 75% chance of "ab" and a 25% chance of "ba" (mirroring the first-token chance).

For a toy example like this you obviously wouldn't use an LLM, but techniques like the one you're suggesting don't work: it's infeasible to enumerate all the valid outputs and re-weight, and greedy and semi-greedy strategies aren't anywhere near sufficient to sidestep the issue. At the point in time you select the "a" token with 75% probability it's game over unless you re-run the LLM. You can't beam search your way out either (doing so just changes which token you'll mis-predict, and even then only for very local grammar mistakes).

Looking at my JSON example from earlier, a beam search to avoid that re-weighting requires a depth of at least 4 (going as far as the ellipsis plus the stop token), and it won't suffice to consider only locally high-weight paths. You could probably hack something together for that one issue in particular which searches high-weight paths and backtracks when they're found to be low-weight due to grammar mismatches, but that has its own bias unless you fan out to all 1e19 length-4 paths, and it won't solve the general problem regardless.

Phrased slightly differently, you don't have a compute_future_grammar_adhering_weight(token) function which is tractably computable, so you can't actually redistribute the 8.3% probability from the "a" branch to the "b" branch.
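A quick sketch of that two-step constrained decode in Python, if it helps (the token probabilities are made up to match the numbers above):

```python
# Grammar accepts only "ab" and "ba"; the model's (hypothetical) learned
# first-token distribution is 75% "a", 25% "b".
GRAMMAR = {"ab", "ba"}
first_token_probs = {"a": 0.75, "b": 0.25}

def constrained_sample_probs():
    """Probability of each full output under constrained decoding."""
    out = {}
    for first, p_first in first_token_probs.items():
        # Mask to continuations that keep the output inside the grammar.
        legal = [s[1] for s in GRAMMAR if s[0] == first]
        # With exactly one legal continuation, masking + renormalizing
        # forces probability 1 onto it, whatever the model preferred.
        for second in legal:
            out[first + second] = p_first * (1.0 / len(legal))
    return out

print(constrained_sample_probs())  # {'ab': 0.75, 'ba': 0.25}
```

The point is that all the probability mass committed at step 1 is locked in; the constraint at step 2 can only reroute it, never send it back.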


Oh now I understand. I thought your ab and ba were single tokens (even though that doesn't make sense in context). Once you point out they're separate tokens, I follow you. Thank you!

Edit: that's a great example

Edit 2: even more fun: training data is [ab, ab, ba, bb, bb, bb]. Then constrained sampling flips your likelihood from 1:2 to 2:1
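Edit 3: checking that flip with a quick bigram fit (a sketch; the "model" is just counts from that training set):

```python
from collections import Counter

# Fit a bigram "model" by counting the training data above.
data = ["ab", "ab", "ba", "bb", "bb", "bb"]
first = Counter(s[0] for s in data)  # a: 2, b: 4
second = {c: Counter(s[1] for s in data if s[0] == c) for c in "ab"}

def p_first(t):
    return first[t] / sum(first.values())

def p_second(t, prev):
    return second[prev][t] / sum(second[prev].values())

GRAMMAR = {"ab", "ba"}

# Unconstrained model probability of each grammatical string:
uncon = {s: p_first(s[0]) * p_second(s[1], s[0]) for s in GRAMMAR}
# ab: (2/6)*1 = 1/3,  ba: (4/6)*(1/4) = 1/6  -- ratio 2:1

# Constrained decoding: mask illegal tokens, renormalize at each step.
con = {}
for s in GRAMMAR:
    legal_second = [g[1] for g in GRAMMAR if g[0] == s[0]]
    mass = sum(p_second(t, s[0]) for t in legal_second)
    con[s] = p_first(s[0]) * p_second(s[1], s[0]) / mass
# ab: 1/3,  ba: 2/3  -- ratio flipped to 1:2
```

All the "bb" mass that should have been discarded gets funneled into "ba", because "b" was already chosen before the grammar could object.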


Thanks :) My example is minimal, which is a little nice since I wind up re-deriving it in a hurry every time I need it. I do like the 1:2 to 2:1 symmetry though. Very elegant.


Valuation includes expected future growth, it's not just present value of future revenue given today's numbers.

You may not agree with the market's estimation of that, but comparing just present revenue isn't really the right comparison.


You know, it only just now occurs to me to wonder if the blackjack story is the public sanitized version of "how I got $24k because I'm not allowed to tell you the real version"


Great thought, that seems very likely since so many "founder stories" are heavily spun tales.


Las Vegas still had deep mafia ties in the 1970s so that’s very possible.


Sorry to actually your actually, but the derivative of a function f from a space A to a space B at the point a is a linear function Df_a from the tangent space of A at a to the tangent space of B at b = f(a).

When the spaces are Euclidean spaces then we conflate the tangent space with the space itself because they're identical.

By the way, this makes it easy to remember the chain rule formula in 1 dimension. There's only one logical thing it could be between spaces of arbitrary dimensions m, n, p: composition of linear transformations from T_a A to T_f(a) B to T_g(f(a)) C. Now let m = n = p = 1, and composition of linear transformations just becomes multiplication.

(Only half kidding)
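Spelled out in standard notation (writing T_a A for the tangent space of A at a), the composition in question is:

```latex
D(g \circ f)_a \;=\; Dg_{f(a)} \circ Df_a,
\qquad
Df_a : T_a A \to T_{f(a)} B,
\qquad
Dg_{f(a)} : T_{f(a)} B \to T_{g(f(a))} C.
```

With m = n = p = 1 each tangent space is $\mathbb{R}$ and each linear map is multiplication by a scalar, so this collapses to $(g \circ f)'(a) = g'(f(a)) \cdot f'(a)$.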


The distinction between the space A and the tangent space of A becomes visually clear if we consider a function whose domain is a sphere. The derivative is properly defined on the tangent plane, which touches the sphere at only a single point. In the neighborhood of that point the plane and the sphere are very, very close together, but they are inevitably pulled apart by the curvature of the sphere.

Of course that picture is not formally correct. We formally define the tangent space without having to embed the manifold in Euclidean space. But that picture is a correct description of an embedding of both the sphere and the tangent space at a single point.


Oh I appreciate you actualling my actually ^^ but isn’t this case a special case of the one I wrote? I.e. when A and B are manifolds and admit tangent bundles?


Why, I’m sure you could come up with a succinct explanation of a monad :-)


> We are also building an LLM-based ad-blocker after Chrome blocked uBlock Origin.

Since it's a Chromium fork, why not re-enable uBlock Origin instead?


Chromium will remove the Manifest V2 APIs, and none of these forks want to maintain them. Brave also chose to have their own built-in adblocker.

The real question is: why not fork Firefox, which is doing that work for them?


+1, enabling uBlock Origin could be a short-term solution.

But we are working on adding a built-in ad blocker just like Brave, and enhancing it to detect more ad formats using a lightweight local LLM.


> enhancing it to detect more ad formats using lightweight local LLM.

So you're going to boot up my_llm.prompt("is this request an ad? %s" % request.copy_as_curl()) for every XHR on the page?!

I know this comment is on a page about an AI web browser, but let's turn down the "LLMs cure all problems" a notch


The Faust compiler uses de Bruijn indices internally to reuse computations. Anyone else know any other examples?

https://github.com/grame-cncm/faust


Idris[1] and most of the dependent typed languages that I've looked at use de Bruijn numbers. (As does my own[2].)

The Idris implementation also has a list of names in scope as an argument to the type of a Term, which makes the compiler (also Idris) check if you're making a mistake with your de Bruijn indices. (I did not do this in mine, but I should have.)

Edit: Oh, I see you mention reusing computations, that may be a bit more subtle. I'll have to look at your repo to see what you mean by that.

[1]: https://github.com/idris-lang/Idris2 [2]: https://github.com/dunhamsteve/newt
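For anyone following along, here's a minimal sketch of de Bruijn-indexed terms with the usual shift/substitute scheme, in plain Python (this is the textbook construction, not the actual Idris or Faust internals):

```python
from dataclasses import dataclass
from typing import Union

# Var(0) refers to the innermost binder, Var(1) to the next one out, etc.

@dataclass(frozen=True)
class Var:
    ix: int

@dataclass(frozen=True)
class Lam:
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def shift(t: Term, by: int, cutoff: int = 0) -> Term:
    """Adjust free indices when a term is moved under extra binders."""
    if isinstance(t, Var):
        return Var(t.ix + by) if t.ix >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, by, cutoff + 1))
    return App(shift(t.fn, by, cutoff), shift(t.arg, by, cutoff))

def subst(t: Term, ix: int, s: Term) -> Term:
    """Replace Var(ix) in t with s, keeping indices consistent under binders."""
    if isinstance(t, Var):
        return s if t.ix == ix else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, ix + 1, shift(s, 1)))
    return App(subst(t.fn, ix, s), subst(t.arg, ix, s))

def beta(t: Term) -> Term:
    """One beta step at the root: (lambda. body) arg -> body[0 := arg]."""
    if isinstance(t, App) and isinstance(t.fn, Lam):
        return shift(subst(t.fn.body, 0, shift(t.arg, 1)), -1)
    return t

# (lambda x. lambda y. x) z  -->  lambda y. z, with z's index bumped
# because the result sits under one more binder than z did.
K = Lam(Lam(Var(1)))
print(beta(App(K, Var(5))))  # Lam(body=Var(ix=6))
```

The scope-indexed Term type mentioned above makes index mistakes in functions like `shift` and `subst` type errors instead of silent bugs, which is exactly where this sketch is fragile.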


Thanks! (Just clarifying, it's not my project)


Looks interesting! Is there any intuition for why this should be the case? Did you discover it via that intuition, or just random experimentation?

A note, your install script appears to still have a placeholder at the "apply patch" step. A suggestion, might be more user-friendly to fork llama.cpp and then include that as a git submodule rather than make it a "git clone and apply patch" step.

A further note, everyone and their dog has a different local python set-up, might be nice to let people separate the llama.cpp stuff from the python stuff rather than bake in a dependence on homebrew python.


Great question about the intuition! The difference comes from the core roles these components play in attention.

Keys determine which tokens to attend to - they create the actual attention pattern through similarity calculations. Values only store what information gets passed forward once attention is decided.

When a key vector is quantized too aggressively, it distorts the similarity calculations for every token interaction. A small error in keys can completely redirect attention to the wrong tokens.

Values, however, are much more forgiving. When a value vector is quantized, any error only affects the specific information content of that single token after the attention pattern is already established.

It's like a library catalog system vs. the books themselves. If catalog numbers (keys) are corrupted, you'll look in completely wrong sections. If some words in books (values) are smudged, you're still reading the right book - just with occasional noise.

Mathematically, keys participate in softmax calculations where small errors get exponentially amplified through the normalization process. Values just undergo linear weighted averaging, where errors tend to cancel out.
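Here's a tiny numpy sketch of that asymmetry (all numbers invented, a single-query toy rather than a real transformer; key 0 is constructed to match the query so attention starts out nearly one-hot):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d, n = 4, 3
q = np.array([10.0, 0.0, 0.0, 0.0])  # sharp query
K = np.zeros((n, d)); K[0, 0] = 1.0  # key 0 aligns with q
V = np.eye(n, d)                     # a distinct value per token

def attend(K, V):
    return softmax(K @ q) @ V

base = attend(K, V)  # ~= V[0], attention almost entirely on token 0

eps = 0.6  # same-size "quantization" error applied to keys or values
K_q = K.copy(); K_q[0, 0] -= eps; K_q[1, 0] += eps
V_q = V.copy(); V_q[0, 0] -= eps; V_q[1, 0] += eps

err_k = np.linalg.norm(attend(K_q, V) - base)  # attention redirected to token 1
err_v = np.linalg.norm(attend(K, V_q) - base)  # only token 0's content is noisy

print(err_k, err_v)  # key error is roughly 2x the value error here
```

The same elementwise perturbation flips which logit wins the softmax when applied to keys, but only adds bounded noise to the already-chosen token's content when applied to values.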

I first encountered this asymmetry in papers like "More for Keys, Less for Values" and "KV-AdaQuant," but wanted to quantify exactly how it impacts Apple Silicon inference. The 7× quality difference between K8V4 and K4V8 using identical memory was striking.

Thanks for the installation feedback too! I'll fix the placeholder and make the Python dependencies more flexible.


My understanding is that the roles of K, V, and Q aren’t actually well understood, and that while they’re called key/value/query tensors it’s not quite straightforward to tease out what they mean or the roles they play.


Great explanation thanks for this!


> A note, your install script appears to still have a placeholder at the "apply patch" step. A suggestion, might be more user-friendly to fork llama.cpp and then include that as a git submodule rather than make it a "git clone and apply patch" step.

The patch doesn't actually apply to llama.cpp because argument parsing was moved to arg.cpp 8 months ago.

That doesn't matter, though, because the options to set K and V quantization were added to llama.cpp in 2023.

I don't understand why the patch exists at all, other than as an attempt to make this look novel by changing the settings through a different command line argument?

I would strongly recommend that nobody run an install.sh file from a new repo like this, especially when it's not necessary for something as simple as applying a patch file.


I think in modern physics "classical" often means "not quantum", rather than "pre-modern".


"Guerrero" comes from Spanish "guerra", which is cognate to English "war". They both derive from a common proto-Germanic root.


> intended to be sent upstream to the Mozilla Firefox

This part is difficult if you actually want those changes to be accepted.

I recently had a patch accepted into Firefox. More than three months from submission to merge, including one round of code review which I turned around the same day. It was not a large patch. This is no criticism of the Firefox team, just the reality that my priorities are not their priorities.

They don't necessarily have the bandwidth or interest in accepting other people's/teams' vision or contribution.


> This is no criticism of the Firefox team, just the reality that my priorities are not their priorities.

I am a former Mozilla Corporation employee, so I am more willing to criticize the current state of MoCo culture as a whole...

> They don't necessarily have the bandwidth or interest in accepting other people's/teams' vision or contribution.

I would say it really depends on the nature of the patches being contributed; if they are not inconsistent with project goals and not excessively burdensome, I'd hope that they in theory would be considered.

However, I will say that MoCo culture was already much different by the late 2010s than it was in the early 2010s. When I joined MoCo in 2012, there were multiple managers I interacted with who openly valued community interaction and encouraged their reports to set quarterly goals relating to mentoring external contributors. IMHO that encouragement had died off by the late 2010s.


When you left, do you have a sense for how many developers were actually working on Firefox full-time? I'm curious because people always say that Firefox would be impossible to fund, pointing to Mozilla's expenses, but I've never seen someone actually put forward the math for what portion of those expenses are actually Firefox.


Oh geez, it's been long enough that I don't really remember the specifics. In the hundreds, for sure.


I had a positive experience recently getting an issue fixed; people were helpful and instructive. For drive-by newbies there's an initial penalty to dig into Mozilla tooling. Lowering that threshold would attract more contributors.


This is really telling of the current vibe I get from Firefox, and why I feel resistant to support them beyond “It’s a bit more private than default Chrome”

Companies gonna company and expand in the wrong direction if they forget where they come from.


That doesn’t seem unreasonable for a drive-by PR to an enormous project. I contributed to an open source Rust project a few years back and my first PR took weeks of back and forth. My second and following ones were merged in days.

