Hacker News | Majromax's comments

Are Macs/etc compute bound with their 'it fits in unified memory' language models? Certainly by the time you're streaming weights from SSD you must be back in a bandwidth-bound regime.
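One rough way to frame the question is a roofline-style estimate for batch-size-1 decoding of a dense model, where every weight is read once per token and costs about 2 FLOPs. This is a sketch only: the function name and every hardware number below are hypothetical placeholders, not measurements of any real machine.

```python
def decode_time_bounds(n_params, bytes_per_param, bandwidth_bytes_per_s, peak_flops_per_s):
    """Lower bounds on per-token time for batch-size-1 decoding of a dense model.

    Assumes each weight is read once per token and costs ~2 FLOPs
    (one multiply, one add). Returns (memory_time_s, compute_time_s).
    """
    weight_bytes = n_params * bytes_per_param
    flops_per_token = 2.0 * n_params
    return weight_bytes / bandwidth_bytes_per_s, flops_per_token / peak_flops_per_s


def limiting_resource(memory_time_s, compute_time_s):
    """Whichever bound is larger is the one that limits throughput."""
    return "bandwidth" if memory_time_s >= compute_time_s else "compute"


# Hypothetical numbers, chosen only to illustrate the arithmetic.
N_PARAMS = 70e9          # 70B-parameter dense model
BYTES_PER_PARAM = 0.5    # 4-bit quantized weights
PEAK_FLOPS = 20e12       # assumed usable compute, FLOP/s
UNIFIED_MEMORY_BW = 400e9  # assumed unified-memory bandwidth, bytes/s
SSD_BW = 5e9               # assumed SSD read bandwidth, bytes/s

in_memory = decode_time_bounds(N_PARAMS, BYTES_PER_PARAM, UNIFIED_MEMORY_BW, PEAK_FLOPS)
from_ssd = decode_time_bounds(N_PARAMS, BYTES_PER_PARAM, SSD_BW, PEAK_FLOPS)

print(limiting_resource(*in_memory), in_memory)
print(limiting_resource(*from_ssd), from_ssd)
```

Larger batch sizes or long prompts raise the FLOPs per byte read, so the answer can change with the workload as well as the hardware.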

You meant “reasonable,” but you did not apply reason. Situations such as this can be handled with a quota set at something like 150% of median use, but then extended upon a justified request. It can work in a lab where there’s a human touch, but it fails at million-user scale where even that level of human support is too expensive.

> but it fails at million-user scale where even that level of human support is too expensive.

In general this is a myth promoted by platforms with millions of users. The vast majority of such large platforms could easily afford that level of human support; they just actively choose not to give it. Backblaze - if they even have millions of users - belongs to the minority of such companies.


> picking up pennies in front of a steamroller.

This or any other statistical play is only 'in front of a steamroller' if you do it with leverage, especially if notionally uncorrelated bets suddenly move together. Bets on Polymarket have limited downside by design, and bets in different categories are obviously unrelated to each other.

Without having looked into this in detail, however, I suspect the problem would be limited capacity; markets that are both deep and so trivially irrational are probably fairly infrequent. You might pick up pennies but only pennies.


Leverage is not the steamroller here.

It’s the 90% chance of making $1 vs 10% chance of losing $100.

The exact numbers vary. The expectations even out with high-volume stocks, but not in prediction markets, because of rounding that favors the house.


That's just the binary nature of the bet. You address that in a real trading strategy with (fractional) Kelly position sizes. Anyone doing this for actual money would also be well served by implementing continuous monitoring and active risk management on top, in order to limit maximum drawdowns if the trend evaporates.
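For concreteness, here is a minimal sketch of fractional Kelly sizing for a binary contract that costs `price` and pays 1 if it resolves in your favour. For that payoff the full-Kelly fraction works out to (q - price) / (1 - price), where q is your own probability estimate. The function names, the 0.25 multiplier and the 5% cap are illustrative assumptions, not a recommendation.

```python
def kelly_fraction(q: float, price: float) -> float:
    """Full-Kelly fraction of bankroll for a binary contract.

    q     -- your estimated probability that the contract pays out 1
    price -- cost of one contract, strictly between 0 and 1
    Returns 0 when there is no edge (q <= price).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    return max(0.0, (q - price) / (1.0 - price))


def position_size(bankroll: float, q: float, price: float,
                  kelly_multiplier: float = 0.25,
                  max_fraction: float = 0.05) -> float:
    """Fractional-Kelly stake with a hard cap per position (both knobs are assumptions)."""
    fraction = kelly_multiplier * kelly_fraction(q, price)
    return bankroll * min(fraction, max_fraction)


# Example: you believe 'No' is 80% likely and it trades at 73 cents.
full = kelly_fraction(0.80, 0.73)
stake = position_size(1000.0, 0.80, 0.73)
print(full, stake)
```

The cap is a crude stand-in for the "active risk management" part; it limits the damage when the probability estimate itself is wrong.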

> Bookmakers price purely based on facts and statistics. Their pricing isn't affected by excitement nor by how many people are betting a certain way.

A bookmaker is a market maker, and they ideally want to end up with no net interest in a position. They then take guaranteed profit in the bid-ask spread, which in sportsbooks is the 'vig'. Bookmakers who adjust their odds in real-time don't have to be particularly clever about the fundamentals, just responsive to the competing demands on either side.

A bookmaker who intentionally takes a position on a game is the equivalent of a proprietary trader or hedge fund. It's potentially more profitable, but it's also adversarial against 'sharp' traders.

Bookmakers who set odds at the beginning and don't move with the action must set larger bid-ask spreads to compensate.
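A small sketch of why a balanced book is attractive: with equal action on both sides the bookmaker's profit is the same whichever side wins, while a lopsided book is effectively a position on the game. The decimal odds of 1.91 and the stake amounts are hypothetical.

```python
def profit_by_outcome(stakes, decimal_odds):
    """Bookmaker profit for each possible winning outcome.

    stakes[i]       -- total money bet on outcome i
    decimal_odds[i] -- payout per unit staked on outcome i (stake included)
    The book keeps all stakes and pays out only the winning side.
    """
    total_staked = sum(stakes)
    return [total_staked - stake * odds for stake, odds in zip(stakes, decimal_odds)]


ODDS = [1.91, 1.91]  # hypothetical two-way market with the vig built in

balanced = profit_by_outcome([100.0, 100.0], ODDS)  # same profit either way
lopsided = profit_by_outcome([150.0, 50.0], ODDS)   # loses if the popular side wins

print(balanced)
print(lopsided)
```

Moving the odds in response to the action is the bookmaker's way of pushing `lopsided` back toward `balanced`.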


> Edit: conversely, if the average no costs _more_ than 73 cents, but the 73% of all polymarkets resolve to No, that would imply that an everything-always-happens strategy is profitable (neglecting slippage)

Or just the bid-ask spread; price no at 73.25 and yes at 27.5 and you have a profitable but theoretical mid-market price.
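To spell out the arithmetic with those two prices: together they cost 100.75 cents for a guaranteed 100-cent payout, and the extra 0.75 cents is the spread. A sketch, with helper names that are my own:

```python
def overround(prices):
    """Amount by which the prices of all outcomes exceed the guaranteed payout of 1."""
    return sum(prices) - 1.0


def implied_probabilities(prices):
    """Normalize prices so they sum to 1, removing the spread proportionally."""
    total = sum(prices)
    return [p / total for p in prices]


no_price, yes_price = 0.7325, 0.275  # the prices from the comment, in dollars

spread = overround([no_price, yes_price])
implied_no, implied_yes = implied_probabilities([no_price, yes_price])
print(spread, implied_no, implied_yes)
```

Proportional normalization is only one convention for backing out a mid-market probability; other conventions split the spread differently.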


'Enough' [capital] is doing a lot of work in that sentence. In the limit of a one-sided irrational market, the 'rational actor' would need to take the other side of every open transaction.

Yeah, but in the limit of a one-sided irrational market, the rational actor is going to be given as much capital as he can take.

In the long run. In the short run, our rational actor will be constrained by the Kelly criterion, as well as by whatever outside funding she can raise.

Wet streets cause rain? You don’t show that the Polymarket signal led the Nasdaq.

> Not a lawyer, but as I understand it the license is a matter of copyright, and the copyright only applies to the design files. So as long as you're making that keyboard for yourself then you should be good to do anything you want with the keyboard, because it is no longer using the license at that point.

What if I take the design, print it, include the thing in a staged photo, and sell prints of the photo?

What if I skip the printing and use the design files as a basis for a rendered photo or animation?

What if I print the design, then use a 3D scanner to recreate a file from the physical artifact?


You're asking some pretty niche copyright questions that even a lawyer would have to spend time searching for case law for. It may be more expedient to look for that case law yourself.

If you need to be an attorney to figure out if you're allowed to take a picture of something, we've already jumped the shark.

Not what he asked.

I mean maybe if you take a super pedantic, weaponized autism type of interpretation of the very first question, sure.

Not a lawyer either, but:

> What if I take the design, print it, include the thing in a staged photo, and sell prints of the photo?

Probably fair use, provided the design wasn't the main focus of the photo, but merely part of the "set dressing."

> What if I skip the printing and use the design files as a basis for a rendered photo or animation?

> What if I print the design, then use a 3D scanner to recreate a file from the physical artifact?

Those questions are simpler - both scenarios would be derivative works of the original files, so covered by the license.


but are those derived works copyrightable? I don't think they are.

Copyright law forbids the creation of derivative works (excepting any region-specific fair-use rules) so you're only allowed to create them under the rights granted to you in the terms of the license - thus under this particular license you can't make commercial use of derivative works.

But is a physical item a derivative work of its technical specifications?

If the design files qualify for copyright protections, then modifications to them would clearly be derivative works.

I don't think it is clear if the keyboard itself would be a derivative work, as it almost certainly can't be protected by copyright. This is what patents are for.


IANAL, usual disclaimers, etc.

The design files don't qualify for copyright protections, they describe the design which (maybe) qualifies for copyright protections.[0]

The artistic design of a specific keyboard can certainly be copyrighted, but not the functional nature of it.

[0]The exact wording might be protected, but not the factual information contained. Sports scores, or say measurements of a keyboard, are not copyrightable items as they are just facts, though their presentation might be.


> What if I take the design, print it, include the thing in a staged photo, and sell prints of the photo?

This is probably acceptable

> What if I skip the printing and use the design files as a basis for a rendered photo or animation?

This is probably NOT acceptable

> What if I print the design, then use a 3D scanner to recreate a file from the physical artifact?

If you used that for personal things, yes, that would be acceptable. I do not think that would give you the right to then sell that as a product, either digitally or physically.


What if I'm a sculptor and I design and produce a statue? Shouldn't I still have the copyright to the statue, no matter what kind of machine I used to do the actual sculpting?

Yes, but that only applies if you count the keyboard as a work of art worthy of copyright protection.

> What if I print the design, then use a 3D scanner to recreate a file from the physical artifact?

Hmm, without patents it would definitely be fine to scan an existing one and recreate it. I think this would be fine too, but any time you are clearly going out of your way to skirt the law is a red flag. The thing is, I don't even think technical designs are copyrightable outside of their aesthetic value.

> What if I take the design, print it, include the thing in a staged photo, and sell prints of the photo?

> What if I skip the printing and use the design files as a basis for a rendered photo or animation?

If it is indeed covered by copyright, then these would likely be violations, though I guess it depends on how prominent it is in the staged photo.

...this stuff is fun to think about.


Obviously the photos and media are covered by copyright, but rendering your own probably is not.

Anyone who's ever `DROP TABLE`d on a production rather than test database has encountered the same problem in meatspace.

In this context, the MCP interface acts as a privilege-limiting proxy between the actor (LLM/agent) and the tool, and it's little different from the standard best practice of always using accounts (and API keys) with the minimum set of necessary privileges.
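As a toy illustration of the privilege-limiting idea (not of the MCP protocol itself), here is an allowlist proxy in front of a tool. `InMemoryTable` and `PrivilegeLimitingProxy` are hypothetical names invented for this sketch; a real setup would more likely rely on database roles or scoped API keys.

```python
class InMemoryTable:
    """Hypothetical stand-in for a tool with both safe and destructive operations."""

    def __init__(self, rows):
        self.rows = list(rows)

    def select_all(self):
        return list(self.rows)

    def insert(self, row):
        self.rows.append(row)

    def drop(self):
        self.rows.clear()


class PrivilegeLimitingProxy:
    """Forwards only explicitly allowed operations to the wrapped tool."""

    def __init__(self, tool, allowed_operations):
        self._tool = tool
        self._allowed = frozenset(allowed_operations)

    def call(self, operation, *args, **kwargs):
        if operation not in self._allowed:
            raise PermissionError(f"operation {operation!r} is not permitted")
        return getattr(self._tool, operation)(*args, **kwargs)


table = InMemoryTable([("alice", 1), ("bob", 2)])
proxy = PrivilegeLimitingProxy(table, allowed_operations={"select_all"})

print(proxy.call("select_all"))   # reads pass through
try:
    proxy.call("drop")            # destructive call is refused before reaching the tool
except PermissionError as err:
    print("blocked:", err)
```

The agent only ever holds `proxy`, so the worst it can do is bounded by the allowlist rather than by its own judgement.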

It might be easier in practice to set up an MCP server to do this privilege-limiting than to refactor an API or CLI-tool, but that's more an indictment of the latter than an endorsement of the former.


Wait a second, they define the induced vector field (and consequently Lie bracket) in terms of batch-size 1 SGD:

> In particular, if x is a training example and L(x) is the per-example loss for the training example x, then this vector field is: v^(x)(θ) = -∇_θ L(x). In other words, for a specific training example, the arrows of the resulting vector field point in the direction that the parameters should be updated.

but for the MXResNet example:

> The optimizer is Adam, with the following parameters: lr = 5e-3, betas = (0.8, 0.999)

This changes the direction of the updates, such that I'm not completely sure the intuitive equivalence holds.

If it were just SGD with momentum, then the measured update directions would be a combination of the momentum vector and v1/v2, so {M + v1, M + v2} = {v1, M} + {M, v2} + {v1, v2}. The Lie bracket is no longer "just" a function of the model parameters and the training examples; it's now inherently path dependent.

For Adam, the parameter-wise normalization by the square root of the second-moment estimate will also slightly change the directions of the updates in a nonlinear way (thanks to the β2 term).
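A quick way to see the direction change is to compare a plain SGD step with the very first Adam step from zero state, using the lr and betas quoted above. This is a sketch in pure Python with the textbook Adam update; the two-component gradient is made up, and eps = 1e-8 is an assumed default rather than something stated in the article.

```python
import math


def adam_first_step(grad, lr=5e-3, betas=(0.8, 0.999), eps=1e-8):
    """Parameter delta for the first Adam step (t = 1) starting from zero moments."""
    beta1, beta2 = betas
    t = 1
    m = [(1.0 - beta1) * g for g in grad]        # first-moment estimate
    v = [(1.0 - beta2) * g * g for g in grad]    # second-moment estimate
    m_hat = [x / (1.0 - beta1 ** t) for x in m]  # bias correction
    v_hat = [x / (1.0 - beta2 ** t) for x in v]
    return [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


grad = [1.0, 0.01]                   # made-up gradient with very unequal components
sgd_step = [-5e-3 * g for g in grad]
adam_step = adam_first_step(grad)

# The per-parameter normalization makes both components nearly equal in size,
# so the Adam step is rotated well away from the raw gradient direction.
print(adam_step, cosine(sgd_step, adam_step))
```

On later steps the rotation is milder and depends on the gradient history, which is exactly the path dependence being discussed.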

The interpretation is also strained with fancier optimizers like Muon; this uses both momentum and (approximate) SVD normalization, so I'm really not sure what to expect.


Yeah, this is a good point. IIRC, I wasn't able to get the network to train very well at all with standard SGD. I don't think I thought to try Adam with β1 = 0, I will try it (& recompute brackets) if I get some time.

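If we have built up a momentum M, then the two orderings are as follows (expanding to first order around θ, with v1, v2 and their gradients evaluated at θ).

Applying v1 first, then v2:

M' = M + εv1

θ' = θ + M' = θ + M + εv1

M'' = M' + εv2(θ') = M + εv1 + ε(v2 + (M + εv1)⋅∇v2)

Applying v2 first, then v1:

M' = M + εv2

θ' = θ + M' = θ + M + εv2

M'' = M' + εv1(θ') = M + εv2 + ε(v1 + (M + εv2)⋅∇v1)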

Then the resulting difference in momenta M'' is:

ε^2*[v1, v2] + ε(M⋅∇)(v2 - v1)

So there is an extra term which is not actually a Lie bracket itself. I think the bracket can still be informative on its own, but it's definitely no longer the sole component of what happens when order is swapped.
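Here is a small numerical check of that expression, using linear vector fields v1(θ) = A1 θ and v2(θ) = A2 θ so that the first-order expansion is exact and (u⋅∇)v is just A u. The matrices, starting point and momentum are arbitrary made-up values, and the update rule is the simplified one from the derivation (no momentum decay).

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def scale(c, x):
    return [c * a for a in x]


A1 = [[0.0, 1.0], [-1.0, 0.0]]   # v1(theta) = A1 @ theta
A2 = [[1.0, 0.0], [0.0, -2.0]]   # v2(theta) = A2 @ theta


def final_momentum(theta, M, eps, first, second):
    """Two momentum updates: M <- M + eps*v(theta), theta <- theta + M."""
    M1 = add(M, scale(eps, matvec(first, theta)))
    theta1 = add(theta, M1)
    return add(M1, scale(eps, matvec(second, theta1)))


theta, M, eps = [0.3, -0.7], [0.05, 0.02], 0.01

# Difference in M'' between the orderings (v1 then v2) and (v2 then v1).
measured = sub(final_momentum(theta, M, eps, A1, A2),
               final_momentum(theta, M, eps, A2, A1))

# [v1, v2] = (v1.grad)v2 - (v2.grad)v1 = A2 A1 theta - A1 A2 theta for linear fields.
bracket = sub(matvec(A2, matvec(A1, theta)), matvec(A1, matvec(A2, theta)))
extra = sub(matvec(A2, M), matvec(A1, M))          # (M.grad)(v2 - v1)
predicted = add(scale(eps ** 2, bracket), scale(eps, extra))

print(measured, predicted)
```

With M set to zero the `extra` term vanishes and only the ε^2 bracket term remains, which recovers the plain-SGD picture.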

One other inconsistency that is a little less bad is BatchNorm. Since it needs a whole batch to work, and we're just comparing individual examples, I computed the Lie brackets with the BatchNorm layers in eval mode, not train mode.

I don't know whether any of this is relevant to Muon; even if it is, it would likely be very messy to compute.


Well, the "vector field defined by the update attributable to this training sample" is a well-defined thing (even if it's not just the gradient of loss with respect to parameters), so that part translates.

However, what's harder to interpret is how this field transports with respect to θ, since the momentum vector and θ are themselves inextricably linked. If you somehow arrived at a different θ, then you'd have a different momentum. (On the gripping hand, the bracket is a construct of infinitesimals, maybe that doesn't matter.)

