For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | pittma's commentsregister

Cool stuff! This method is very similar to how AVX-512-optimized RSA implementations work too, as they also have to do Very Large Exponentiations. This paper[1] covers how RSA does its windowing, which includes the formula showing how the window size can be arbitrary. One additional thing AVX-512 RSA implementations do, though, is store the results of the multiplications for the range [0..2^{window-size}) in a table; then, for each window, that result is retrieved from the table[2] and only the shifting/repositioning has to be done.

1. https://dpitt.me/files/sime.pdf (hosted on my domain because it's pulled from a journal)

2. https://github.com/aws/aws-lc/blob/9c8bd6d7b8adccdd8af4242e0...


Ooh interesting, I should have looked at this while developing.... Looks like that code could definitely use another version for e.g. Zen 5 where using zmm registers would lead to a 2x multiplication throughput. Also the mask registers are bounced to GPRs for arithmetic but that's suboptimal on Zen 4/5.

Separately, I'm wondering whether the carries really need to be propagated in one step. (At least I think that's what's going on?) The chance that a carry in leads to an additional carry out beyond what's already there in the high 12 bits is very small, so in my code, I assume that carries only happen once and then loop back if necessary. That reduces the latency in the common case. I guess with a branch there could be timing attack issues though


ymms were used here on purpose! With full-width registers, the IFMA insns have a deleterious effect on frequency, at least in the Icelake timeframe.


Ye, hence a separate version for CPUs which don't have that problem. Although, maintaining so many of these RSA kernels does seem like a pain. Didn't realize u wrote that code; super cool that it's used in practice!


I am not the original author—this is adapted from an implementation by Shay Gueron, the author of that paper I linked, but I do agree that it's cool!


zen5 can run avx512 at near full boost clocks: https://chipsandcheese.com/p/zen-5s-avx-512-frequency-behavi...


> dpitt.me/files/sime.pdf (hosted on my domain because it's pulled from a journal

One can also upload to archive.org: https://archive.org/download/sime_20250531/sime.pdf


I have a pretty elaborate Hakyll site with custom routers and all kinds of junk (https://dpitt.me), but it's an old site that started as Django, then I built it from scratch with Sinatra, and then was Jekyll for years. There really is something special about a well-organized static site. For all the rewrites of this old thing, I can't imagine moving away from static site generation.


You can, in fact, use traits to do type-level programming in Rust[1], but this is type-level programming; it isn't /dependent/ types. The biggest "blocker" for using dependent types is that programs must be /total/, because that program has to be evaluated for it to typecheck! If a program is not total, the typechecker has the potential of getting stuck. The two dependently-typed languages I've programmed in, Mostly Agda and a tiny bit of Idris, have totality checkers to aid in this. Generally this means that recursion is only allowed if its structural, and there can be no run-forever loops, i.e., no servers.

I think instead, for one of these languages, the ideal would be opt in should you need it. Dependent types are, by their nature, proof carrying[2], and there are times when you want this, but for a general-purpose programming language, also times that you do not. You can't be total over IO (or any effect for that matter), so, yeah, skip it when arg-parsing, but then opt-in when you're processing your financial transactions or actuating your robot.

[1] https://dpitt.me/talks/tt-for-rust/

[2] http://www.cs.nott.ac.uk/~psztxa/publ/ydtm.pdf


> The biggest "blocker" for using dependent types is that programs must be /total/, because that program has to be evaluated for it to typecheck!

Totality is orthogonal to dependent types. You can absolutely have non-total programs at the type level: Rust has such programs today in fact!

> Generally this means that recursion is only allowed if its structural, and there can be no run-forever loops, i.e., no servers.

You can have an infinite loop in a total language like Agda or Idris: coinduction is the mechanism.

> Dependent types are, by their nature, proof carrying[2], and there are times when you want this, but for a general-purpose programming language, also times that you do not.

This doesn't make a whole lot of sense to me. Dependent types are "proof carrying" in the sense that any program of the type:

    Int -> Int
Is a proof that there exists a function of type `Int -> Int`, nothing more. I don't know what "opting out" of the "carrying" there would mean.

> You can't be total over IO (or any effect for that matter), so, yeah, skip it when arg-parsing, but then opt-in when you're processing your financial transactions or actuating your robot.

You can be total over IO, and effects do not imply non-totality either.


> Totality is orthogonal to dependent types. You can absolutely have non-total programs at the type level: Rust has such programs today in fact!

Absolutely, the talk I linked to gets at this to some extent. In rust, partiality causes type checking to fail. You could use it on purpose!

> You can have an infinite loop in a total language like Agda or Idris: coinduction is the mechanism.

For sure, but the totality checker is appeased by sized-types, or at least that's how I know to do it in Agda. I've heard it can be done without them, but I'm not familiar with the approach. This is what I intended w/r/t structural recursion.

> This doesn't make a whole lot of sense to me.

This is because, as the authors state, to type check the program is to evaluate it, so before it can run you have proof that it is correct.

By opting out I'm more talking about the converse, to opt in when you need it. Kind of the opposite of Idris's %partial directive.

> You can be total over IO, and effects do not imply non-totality either.

That's fair, you're right. This is poorly stated.


Idris does support forever loops or servers without losing totality. See https://idris2.readthedocs.io/en/latest/tutorial/typesfuns.h...


Hm, that's not how I'm reading this. An infinite loop is, by definition, partial. What I'm gathering from this is that that stuckness that the typechecker can encounter in the face of partiality is okay in Idris, that its typechecker just says, "welp, this is as far as I go."


I’m specifically talking about the “Produces a non-empty, finite, prefix of a possibly infinite result” part on that page. As long as your server generates finite responses in finite times, Idris recognizes your code as total.


When dealing with corecursion (infinite streams) you don't care about whether you terminate, but rather, whether the infinite stream is _productive_ meaning it doesn't stop producing values. The prime example of a function that's impossible(?) to prove productive (for all cases) is filter – since it takes a predicate function that decides whether values are returned or not. If you give filter a predicate that always returns false you will never produce any values, hence that wouldn't be productive.


If you have codata you can have infinite loops in total functional programming.

https://en.wikipedia.org/wiki/Corecursion


> Are you saying there have been studies that show Haskell's type system has not been correlated to higher correctness than say Java or Python, or are you saying you are unaware of any such studies?

On the contrary! Consider for instance: "Total Haskell is Reasonable Coq"[1]

1. https://arxiv.org/abs/1711.09286


This has absolutely no bearing on the matter. The fact that some simple propositional logic statements could be proven using Haskell if Haskell were total, does not mean it has a significant impact on correctness (which requires predicate logic) even if Haskell were total, and it certainly doesn't mean that it's more effective at increasing correctness than Python.

And, BTW, Coq is not a particularly effective tool for writing correct software, either. The largest software whose correctness was largely verified in Coq (CompCert) is positively tiny compared to standard software (it's less than 1/5th of jQuery), and the cost was huge. So Coq's "correctness score" of increased correctness per unit of effort is not particularly stellar compared to alternatives.



feL4 developer here. Happy to field questions, and to note that we'll keep introducing and extending the ergonomics for building complex applications, configuring platforms, and adding hardware support in the coming weeks/months.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You