sdrapkin's comments

Guid.CreateVersion7 in .NET 9+ claims RFC 9562 compliance but violates its big-endian requirement for binary storage. This causes the same database index fragmentation that v7 UUIDs were designed to prevent. Testing with 100K PostgreSQL inserts shows rampant fragmentation (35% larger indexes) compared to properly implemented sequential GUIDs.
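For context, RFC 9562 requires the 48-bit Unix-millisecond timestamp of a v7 UUID to occupy the first six bytes in big-endian order. A minimal Go sketch of a compliant layout (illustrative only; `newV7` is a hypothetical helper, not the .NET or any library's actual code):

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// newV7 builds a UUIDv7 with the RFC 9562 big-endian layout:
// 48-bit Unix-ms timestamp, version nibble 7, variant bits 10,
// and random bits everywhere else.
func newV7() [16]byte {
	var u [16]byte
	ms := uint64(time.Now().UnixMilli())
	binary.BigEndian.PutUint64(u[:8], ms<<16) // top 6 bytes = timestamp
	rand.Read(u[6:])                          // fill remaining 10 bytes
	u[6] = (u[6] & 0x0F) | 0x70               // version 7
	u[8] = (u[8] & 0x3F) | 0x80               // variant 10xx
	return u
}

func main() {
	fmt.Printf("%x\n", newV7())
}
```

Because the timestamp bytes are big-endian, consecutive values sort correctly as raw bytes, which is exactly what keeps B-tree indexes append-only.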


fcrand (fast crypto/rand) is a high-performance drop-in replacement for Go's crypto/rand.


The vast majority of Golang developers would benefit from using the Guid library instead of the UUID library. It's substantially faster in all cases, more secure (by a factor of 2^6, since no bits are spent on version/variant), and has more functionality.

For random token-as-string generation Golang developers should be using https://github.com/sdrapkin/randstring instead of crypto/rand.Text (faster and more flexible).


The vast majority of Golang developers are neither hobbled by the lack of gigabyte throughput for random identifier generation nor are they on the verge of becoming victims to attacks on identifiers with "only" 2^122 random bits.


Agreed. So at worst they (Golang developers) should be indifferent, and at best they should opt for the faster choice. With serverless code billing by the second, faster choices are directly correlated to lower costs.


> With serverless code billing by the second, faster choices are directly correlated to lower costs.

The kind of Go developers who think about these optimizations don't use overpriced, inefficient serverless services.


cuid2 generates variable-length strings. If you want fast cryptographically strong string generation, I recommend https://github.com/sdrapkin/randstring. It will likely be faster than cuid2.


That doesn't address what I said, nor explain why your package is better.


The Guid package generates guids/uuids. Your linked package generates variable-length strings. These are different use cases (oh, and your benchmarks are inferior to https://github.com/sdrapkin/randstring). Nothing to argue about.


But this doesn't generate guid/uuids? It generates random bytes.


Guid/uuid is defined as a 16-byte structure. Are you questioning the “byte” part, or the “random” part?


No, it's defined by a series of specifications. [0] Ones that define an underlying structure, in bits.

You have a 16-byte random string. That's great. But it is not a UUID.

[0] https://www.rfc-editor.org/rfc/rfc9562.html

> The UUID format is 16 octets (128 bits) in size; the variant bits in conjunction with the version bits described in the next sections determine finer structure.


No, Guid/uuids are defined as 128-bit labels used to uniquely identify objects in computer systems. This 128-bit/16-byte definition predates any RFCs that one may or may not choose to implement. I'm obviously aware of RFC 9562, and nowhere in the Guid library do I claim to implement it. RFC 9562 is a choice, and one that should not be made blindly, or for you.

It all starts with 16 random bytes. Google's uuid starts that way, as does virtually every other Guid/uuid implementation. Then, on top of that building block, one may tweak additional non-random bits if the use case truly requires it. If it does, you can do it quickly and cheaply on top of 16 random bytes. If the use case does not require it (99% of cases), you're better off with the foundational 16 random bytes.

The perspective of "your 16 random bytes do not implement RFC 9562 - BAD, BAD!" is very myopic. But if wasting bits on versions and variants is something that helps someone sleep better, they can easily and cheaply achieve that with a couple of bit ops. RFC 9562 robs developers of that choice.


Ok... But if you want to ignore the last twenty years, you should probably pick another name, because it has been used a particular way for two decades.

If you want "more choice" - use a name unbound by a tradition old enough to drink.


No need to argue. You just haven't addressed the point that a fast UUID generator is a security risk. I don't care about benchmarks.

And in most use cases where I'd need a UUID, I'd usually want the string representation of it.


Fast guid/uuid generators are NOT a security risk. You want such generators to be as fast as possible, without compromising cryptographic strength.


It's on the roadmap (already implemented in a similar .NET library - https://github.com/sdrapkin/SecurityDriven.FastGuid).


In case you missed it, "guid.Read()" is a much faster alternative to "crypto/rand". https://pkg.go.dev/github.com/sdrapkin/guid#Read


IMHO "Guid" is just as well known (Wikipedia agrees: https://en.wikipedia.org/wiki/Universally_unique_identifier), and "UUID" was already taken by Google.


> "UUID" was already taken by Google.

This shouldn't really matter as your import paths are obviously different. `github.com/google/uuid` and `github.com/sdrapkin/guid` can happily coexist. Any file/codebase importing both (which would ideally be avoided in the first place) can alias them.

> IMHO "Guid" is just as well known

I think the point the commenter was trying to make is that these do not adhere to the UUID spec. You don't specify which version, but judging by the docs and your comparison to `github.com/google/uuid`, I'd wager most folks looking at this library would assume they are supposed to be V4 UUIDs.


> This shouldn't really matter as your import paths are obviously different.

I'm aware of that, of course. Guid is intentionally named differently from "uuid" (both as a package and as a type) to ensure there is no confusion between them in code. It is not the goal of Guid to mimic/inherit all uuid APIs. Guid is its own package, with a different API surface and roadmap (ie. I'll borrow what makes sense and do things differently when it makes sense).


The spec uses both UUID and GUID. You can expect the same thing for both.

> This specification defines UUIDs (Universally Unique IDentifiers) -- also known as GUIDs (Globally Unique IDentifiers) -- and a Uniform Resource Name namespace for UUIDs.


I think the point is that this just generates 16 random bytes, whereas UUIDs/GUIDs have structure: they at least have a variant field indicating what kind of UUID/GUID it is. The closest thing to all-random bytes would be variant 10xx, version 4 or 8.


You are correct - Guid very specifically and intentionally generates a structure of 16 random bytes. In decades of programming I've never needed a random 16-byte structure to have an "internal versioned structure". In the very rare cases where this is truly needed, bit-twiddling post-generation can cheaply fix it (but not the other way around). Which is why all these "versions" and "variants" in standard, universally applicable libraries are a complete waste of entropy and cycles.
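For reference, the post-generation bit-twiddling amounts to two masked ORs. A sketch in Go (the helper name `toV4` is mine, not any library's API), stamping RFC 9562 version-4/variant bits onto 16 random bytes at a cost of 6 bits of entropy:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// toV4 stamps RFC 9562 version-4 and variant bits onto 16 random bytes.
func toV4(b *[16]byte) {
	b[6] = (b[6] & 0x0F) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3F) | 0x80 // variant 10xx in the top bits of byte 8
}

func main() {
	var g [16]byte
	rand.Read(g[:]) // 16 random bytes as the building block
	toV4(&g)
	fmt.Printf("%x-%x-%x-%x-%x\n", g[0:4], g[4:6], g[6:8], g[8:10], g[10:16])
}
```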


I do not think I have seen or noticed code that inspects UUID variants either, but I could certainly imagine that such code is out there, for example to protect against accidental information leakage from UUID variants that are not purely random. With that in mind, it seems a good idea to adhere to the standards if one uses an established name. Neither the few lost bits nor the effort to correctly indicate the variant sounds like a real issue to me.


I was on a team once that would add server information to ids. Between that and using the version that has a date on it, it made debugging things MUCH easier. Just plug in the id and tooling could easily determine which logs to look into for when it was generated. Obviously, you may still need to widen your search for many reasons. But it is hard for me to think this is on most people's threat model.


> "UUID" was already taken by Google

Your link also says that the term UUID predates the founding of Google by over a decade.


Thanks for your feedback. If you are skilled in Golang, I suggest you review the code more thoroughly for a more accurate understanding (especially compared to what standard uuid does).


Much faster (~10x) than standard github.com/google/uuid package

I'm interested in feedback from the HN community.


what real-world problem, if any, does 10x faster UUID generation solve?

from your readme, `guid.New()` takes 6-10 ns, so presumably the standard UUID package takes 60-100 ns?

say I generate a UUID, and then use that UUID when inserting a row into my database, let's say committing that transaction takes 1 msec (1 million ns)

if I get a speedup of 90 ns from using a faster UUID package, will that even be noticeable in my benchmarks? it seems likely to be lost in the noise.

honestly, this seems like going on a 7-day road trip, and sprinting from your front door to your car because it'll get you there faster.


Amazon AWS S3 web servers process millions of requests per second, and each response generates a random Request-Id. It's not exactly 16 bytes, but this is a very realistic scenario where guids are used in the hot path. If you are writing a cute-kitten blog, you might as well use Python instead.


Why is it so much faster than `uuid`?


It generates entropy 4 KB at a time (instead of on each call), and uses a pool of caches instead of a single cache behind a lock (which is what the standard uuid does in "RandPool=ON" mode).
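A sketch of that buffering technique under stated assumptions (hypothetical names and a simplified structure, not the library's actual code): each refill costs one `crypto/rand` read of 4 KB, which then serves 256 subsequent 16-byte requests, and a `sync.Pool` lets concurrent goroutines use separate buffers instead of contending on one lock:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

const bufSize = 4096 // refill entropy 4 KiB at a time

// entropyBuf is a partially consumed buffer of random bytes.
type entropyBuf struct {
	buf []byte
	off int // next unread position; starts exhausted to force a fill
}

var pool = sync.Pool{
	New: func() any { return &entropyBuf{buf: make([]byte, bufSize), off: bufSize} },
}

// next16 returns 16 fresh random bytes from a pooled buffer,
// hitting the OS entropy source only once per 256 calls.
func next16() [16]byte {
	e := pool.Get().(*entropyBuf)
	if e.off+16 > len(e.buf) {
		rand.Read(e.buf) // one syscall refills 256 future calls
		e.off = 0
	}
	var out [16]byte
	copy(out[:], e.buf[e.off:e.off+16])
	e.off += 16
	pool.Put(e)
	return out
}

func main() {
	fmt.Printf("%x\n", next16())
}
```

The snapshot-safety concern raised below applies to exactly this kind of userspace buffering: a restored VM image replays the unconsumed buffer contents unless the kernel (e.g. via vDSO getrandom) invalidates it.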


So this automatically makes it unsafe in case of VM snapshots.

The Linux kernel now has an optimization that makes it safe: https://lwn.net/Articles/983186/

Go should automatically benefit from this, if they use the vDSO getrandom().


Ah cool, the note here is also interesting: https://pkg.go.dev/github.com/google/uuid#EnableRandPool


GCM (ie. AES-GCM) has the following problems, which extended variants - those that deterministically randomize (key,nonce) pair - do not solve:

Inability to encrypt more than 64 GB with the same (key, nonce) pair.

Lack of commitment (whether key-commitment, or key+nonce+ad commitment).

If one is seriously considering breaking away from existing GCM standards to create yet another standard, such a proposal would need to offer improvements in all areas (e.g. a proposed standard for converting any AEAD into a streaming, chunk-based AEAD with practically unlimited message sizes under the same (key, nonce) pair and unlimited message counts).

GCM-256 is ubiquitous and is often the preferred choice for all the reasons mentioned by the author, but that very argument is what makes non-standard GCM with 11 AES-rounds silly.

In 2023 we should be working on new standards that "wrap" existing crypto-primitives (which are already implemented/available in countless hardware-accelerated libraries/APIs) to get additional features/benefits/capabilities - not musing about AES with 10+1 rounds or SHA-512-really-fast with 80-1 rounds...


> Inability to encrypt more than 64 GB with the same (key, nonce) pair.

I think a better way is to derive a content key per file part and then use a ratcheting nonce to encrypt the subparts. That also gives you random access into the entire file in ~O(1) (i.e. no need to decrypt the entire file) and the ability to interrupt and resume decryption. Unfortunately, there's no standard that describes how the output should be serialized, so tool interop becomes a problem. Although, to be fair, there's no serialization standard for AES either (i.e. what do you do with the nonce?), so it's probably not a big deal.


The issue with 64GB is one thing. Not many people encrypt single 64GB files. But people do use aes-gcm for tunnels and encrypting billions of messages and you can only use the key so many times before you have to rotate. Many people screw this part up.


I think it's easier than that. When the IV uses a 96-bit fixed field and a 32-bit counter, NIST SP 800-38D says you are limited to 2^32 "invocations of the authenticated encryption function with any given fixed field and key". What they call the fixed field is the common 96-bit IV or nonce.

So you can do 512 Gb (64 GB) under one nonce. Then simply increment the fixed field and run the next 64 GB under the same key and a new nonce (nonce+1), and so on. In essence, it's the same thing, just making the fixed field smaller and the counter bigger, but it meets the letter and intent of the law. The "fixed field" can be anything the user wants, including being "constructed from two or more smaller fields", and it is not constrained to remain the same under multiple invocations. Still compatible with FIPS and common implementations. It doesn't have to be some fancy ratcheting scheme.

The initial "fixed field" or nonce could even always just be all zeros [1]. It doesn't matter, it's not secret.

If for some reason you want to encrypt that much under one key, which I think you really don't.

1: well, in most cases especially AES-256: https://crypto.stackexchange.com/questions/68774/can-a-zero-...
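The "increment the fixed field" step above is just big-endian carry propagation across the 96-bit nonce. A minimal Go sketch (`incNonce` is a hypothetical helper, not from any cited spec or library):

```go
package main

import "fmt"

// incNonce treats the 96-bit GCM nonce as one big-endian counter and
// advances it for the next segment under the same key, per the
// SP 800-38D deterministic-IV construction discussed above.
func incNonce(n *[12]byte) {
	for i := len(n) - 1; i >= 0; i-- {
		n[i]++
		if n[i] != 0 { // no carry out of this byte: done
			return
		}
	}
}

func main() {
	var n [12]byte // all-zero starting nonce (it's not secret)
	incNonce(&n)
	fmt.Printf("%x\n", n) // prints 000000000000000000000001
}
```

As the sibling comment warns, this only stays safe if the counter state is reliably persisted: reusing any nonce value under the same key breaks GCM.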


The problem with this is that if the nonce ever collides under the same key, GCM's security falls to pieces, and so if you're using the nonce as an extended counter --- a thing people do, in part because they're worried 96 bits is too short to safely choose randomly --- you have to design a system that can't use the nonce "0" (or "1" or "2") twice.

If all your keys are ephemeral this isn't a big worry, but if they aren't, you can end up talking about reliably keeping state between invocations of your whole program.

(Apologies if this is obvious!)


> you have to design a system that can't use the nonce "0" (or "1" or "2") twice.

just like any counter mode. it's vitally important, but not difficult to understand or implement.

the other point is, WHY NOT JUST ROLL THE KEY MORE OFTEN. nobody should be encrypting 64GB under the same key. and 96+256 is enough bits that can be chosen randomly to never worry about collisions.


This bit about it not being difficult to implement is false. The single most damaging vulnerability class of the last 25 years came from the inability of programmers to reliably count bytes. It's simple to come up with something that works reliably without the presence of an adversary. But as soon as you add an adversary who will manipulate inputs and environments to put you into corner cases, counting becomes quite difficult indeed, no matter how simple you think it is to understand counting.

If you create the opportunity to make a mistake remembering to freshen a nonce, even if that opportunity is remote, such that you'd never trip over it accidentally, you've given attackers a window to elaborately synthesize that accident for you. That's what a vulnerability is.

There is a whole subfield of cryptography right now dedicated to "nonce misuse resistance", motivated entirely by this one problem. This is what I love about cryptography. You could go your entire career in the rest of software security and not come up with a single new bug class (just instances of bug patterns that people have been finding for years). But cryptography has them growing on trees, and it is early days for figuring out how to weaponize them.

That's why people pay so much attention to stuff like nonce widths.


> GCM's security falls to pieces

That's an exaggeration. Reusing the nonce in GCM allows decryption of messages with the same nonce. It does NOT compromise the key.


Allowing decryption of the messages sounds like falling to pieces to me...


Messages with duplicate nonces. Recovery of the authentication key may also allow message forgery (although it won't allow decryption).


Message forgery does quite often lead to decryption actually - google “chosen ciphertext attack”.


None of the modern symmetric ciphers are susceptible to chosen ciphertext attacks.


Say what now? GCM is itself vulnerable to CCA in a nonce reuse scenario - exactly the subject of this thread. Not to mention padding oracle attacks against CBC mode etc. Almost all modern symmetric ciphers achieve CCA security by combining the cipher with a MAC to create an AEAD mode. So if your AEAD mode gives up the MAC subkey, as GCM does under nonce reuse, then you lose all CCA security, and usually starting leaking details about plaintexts not long after.


Sigh. If you're talking about crypto, then terms actually matter. GCM is not a symmetric cipher.

It's a cipher mode. You can use GCM with any block cipher. OK, I assume that you meant AES-GCM.

But GCM as a construction in itself is not vulnerable to chosen ciphertext attacks, as long as the underlying symmetric cipher is secure.

GCM will lose the authentication property, if you know the authentication key, which you _might_ be able to get if you can mount a chosen _plaintext_ attack under conditions of nonce reuse. Simply getting a couple of random messages with the same nonce is NOT enough.

AES-GCM as specified has a nonce that is large enough to not care about it in practical cases (e.g. TLS), and it can become a problem only in very unrealistic cases (attacker-controlled likely exabyte-sized plaintexts).

These cases are maybe _juuuust_ in the realm of possibility, if you have access to a supercomputer, and you want to specifically design an application that is vulnerable to an attack, and then allow your adversary to covertly connect to your supercomputer cluster. To be clear, we're talking here about repurposing the entire NSA computing and storage power to host this single application, and allowing the attacker (e.g. Russian troll farms) to completely control the plaintexts that it transmits.

Extending the nonce to 256 bits would move that from outside the realm of possibility even for a contrived scenario. It's not a bad idea, but it's also not at all an urgent one.


> Simply getting a couple of random messages with the same nonce is NOT enough.

Yes it is. You simply XOR the two auth tags and then compute the roots of the resulting polynomial (with known coefficients). There typically aren't that many candidate roots to test. This has been known since GCM was first specified; see e.g. Joux's comments: https://csrc.nist.gov/csrc/media/projects/block-cipher-techn...

It’s clear from your comments here and elsewhere that you don’t know what you are talking about, so I’ll take tptacek’s advice and bow out here.


The claim you're responding to doesn't even make sense, so I don't think you're obligated to reply to it. :)


I know that you're a Google U alumni, but can you give me an example of a modern symmetric cipher that is susceptible to a chosen ciphertext attack?



Yes, and? It allows recovery of the authentication key, but not the source AES key.

The authentication key is _derived_ from the AES key, but they're not the same.


I don't think you can walk back your previous comment, which was pretty categorical. Either way, we're clear about the brittleness of GCM at this point, and there's little else for us to talk about.


Brittleness? Not really. It's not completely future-proof, and it would be easier if a larger nonce were standardized, but all realistic attacks require a rather unlikely set of circumstances.

And no, you can't recover the encryption key (i.e. the thing that allows you to decrypt messages) from any weakness in the nonce choice.


https://www.usenix.org/conference/woot16/workshop-program/pr...

I think you should stop digging. Sean and Hanno gave a Black Hat talk whose slides were unwillingly hosted on a GCHQ website because of this problem.


[flagged]


K.


Presumably, if you're going to do an extended-nonce GCM, you could reformat the counter block (since the nonce is encoded in the key anyway) to get rid of the 64 GB limit --- but that complicates the FIPS story, I guess?


Indeed. But FIPS is not the only problem. Both the McGrew/Viega spec and subsequent NIST spec of GCM mandate a 4-byte counter - any departure from that would be "no longer GCM".


Is the argument for a small counter that nobody serious will treat it as a significant diversification component or reliable source of entropy, especially in a streaming mode? It's a counter whose function is necessarily finite and predictable (and reversible?), if not explicitly linear. Intuitively, any substitutions or convolutions derived from it would weaken subsequent operations, no?


Yep. Gross.


I think the message size limit is a bit of a red herring. Anyone using AES-GCM with messages that large is probably doing sketchy things with unauthenticated plaintext on the decryption side. A non-hazmat/not-just-for-experts cipher really needs to be chunked.


I've seen operating on unauthenticated plaintext enough times to list it as my own pet peeve with AES-GCM. But it's a problem for chunked messages too. A few years ago we released a SCRAM mode that makes very minimal changes to AES-GCM so that it mathematically can't operate on unauthenticated plaintext. https://github.com/aws/s2n-tls/tree/main/scram


> But it's a problem for chunked messages too.

I'm curious to hear more about what you've seen. My naive hope was that a proper streaming decrypt API would be enough of a pit of success that developers wouldn't be tempted to sabotage themselves.


A great deal of modern systems work is copying data from A to B to C. The construct of frontends and backends implies it, as do middleware, proxies, distributed storage, and blockchains. Even in the most complex systems, latency is one of the easiest core metrics to measure, and is always a priority. It is always lower latency to prefetch the inter-system pipelines, or to use optimistic concurrency and to preprocess data before it has been authenticated.

Chunked streaming can make the difference smaller, but even that "small" difference is beyond what is relevant to, say, filling an L1 cache or waiting on a round-trip. Some of the cases of "read before auth" I've seen have been on very small messages, but in contexts where the incentives are driven up even further, like trading or bidding protocols. It just left me thinking that we should enforce AEAD mathematically. Many practitioners assume it already is enforced!

