Love Pratt parsing! Not a compiler guy, but I've spent way too many hours reflecting on parsing. I remember trying to get through the dragon book so many times and reading all about formal grammars etc., until I landed on recursive descent parsing + Pratt for expressions. Super simple technique, and for me it's sufficient. I'm sure it doesn't cover all cases, but for toy languages it feels like you can usually do everything with 2-token lookahead.
Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)
It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.
Parsing and compiling and interpreting etc are all much more at home in functional languages. Much easier to understand there. And once you do, then you can translate back into imperative.
For parsing: by default you should be using parser combinators.
Is there a production compiler out there that doesn't use recursive descent, preferably constructed from combinators? Table-driven parsers seem now to be a "tell" of an old compiler or a hobby project.
Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear time parse-ability of a grammar. A couple of examples are the creator of the Oil shell, and one of the guys responsible for Rust.
It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.
Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.
Oh, I was talking much more about how you can first learn how to write a compiler. I wasn't talking about how you write a production industry-strength compiler.
Btw, I mentioned parser combinators: those are basically just a front-end. Similar to regular expressions. The implementation can be all kinds of things, eg could be recursive descent or a table or backtracking or whatever. (Even finite automata, if your combinators are suitably restricted.)
I used a small custom parser combinator library to parse Fortran from raw characters (since tokenization is so context-dependent), and it's worked well.
The thing about LR parsers is that since they parse bottom-up, you have no idea what larger syntactic structure is being built, so error recovery is ugly and giving the user a sensible error message is a fool's errand.
In the end, all the hard work in a compiler is in the back-end optimization phases. Put your mental energy there.
I have worked on compilers (mostly) for high-performance computing for over 40 years, writing every part of a production compiler twice or more. Optimization and code generation and register allocation/scheduling are definitely the most fun -- but the hardest work is in parsing and semantics, where "hardest" means it takes the most work to get things right for the language and to deal with user errors in the most graceful and informative manner. This is especially true for badly specified legacy languages like Fortran.
The Dragon book wasn't good for me either, but I'd disagree about using parser combinators. The problem I see with the Dragon book is that it basically starts using concepts (phases of compilation) before it introduces and motivates them in the abstract. I can see how people who already know these concepts can look at the Dragon book and say "oh, that's a good treatment of this", so perhaps it's a good reference, but it's problematic for a class and terrible to pick up and try to read standalone (which I did back at Berkeley in the 80s).
As far as I can tell, parser combinators are just one of the approaches that promise to let you "write a compiler without understanding abstract languages", but all these methods actually wind up being libraries that are far more complicated than gp's "recursive descent + Pratt parsing", which is easy once you understand the idea of an abstract language.
I wrote a bunch of parser combinator libraries myself, and used plenty more.
They are basically the same idea as regular expressions, but more flexible: you have a bunch of combining operations for your regular languages to build bigger regular languages. But that doesn't tell you how it's implemented in the backend. Could be Recursive, could be automata, could be backtracking, could be anything.
If you want to write your first compiler, I'd even go so far as to suggest to use something with a Lisp syntax as your source language, explicitly so you can minimise the parsing aspect.
Parsing is a lot of fun by itself, but it doesn't actually have much to do with the core of what makes compilers interesting and challenging. It's almost an independent pursuit, and very useful outside of writing compilers, too.
> As far as I can tell, parser combinators are just one of the approaches that promise to let you "write a compiler without understanding abstract languages", but all these methods actually wind up being libraries that are far more complicated than gp's "recursive descent + Pratt parsing", which is easy once you understand the idea of an abstract language.
Where do you suspect the complexity comes from? You can write a toy library for parser combinators that's really simple, if your implementation language is at least as capable as Rust or even Python (with Haskell and OCaml being probably the easiest). If you are using an off-the-shelf industrial-strength, production-grade parser combinator library: of course, that's complicated under the hood. That's the price the authors willingly pay for great error handling and performance etc.
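To give a sense of how small the toy version can be, here's a minimal sketch in Python. Everything here (the `char`/`seq`/`alt`/`many` names and the `(value, position)` return convention) is made up for illustration, not taken from any particular library:

```python
# A parser is just a function: (text, pos) -> (value, new_pos), or None on failure.

def char(c):
    """Match a single literal character."""
    def p(s, i):
        if i < len(s) and s[i] == c:
            return c, i + 1
        return None
    return p

def seq(*parsers):
    """Run parsers one after another; fail if any fails."""
    def p(s, i):
        values = []
        for q in parsers:
            r = q(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return values, i
    return p

def alt(*parsers):
    """Try alternatives in order; return the first success."""
    def p(s, i):
        for q in parsers:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

def many(q):
    """Match q zero or more times."""
    def p(s, i):
        values = []
        while True:
            r = q(s, i)
            if r is None:
                return values, i
            v, i = r
            values.append(v)
    return p

# Example: a run of 'a's followed by a 'b'.
ab = seq(many(char('a')), char('b'))
print(ab("aaab", 0))  # ([['a', 'a', 'a'], 'b'], 4)
```

Note that nothing here says how a production library has to work; this particular sketch is plain recursive descent with no backtracking beyond `alt` trying each branch from the same position.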
For most people writing their first compilers, they would be better off starting the project from a point where they already have some tree or DAG representation. Plenty of (more!) interesting challenges left. Going from a stream of bits to the parsed syntax tree is something they can learn about later (or not at all), without missing much.
I was just going into the second quarter of compiler design when the dragon book came out. My copy was literally "hot off the press" -- still warm from the ink baking ovens. It was worlds better than anything else available at the time.
> Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)
Well, it depends how formal you're talking about. I have to say that the approach you mention, recursive descent parsing + Pratt for expressions, actually requires you to understand what a formal language is - that it's a "thing" that can't (or shouldn't) be an object or a data structure but exists abstractly before any objects created by the program.
Moreover, the standard way of producing a recursive descent parser is to begin with your language in Chomsky normal form or some human-understandable format, then convert to Greibach normal form, and that specification converts readily to your series of recursive functions. So all the language transforms are useful to know (though you can skip steps if you have a good intuition of your language).
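To illustrate "the specification converts readily to recursive functions", here is a sketch in Python for a tiny hypothetical grammar of my own choosing (one function per nonterminal; the grammar and names are assumptions for illustration, not from any particular language):

```python
# Grammar (EBNF-ish):
#   expr -> term ('+' term)*
#   term -> NUMBER | '(' expr ')'
# Each nonterminal becomes one function over a flat token list.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok!r}, got {peek()!r}"
        pos += 1

    def expr():
        # expr -> term ('+' term)*
        value = term()
        while peek() == '+':
            eat('+')
            value += term()
        return value

    def term():
        # term -> NUMBER | '(' expr ')'
        nonlocal pos
        if peek() == '(':
            eat('(')
            value = expr()
            eat(')')
            return value
        value = int(peek())
        pos += 1
        return value

    return expr()

print(parse(['1', '+', '(', '2', '+', '3', ')']))  # 6
```

The shape of each function mirrors its production directly, which is exactly why this style is so easy to write by hand once the grammar is in a suitable form.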
One more quick one: to parse infix expressions, every time you see "x·y | (z | w)", find the operator of least binding power; in my example, I've given "|" less binding power than "·". This visually breaks the expression into two halves: "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.
The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.
Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.
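The "split at the weakest operator" idea above can be sketched naively in Python. I'm using ASCII '.' in place of '·', and the binding-power table and tuple output are my own illustrative choices:

```python
# Naive O(n^2) split-at-weakest-operator parsing, as described above.
BINDING = {'|': 1, '.': 2}  # lower number = less binding power

def parse(s):
    depth = 0
    best = None  # (binding power, index) of the chosen split point
    for i, c in enumerate(s):
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        elif depth == 0 and c in BINDING:
            # <= picks the rightmost weakest operator, giving left-associativity
            if best is None or BINDING[c] <= best[0]:
                best = (BINDING[c], i)
    if best is None:
        if s.startswith('('):
            # whole expression is bracketed: strip the brackets (sketch assumes
            # they match) and parse the inside first
            return parse(s[1:-1])
        return s  # atom
    _, i = best
    return (s[i], parse(s[:i]), parse(s[i + 1:]))

print(parse("x.y|(z|w)"))  # ('|', ('.', 'x', 'y'), ('|', 'z', 'w'))
```

Scanning only at bracket depth 0 is what makes bracketed subexpressions get parsed "first", as the comment describes.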
Until you need to do more than all-or-nothing parsing :) see tree-sitter for example, or any other efficient LSP implementation of incremental parsing.
It is easily possible to parse at > 1MM lines per second with a well designed grammar and handwritten parser. If I'm editing a file with 100k+ lines, I likely have much bigger problems than the need for incremental parsing.
It's not just speed - incremental parsing allows for better error recovery. In practice, this means that your editor can highlight the code as-you-type, even though what you're typing has broken the parse tree (especially the code after your edit point).
I am a compiler guy, and I completely agree. Parsing is not that hard and not that important. Recursive descent + Pratt expressions is almost always the practical choice.
I think even the theory of Regular Languages is somewhat overdone: You can get the essence of what NFAs are without really needing NFAs. You can get O(n) string matching without formally implementing NFAs, or using any other formal model like regex-derivatives. In fact, thinking in terms of NFAs makes it harder to see how to implement negation (or "complement" if you prefer to call it that) efficiently. It's still only linear time!
The need for NFA/DFA/derivative models is mostly unnecessary because ultimately, REG is just DSPACE(O(1)). That's it. Thinking in any other way is confusing the map with the territory. Furthermore, REG is extremely robust, because we also have REG = DSPACE(o(log log n)) = NSPACE(o(log log n)) = 1-DSPACE(o(log n)). For help with the notation, see here: https://en.wikipedia.org/wiki/DSPACE
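To make the "no formal NFA machinery needed" point concrete, here is a sketch in Python that does linear-time matching (for a fixed pattern) by pushing a plain set of pattern positions through the input. The pattern language (literals plus '*' matching any run of characters, i.e. a glob) is my own choice for brevity, not something from the comment above:

```python
def advance_stars(pattern, states):
    """A '*' may also match the empty string, so we can skip past it."""
    out = set(states)
    changed = True
    while changed:
        changed = False
        for i in list(out):
            if i < len(pattern) and pattern[i] == '*' and i + 1 not in out:
                out.add(i + 1)
                changed = True
    return out

def glob_match(pattern, text):
    # states = the set of positions in the pattern we could currently be at
    states = advance_stars(pattern, {0})
    for c in text:
        nxt = set()
        for i in states:
            if i < len(pattern):
                if pattern[i] == '*':
                    nxt.add(i)       # '*' absorbs c and stays put
                elif pattern[i] == c:
                    nxt.add(i + 1)   # literal match, move on
        states = advance_stars(pattern, nxt)
        if not states:
            return False             # no way to continue
    return len(pattern) in states    # did any path consume the whole pattern?
```

There is a nondeterministic automaton lurking here if you squint, but nothing in the code names it: it's just a set of integers being pushed forward, one step per input character.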
Professional compiler writer here. All you really need to use is a recursive descent parser. Very easy to understand. Very easy to implement. While also being very powerful.
Of course :^) I'm close to jumping ship to GrapheneOS, but as a Swedish resident I really need our digital ID services, digital mailbox, and banking apps. I have seen their page on app support, but I am slightly afraid it's not up to date / will break at any time. I guess the solution is to use one Android phone for banking and one GrapheneOS phone for everyday use.
I just have an old phone for all the banking stuff. And I use degoogled phones for real stuff. I don't need my bank when I'm out anyway.
Not using GrapheneOS though, because Pixels are expensive in my country. Also, I disagree with them on some points, like rooting. I don't think me having access to root makes my phone less secure. Obviously it should be secured properly so only I can use it, but that can be done. After all, even an unrooted phone still has a root account and runs stuff as root; you just can't access it as a user. That means the OS vendor (GrapheneOS in this case) has more access rights on my phone than I do (how else are they going to install updates?), and to me that's not right.
I just want to be able to inspect what is going on on my phone: what apps store about me in their private storage, and to be able to add root CAs so I can MITM their traffic and inspect it.
I believe GrapheneOS would only be an issue if the Swedish gov decides on using the Google Play Integrity API instead of Android's hardware attestation API (and requiring their apps to whitelist GrapheneOS's keys). So their stance doesn't really change much in terms of how banking apps currently work with GrapheneOS.
The Play Integrity API even works on GrapheneOS, but will only pass basic integrity (which is enough for most, but not all banking apps). It doesn't pass strong integrity, which does remote attestation. If your bank does that, ask them to add remote attestation for GrapheneOS as well.
For most apps, yes, they won't require the MEETS_STRONG_INTEGRITY check in the Google Play Integrity API. But if your apps _do_ choose to use that Google Play Integrity API for a strong integrity check, then they won't be able to whitelist GrapheneOS's keys for it to pass. Unless you can convince Google to whitelist them.
Thus it's best if they use Android's hardware attestation API instead, as you can then decide to whitelist GrapheneOS to pass that strong integrity check.
Another Swedish resident here, using GOS for around 5 years.
So far all the dealbreaker stuff works (BankID, Swish, bank apps, transport apps, etc.) which is great.
That said, I also work in Denmark and need the Danish apps. And the situation in Denmark was the same as in Sweden... until one day it wasn't. For example, MitID flipped a switch one day and started enforcing Play Integrity. It became impossible to activate MitID on a GOS phone. And it kind of became the new normal in government and government-adjacent apps.
Therefore, I dread the day this might happen in Sweden too. Let us see what will happen with the digital wallet app that the government will launch to compete with BankID. I am afraid there is a good chance that they will tread the same path... I hope I am wrong about that.
It's not an issue, we're just spoiled. It's such an amazing convenience that anything else seems like a huge and unnecessary hassle.
There is actually a second MFA provider that is accepted almost everywhere, including by the tax authority. I forget its name and I've never tried it, so I can't say too much, but presumably it provides similar functionality to BankID.
It's called Freja.
It's also possible to get a special hardware device to do the bankID dance, which is great to have if your phone breaks, as having that device will make it possible to provision a new bankID without visiting a bank office.
He's referring mostly to BankID, which is a very secure MFA solution designed for banking purposes (all banks in Sweden accept the same MFA app). The inbox app is probably Kivra, which is an email inbox that uses BankID for authentication and is used for invoices and other "official business" mail.
There's also swish, which is instant payments to both friends and businesses. Swish also uses BankID.
BankID is also used to sign documents, file taxes, etc., etc.
Swedish society is largely built around this one official MFA solution, and having a phone where you cannot run it is a real hassle
I can only speak for my bank (Nordea), but they do offer a separate 2FA device you can order if you "can't use" your smartphone for whatever reason. As a solution it sucks, but technically you're not forced to use a mobile phone to login. I'd be surprised if other banks didn't offer similar fallbacks.
You would need to lug the device with you everywhere because BankID is used for all sort of things in Sweden. I couldn't even use a vending machine here without the BankID app.
Ah, thank you for the clarification! Does not really work in all countries, e.g. here it is quite common at events to pay through a QR code and you need your banking app to do so.
Because you have motivated reasoning to dislike these companies, even though Blackrock and Blackstone are bog standard financial services companies and a random naming scheme is easy to grab onto.
All the worst companies seem to all be LOTR themed.
Technically the Palantiri were a force for good in the hands of Elves and Men, and could still be used for good, like Aragorn using it to challenge Sauron and forcing Sauron’s hand. So that’s a defense to the self-awareness argument. In fact that ambiguity is likely intentional.
Btw I always wondered why I was seeing droves of Palantir swag on Stanford campus back in early 2010s. I wouldn’t wear something that has a 50%+ chance of being interpreted as evil.
The palantíri themselves aren't evil; they were made by the elves long before the events of LOTR. Essentially they are just a tool.
However, I heard that Thiel's favourite book is a rewrite of LOTR from the perspective of Sauron, where Gandalf and the elves seek to destroy humanity and technology (at least that's the gist as I understood it; I haven't read it).
Pretty much private mercenaries that work outside of the usual army structure as "private contractors". They're usually the ones the US contracts to do the worst atrocities, as that gives the government a thin veneer of plausible deniability because they were behaving "independently". The US also does its best to make sure they never face any legal consequences for their war crimes.
Also worth pointing out that, due to this "contractor" relationship, they never count towards official casualty figures. For example, if Iran were to kill 50k of them (I'm of course exaggerating to make a point), they wouldn't count towards US casualty figures, so it's also a way for the government to downplay the effects of foreign intervention to the general public.
> Also worth pointing out that, due to this "contractor" relationship, they never count towards official casualty figures. For example, if Iran were to kill 50k of them (I'm of course exaggerating to make a point), they wouldn't count towards US casualty figures, so it's also a way for the government to downplay the effects of foreign intervention to the general public.
This has happened throughout history in war, before even recorded history.
Russia is doing it right now with North Koreans. Also with Wagner Group, until they had their little uprising against Putin and then their plane happened to crash.
Bit tangential, but if this were a real thing, we could hopefully stop letting Google / Microsoft determine what's spam. Private mail servers would hopefully be more common and actually work. Super annoyed: I use Cloudflare + Protonmail for my custom domain, but I have the feeling that some outgoing emails from my domain get blocked... 90% deliverability means practically useless.
Strictly speaking, isn't there still a way to express at least one illegal string in ArchivedString? I'm not sure how to hint to the Rust compiler which values are illegal, but if the inline length (at most 15 characters) is aliased to the pointer string length (assume little-endian), wouldn't {ptr: null, len: 16} and {inline_data: {0...}, len: 16} both technically be an illegal value?
I'm not saying this is better than your solution, just curious :^)
In the code you will find union { {len, relptr}, [u8; 16] }
The length is first. The pointer second. The inline string is terminated with 0xFF. The length is 62 bits out of 64 bits such that a specific pattern is placed in the first byte that utf8 doesn't collide with.
Super cool stuff! I love the idea of games being refurbished to the point that they can be kept, almost source-original, and still played years down the line. For example, I love Another World for this: it's just a bytecode blob, and each port is just a VM (good writeup: https://fabiensanglard.net/another_world_polygons/index.html).
Location: Umeå, Sweden
Remote: Yes (preferred)
Willing to relocate: In Sweden, perhaps
Technologies: C++, Java, Golang, Low-level systems engineering, GPU Compute & Rendering (Vulkan, GL, Cuda)
Résumé/CV: https://github.com/ollelogdahl https://www.linkedin.com/in/ollelogdahl
Email: olle.logdahl.net
Graduating with my Master's this summer (June), seeking new exciting & challenging roles where I can grow. Been working 4.5 years as a Java backend developer, but have spent 10+ years with CS as my hobby; exploring distributed systems, systems programming and computer graphics.
My current interests are mostly systems programming, performance engineering and GPU compute. I love to tackle hard technical topics and deep diving. If you're working on something interesting and think I could contribute, please reach out! :^)