bear_child's comments

bear_child · on July 13, 2018

Can you say more about the last paragraph? I have often wondered if you could avoid identifier mangling with a better designed object file format

comex · on July 13, 2018

Hmm… well, when it comes to identifier mangling, object formats aren't really the problem. Both ELF and Mach-O use nul termination for symbol (and section) names, so they can't contain 00 bytes, but there's nothing in the binary format preventing them from containing any other bytes. So you could make a symbol named

    foo::bar(int,int)

…and most likely, everything that deals with binaries would have no problem with it.
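To make the nul-termination point concrete, here is a minimal sketch of how a reader might pull a name out of a string table. It assumes the ELF-style layout where a symbol stores a byte offset into one blob of nul-terminated strings; `name_at` is a hypothetical helper, not part of any real library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the nul-terminated name that starts at `offset`
// in a string table. Every byte other than 0x00 is copied through unchanged,
// so ':', '(', ',' and ')' are as acceptable as letters.
std::string name_at(const std::vector<char>& strtab, std::size_t offset) {
    std::string name;
    while (offset < strtab.size() && strtab[offset] != '\0') {
        name.push_back(strtab[offset]);
        ++offset;
    }
    return name;
}
```

The only byte the loop treats specially is the terminator, which is why a name like the one above should pass through tools that work on the binary format alone.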

A bigger obstacle might be the assembler, whose input is text. Assembly files usually write symbol names without any escaping or quoting, so non-alphanumeric characters could be misinterpreted. But in fact, it seems that both GNU as and LLVM's assembler (currently used on macOS) accept symbol names wrapped in double quotes, which lets those characters be used:

    "foo::bar(int,int)":
        jmp "foo::bar(int,int)"

Also, it seems that Clang will use this syntax where necessary when generating assembly files. This compiles:

    int bar(int a, int b) asm("foo::bar(int,int)");
    int bar(int a, int b) {
        return a + b;
    }

…but GCC apparently doesn't use it; I just tried it on the latest version of GCC (8.1.0) and it produces assembly that uses the name unquoted, which then makes the assembler spit out errors.

However, I've left one thing out. I think C++ symbol mangling mainly originated as a hack to support existing assemblers, but it also achieves a basic form of compression. For example, keywords are represented by a single character, and there's a "substitution" syntax for reusing the same token sequence more than once. C++ symbol names already tend to be crazy long, and having a less succinct mangling would make them even longer, which would make binaries larger and might make dynamic linking slower – though to be honest, I have no idea how much (if at all) this would be noticeable in the grand scheme of things.

Also, there has to be a single canonical mangling of any given declaration, so even if a platform decided to use C++ syntax directly in symbol names, it would probably omit spaces, unnecessary parentheses, etc., and the result might be harder to read than what you get after demangling. So a demangler might still be desirable. Still, it would certainly be more readable than the current mangling!
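As a rough illustration of the mangled and demangled forms, here is a sketch that assumes the Itanium C++ ABI (what GCC and Clang use on Linux and macOS) and its `abi::__cxa_demangle` function from `<cxxabi.h>`. The wrapper name `demangle` is made up for this example.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical wrapper: return the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc
    // and must be released with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Under that ABI, `foo::bar(int,int)` with `foo` a namespace is mangled as `_ZN3foo3barEii`: 14 characters against 17, with each `int` shrunk to a single `i`. The substitution syntax shows up in a name like `_Z1fPiS_`, where `S_` stands for the earlier `Pi` (pointer to int).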

But that's all assuming that the overall compilation scheme would still look like today, with a 'dumb' linker that only knows about symbols and assembly code, not types or anything about C++ semantics.

You could go a step further and design an object format with native support for C++, even things like templates. Imagine being able to define a template in one .cpp file, link it into a library, and then instantiate that template from another executable! That would be enormously cool. In fact, the C++ spec used to define an 'export template' syntax that was supposed to do this, but essentially no compilers implemented it, and it was removed in C++11. (C++ modules are also kind of a form of this, but they're meant to be compiler-specific, private build artifacts rather than something defined at the system level.)

I can think of three distinct drawbacks, though:

1. C++ template semantics are very tightly bound to its syntax; there's little you can say about a template definition without knowing what it's instantiated with. Indeed, if you're going to encode template definitions in object files, the format would probably be nothing more complicated than pre-tokenized source code. More modern languages do this somewhat better – in fact, Swift actually plans to have a stable ABI for generics.

2. Similarly, C++ template semantics are very C++-specific; other languages would probably require separate support in the format rather than being able to reuse the C++ functionality. In comparison, existing 'dumb' object formats are basically language-agnostic.

3. The biggest problem: If you allowed templates to be exported from dynamic libraries, the dynamic linker's functionality would have to be transformed from a series of quick name lookups and fixups that can be done every time a binary is launched, to a full-fledged C++ compiler with an expensive code generation step. Even if you cached the output it would still be slow on first launch, especially on low-powered platforms like mobile devices…

And yet despite all those drawbacks, I still dream of a system that has… at least some form of this. (I've thought a bit about possible designs: perhaps it could be designed as a component of the package manager rather than of the linker directly.) Why?

Well, Debian just started packaging Rust code, and look at how that's going. Each library package ("crate") gets a libfoo-dev that just contains a copy of the package source code, with no libfoo binary package; each executable is statically linked, and a new package version will be released whenever any of its dependencies update. Which is going to mean a lot of redundant upgrades. That's Rust, not C++, but to the extent C++ libraries avoid this problem, it's usually by eschewing templates altogether for anything that's meant to have a stable ABI. If the API does include templates (at least ones that clients instantiate with their own parameters), then clients have to be rebuilt whenever the library changes, same as Rust. I find this quite annoying, since I think the future should be full of ergonomic, strongly-typed APIs taking full advantage of the features of modern languages… yet I don't want to burden sysadmins with pointless upgrades. :) And it just feels wrong that linkers have basically never progressed beyond C.

jstimpfle · on July 13, 2018

Why would you want run-time code generation for things that you can also generate at compile time? A linker is there to link, not to compile.

One of the big problems with templates is that you're not required to instantiate them explicitly. So you end up with lots of duplicate instantiations. Maybe try doing it explicitly, always - have a well defined place where the template is instantiated. You can also put the compiled code in a library, I think. (There seems to be an extern template feature since C++11).
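A minimal sketch of what that could look like, with everything squeezed into one file for brevity. The `add` template is a made-up example, and the comments mark which parts would normally sit in a header and which in a single .cpp file of the library.

```cpp
// --- would normally live in a header ---
template <typename T>
T add(T a, T b) {
    return a + b;
}

// Explicit instantiation declaration (C++11): promises that add<int> is
// instantiated somewhere else, so a translation unit seeing this line
// does not need to implicitly instantiate it.
extern template int add<int>(int, int);

// --- would normally live in exactly one .cpp file of the library ---
// Explicit instantiation definition: the one well-defined place where
// code for add<int> is generated.
template int add<int>(int, int);
```

Clients calling `add<int>` would then link against the library's copy. This only covers instantiations the library author listed up front; a client's own `add<MyType>` still has to be compiled from the template source.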

Anyways, templates are a mess...

bear_child · on May 22, 2018

Those videos look cool! I don't know anything about instruction pipelines and stuff like that. Will def check it out

corysama · on May 22, 2018

Also be sure to check out https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...

bear_child · on March 17, 2017

That is a good point! For me, if I click the step button, I can then hold down enter and it starts stepping very fast. That may be browser specific though

bear_child · on Jan 3, 2017

https://www.amazon.com/Quantum-Computation-Information-10th-...

is the standard reference. It is well written and light if you have the math background (linear algebra + probability theory)

bear_child · on Jan 3, 2017

Regardless of how we end up building quantum computers, they will be useful. The theory behind computing with qubits is well established and many interesting quantum algorithms have already been developed.

Even a 32 qubit computer would be a very serious breakthrough.

edit: by a 32 qubit computer, I mean that the device should have 32 qubits of memory in total! I'm not talking about bus size.

That is one of the cool things about quantum computing. You don't need a big device to do interesting computations.
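Some back-of-the-envelope arithmetic on why 32 is already a lot: the state of n qubits is described by 2^n complex amplitudes. The sketch below assumes each amplitude is stored classically as two 8-byte doubles; the function names are made up for illustration.

```cpp
#include <cstdint>

// Number of complex amplitudes in the state vector of an n-qubit register.
// Valid for n_qubits < 60, so that the byte count below fits in 64 bits.
constexpr std::uint64_t amplitudes(unsigned n_qubits) {
    return std::uint64_t{1} << n_qubits;
}

// Bytes needed to hold that state vector classically, assuming 16 bytes
// per amplitude (real and imaginary parts as 8-byte doubles).
constexpr std::uint64_t state_bytes(unsigned n_qubits) {
    return amplitudes(n_qubits) * 16;
}

static_assert(amplitudes(32) == 4294967296ULL, "2^32 amplitudes");
static_assert(state_bytes(32) == (64ULL << 30), "64 GiB");
```

So writing down the full state of 32 qubits on a classical machine takes about 64 GiB under that assumption, and each extra qubit doubles it.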

bear_child · on Dec 22, 2016

What is your company? I am Australian, finishing a PhD in mathematics and looking for a job.

nl · on Dec 26, 2016

Email me. Contact details in my profile.
