Hacker News | jjmarr's comments

If you're doing single-core builds, you will get impressive speedups from unity builds.

This is because C++ compilers spend a lot of time redundantly parsing the same headers included in different .cpp files.

Normally, the gains from compiling each .cpp file in parallel outweigh the redundant parsing, but if you're artificially limited in parallelism, unity builds can pay for themselves very quickly, as they did in the article.

C++20 modules try to split the difference by parsing each header once into a precompiled module, allowing it to reuse work across different .cpp files.

Unfortunately, modules mean C++ compilation is no longer embarrassingly parallel, which is why build-system adoption has been slow.


The problem is that, due to how templates work, each compilation unit ends up with its own copy of every templated function it instantiates, which creates extra work, code bloat, etc.

The compiler also doesn't really inline or optimize functions as well across object boundaries without link-time optimization.

But the linker is single-threaded and notoriously slow; with LTO, I wouldn't be surprised if it took as much time as the whole unity build, and the results are often suboptimal.

Also, C++ syntax is notoriously hard and slow to parse; the clang frontend takes almost as much time to run as LLVM itself.

So modules would probably help a lot with parallel parsing, but that would help unity builds just as much.


> each compilation unit will end up with its own copy of templated function, which creates extra work, code bloat etc.

Yes, that's what causes the parsing bottleneck. Unity builds don't need to create multiple copies of templated functions.

C++20 modules could fix that because the function is parsed before substitution. TBD whether that optimization works yet; I tried it on Clang 18 and it didn't.

> But the linker is single threaded and notoriously slow

I think most linkers have parallel LTO and `mold` provides actual parallel linking.


> But I still do not understand how one can consider writing to memory the OS owns to be ok.

Your manager tells you to reduce memory usage of the program "or else".


TBH I think a more likely explanation is that they needed to identify separate instances of that data structure, so they thought to store some ID in it. That way, when they encountered an instance again, they could recognize it without keeping copies of all its data and comparing their copy with the system's.


^^ The voice of experience, here.


Or you desperately need to tag some system object and the system provides no legitimate means to do so. That can be invaluable when troubleshooting things, or even just understanding how things work when the system fails to document behavior or unreasonably conceals things.

I've been there and done it, and I offer no apologies. The platform preferred and the requirements demanded by The Powers That Be were not my fault.


"Stable ABI" is a joke in C++ because you can't keep the ABI stable while changing the implementation of a templated function, and that blocks improvements to the standard library.

In C, ABI = API because the declaration of a function contains the name and arguments, which is all the info needed to use it. You can swap out the definition without affecting callers.

That's why Rust allows a stable C-style ABI; the definition of a function declared in C doesn't have to be in C!

But in a C++-style templated function, the caller needs access to the definition to do template substitution. If you change the definition, you need to recompile calling code i.e. ABI breakage.

If you don't recompile calling code and link with other libraries that are using the new definition, you'll violate the one-definition rule (ODR).

This is bad because duplicate template functions are pruned at link-time for size reasons. So it's a mystery as to what definition you'll get. Your code will break in mysterious ways.

This means the C++ committee can never change the implementation of a standardized templated class or function. The only time they did was a minor optimization to std::string in 2011 and it was such a catastrophe they never did it again.

That is why Rust will not support stable ABIs for any of its features relying on generic types. It is impossible to keep the ABI stable and optimize an implementation.


C++ builds are extremely slow because they are not correct.

I'm doing a migration of a large codebase from local builds to remote execution and I constantly have bugs with mystery shared library dependencies implicitly pulled from the environment.

This is extremely tricky because if you run an executable without its shared library, you get "file not found" with no explanation. Even AI doesn't understand this error.


The dynamic linker can clearly tell you where it looks for files and in which order, and where it finds them if it does.

You can also very easily harden this if you somehow don't want to capture libraries from outside certain paths.

You can even build the compiler in such a way that every binary it produces has a built-in RPATH if you want to force certain locations.


That is what I'm doing so I can get distributed builds working. It sucks and has taken me days of work.


It's pretty simple and works reliably as specified.

I can only infer that your lack of familiarity was what made it take so long.

Rebuilding GCC with specs does take forever, and building GCC is in general quite painful, but you could also use patchelf to modify the binary after the fact (which is what a lot of build systems do).


> I can only infer that your lack of familiarity was what made it take so long

Pretty much.

Trying to convert an existing build that doesn't explicitly declare object dependencies is painful. Rust does it properly by default.

For example, I'm discovering our clang toolchain has a transitive dependency on a gcc toolchain.


Clang cannot bootstrap in the same way GCC can; you need GCC (or another clang) to build it. You can obviously build it twice to have it be built by itself (bear in mind some of the clang components already do this, because they have to be built by clang).

In general though, a clang install will still depend on libstdc++, libgcc, GCC crtbegin.o and binutils (at least on Linux), which is typically why it will refer to a specific GCC install even after being built.

There are of course ways to use clang without any GCC runtime, but that's more involved and non-standard (unless you're on Mac).

And there is also the libc dependency (and all sysroot aspects in general) and while that is usually considered completely separate from GCC, the filesystem location and how it is found is often tied to how GCC is configured.


You don't on new projects. CMake + ninja has support for modules on gcc, clang, and MSVC.

This should be your default stack on any small-to-medium sized C++ project.

Bazel, the default pick for very large codebases, also has support for C++20 modules.


I have yet to see modules in the wild. What I have seen extensively are header-only projects.


It's the fault of build systems. CMake still doesn't officially support `import std`, and undocumented things are done in the ecosystem [1]

But once it works and you set up the new stuff, it's kinda awesome; having started a new C++26 project with modules now, I'm certainly never going back. The big compilers are also retroactively adding `import std` to C++20, so support is widening.

[1] https://gitlab.kitware.com/cmake/cmake/-/work_items/27706


I wanted to ship import std in 4.3 but there are some major disagreements over where the std.o symbols are supposed to come from.

Clang says "we don't need them", GCC says "we'll ship them in libstdc++", and MSVC says "you are supposed to provide them".

I didn't know about that when I was working on finishing import std for CMake and accidentally broke a lot of code in the move to a native implementation of the module manifest format, so everything got reverted and put back into experimental.


That's really interesting info, thanks!


weird to blame build systems for a problem caused by the language


You are of course right. It's just that modules inherently put a lot of responsibility on the build system. Among other things, a "module registry" was never standardized and is left in the hands of the build system.

Systems like ninja needed to learn about modules, which took time, and then a step further up the stack, systems like CMake needed to learn about modules, which took time. That's my answer to the parent's "why are there so few modules projects": it took time for the ecosystem to catch up.


You're not supposed to distribute the precompiled module file. You are supposed to distribute the source code of the module.

Header-only projects are the best to convert to modules because you can put the implementation of a module in a "private module fragment" in that same file and make it invisible to users.

That prevents the compile-time bloat many header-only dependencies add. It also avoids distributing a `.cpp` file that has to be compiled and linked separately, which is why so many projects are header-only.
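A sketch of the single-file layout (illustrative names; this needs a module-aware build such as CMake + ninja with a recent compiler, so it won't compile as a plain standalone TU):

```cpp
// widget.cppm: interface and hidden implementation in one file.
export module widget;        // what importers see starts here

export int widget_count();   // exported declaration only

module :private;             // everything below is invisible to importers
                             // and changing it doesn't touch the interface

static int count = 3;        // implementation detail
int widget_count() { return count; }
```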


What I mean is, I have yet to see projects in the wild _use modules at all_.


Plenty of examples on GitHub; Microsoft has talks on how Office has migrated to modules, and the updated Vulkan tutorials from Khronos have an optional learning path using modules.


Modules need a lot of tooling. The tool vendors have been working hard on this for years. They have only just now said this is ready for early adopters. Most people are waiting for the early adopters to write the books on what best practices are - this needs a few more years of experience.


if something so simple needs years of experience it's poorly designed


Modules are not simple. They sound simple only to people who have never dug into them.


I've worked extensively on module/import semantics for multiple products in my life. It is complex. However this complexity is on the implementer and not the user.

If "best practices" need to be refined over years, it is poorly designed. This is not untrodden ground, other languages and ecosystems do sane things.


This was considered during standardization. The feeling among tool developers at the time was it was "close enough" to Fortran modules to be mostly solvable.

This was wrong, mostly because C++ compiler flag semantics are far more complicated than Fortran's; you live and you learn. The bones of most implementations are identical to Fortran's, though, and we got a ~3 year head start on the work because of that.

Ninja already had the dyndep patch ready to go from Fortran, CMake knew basically how to use scanners in build steps. However, it took longer than expected to get scanner support into the compilers, which then delayed everything downstream. Understanding when BMIs need to be rebuilt is still tricky. Packaging formats needed to be updated to understand module maps, etc, etc.

Each step took a little longer than was initially hoped, and delays snowballed a bit. We'll get there.


Thanks. It's been a long time since I started a C++ project, and I've never set up any build chain in Visual Studio or Xcode other than the default.


How about using Zig to build C++ projects?


I haven't used it.

That being said, while it looks better than CMake, for anything professional I need remote execution support to deviate from the industry standard. Zig doesn't have that.

This is because large C++ projects reach a point where they cannot be compiled locally if they use the full language. e.g. Multi-hour Chromium builds.


Surely Zig can also be invoked using any CI/CD flow running on a remote machine too.


I'm referring to this:

https://github.com/bazelbuild/remote-apis

Once you get a very large C++ project with several thousand compilation jobs over hundreds of devs, you need to distribute the build across multiple computers and have a shared cache for object files.

Zig doesn't seem to support that.


No, because most major compilers don't support header units, much less standard library header units from C++26.

What'll spur adoption is CMake adopting Clang's two-phase compilation model, which increases performance.

At that point every project will migrate overnight for the huge build time impact since it'll avoid redundant preprocessing. Right now, the loss of parallelism ruins adoption too much.


It might be because there's a person in the photo, and France is very strict on photographing people.

https://commons.wikimedia.org/wiki/Commons:Country_specific_...

In terms of the formatting/brevity, Reuters was originally a wire service. They'd cover news in foreign locations and send it by telegraphic wire to local newspapers that would license the content.

Telegraphs charged by the word and didn't have letter case. Cryptic in-band signals like "NO USE FRANCE" are a relic of that time.

Since the link OP posted is to the B2B part of Reuters, I'm assuming they still haven't modernized this system.


It doesn't seem to be about photographing people, other pictures don't feature people and still have the "NO USE FRANCE" tag. It seems like all pictures by Chris Jung have the "NO USE FRANCE" tag.

My best guess is that Chris Jung has some kind of an exclusivity contract for publishing in France. Looking at his website, he publishes in "Paris Match", a French magazine, so it may be related.


That makes more sense.


You folks are amazing, thanks for catching that. My curiosity is soothed!


This is the traditional "innovator's dilemma", where a skilled profession facing an imperfect technological threat declines to adopt it until it is too late.

AI-generated articles are, on balance, inferior, except for people who want simple, low-quality content.

But LLMs are moving up the value chain with Deep Research. They can give explanations tuned to a reader's knowledge/viewpoints and provide interactive content Wikipedia doesn't support. That is a killer app for math/science topics.

Wikipedia will win against a generic corporate encyclopedia on neutrality/oversight, but it'll lose badly on UX, which is what matters.

I think the tipping point will be direct integration of academic sources into ChatGPT/Claude/Gemini and a "WikiLink" type way to discover interesting follow-up topics.

I can't trust AI answers for serious historical or social-science topics because of the first point. And my chat with an AI generally ends once I get the answer I need, because I can't get rabbitholed into other topics.


It REALLY depends on how you're using the AI. I get the strong impression a lot of people are still at the "I'll write a few prompts and see what happens" stage, and hoping for an answer from the magical oracle; as opposed to really using the tool. This never fails to disappoint.

I might be slightly wrong, but probably not by a lot, yet. Sure there's an element of "holding-it-wrong-ism" in my position. But ... it does actually take practice to get it right, and best practices are badly documented!

That said the situation is changing rapidly: https://news.ycombinator.com/item?id=47547849 "AI bug reports went from junk to legit overnight, says Linux kernel czar"

--


Most Wikipedia work is taking paywalled academic content and summarizing it in an encyclopedic format.

For programming, agentic AI can find most of what it needs because everything is open access on arXiv, blogs, or in the codebase itself. That's why it can play "magical oracle" for questions that used to depend on good prompting.

For most other professional topics, citations are locked behind paywalls. Wikipedia editors get free access to academic libraries, but the readers don't. That's why consumer tools suck.

When the big AI companies integrate with proprietary databases in fields like history or the social sciences, that's when Wikipedia dies for answering questions.


it’s not supposed to win on UX, its current UX is maybe too conservative sure

of course they banned ai they could barely allow css


r/AmItheAsshole is biased towards breaking off relationships rather than fixing them. They also hate social obligations.

e.g. If the OP is asking "I ghosted my friend in AA who insulted me during a relapse", Reddit would say NTA in a heartbeat, while the real world would tell OP to be more forgiving.

On the contrary, if the post was "the other kids at school refuse to play with my child", Reddit would say YTA because the child must've done something to incite being cut off.


Absolutely. I wonder how many parents have been cut off, SOs broken up with, and friendships ended because of the Reddit hivemind's attitude. Pretty sure it's doing a huge amount of societal damage.


I wouldn't blame reddit, it's what you get when you ask several thousand teenagers to give collective relationship advice.


“I got divorced based on advice from complete strangers on the internet, AITA?”


Is it hivemind, or just people generally becoming more aware of toxicity in their lives?


> e.g. If the OP is asking "I ghosted my friend in AA who insulted me during a relapse", Reddit would say NTA in a heartbeat, while the real world would tell OP to be more forgiving.

That’s a nuanced discussion. It depends on what you value most, not what “real world” tells you. Most of the time Reddit would be right, because you need to prioritize yourself instead of continuing toxic relationships.


1) Reddit is horrible at nuance, almost non existent in some subs.

2) Reddit itself defines what counts as toxicity when giving the advice, which is mostly wrong, as outlined above.

If OPs had an understanding of what they valued and what is toxic, they probably wouldn't need advice from biased readers [biased in the sense that they're on that sub].


That’s true, but they still might be right for wrong reasons.


[deleted]


Ban them too. Make bets have to be placed in person.

