On C++ code bloat

Foreword by Alexey Ershov:
After I published my post about the new C++ standard, in which slow compilation is highlighted as the most bothersome aspect of C++, one of our developers responded with a link to his post from 2018. That post goes deeper into the technical details of how code bloat slows down compilation. It focuses primarily on the negative effect of extensive templates, headers, and inlining on compile times, and then suggests programming practices to overcome these problems. Read it here now.


The C++ language is known for its slow build times. This problem is not present in the world of pure C, which hints that it is caused by some C++ feature. In fact, it is caused by the bad habit of writing code in headers, severely worsened by C++ templates and inspired by the STL itself.

Headers

The first cause of code bloat is the linking model, which is inherited directly from C. Suppose that a function or a method is defined in a header file like this:

[Listing: a header file that defines the function calcSum and the method Foo::run]

[Image: xkcd comic. Source: xkcd.com/303]

Both calcSum and Foo::run are defined in a header file, so each of them is compiled separately in every source file where it is used (directly or indirectly). In some cases (dllexport, a virtual method, or just an old MSVC) such functions are compiled in every source file that includes the header, even if they are not used! One particularly nasty kind of header-defined function is the methods generated implicitly by the compiler: destructors, constructors, and copy operations. Last but not least, a dynamic class also needs a virtual function table and RTTI data, and every source file where at least one header-defined constructor is called (including an implicitly generated one) forces the compiler to generate all this data as well.
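A header of the kind described above might look like the following sketch. Only the names calcSum and Foo::run come from the text; the file name foo.h, the function bodies, the add method, and the values member are assumptions made for illustration.

```cpp
// foo.h (hypothetical header; bodies are assumed, only the names
// calcSum and Foo::run are taken from the article)
#ifndef FOO_H
#define FOO_H

#include <vector>

// Defined in the header, so its body is compiled again in every
// translation unit that uses it.
inline int calcSum(const int* data, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

class Foo {
public:
    // Methods defined inside the class body are implicitly inline, so
    // they are also compiled in every translation unit that uses them.
    int run() const {
        return calcSum(values.data(), static_cast<int>(values.size()));
    }

    // Hypothetical helper, added only so that the sketch can be exercised.
    void add(int v) { values.push_back(v); }

    // No constructor or destructor is declared, so the compiler generates
    // them implicitly in each translation unit that needs them.

private:
    std::vector<int> values;
};

#endif // FOO_H
```

Every source file that includes this header and calls Foo::run compiles its own copy of run, of calcSum, and of the implicitly generated Foo constructor and destructor (which in turn pull in std::vector<int> code).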

When a project is large, it contains hundreds of source files, so every popular function defined in a header may easily be compiled hundreds of times. One may hope that the compiler is clever, that it knows these functions are the same in every source file, and that it manages to do all the hard work only once. But I have never heard of anything like this. Even though the linker will merge all the duplicate symbols across .obj files into one, you will have enough time for some swordplay before that happens =).

Templates

And now we come to C++ templates. Almost every modern language supports generic programming of some sort; C++ just calls this feature "templates" instead of "generics". C++ templates work like a very advanced code generator integrated into the language: every instance of a template is a completely separate piece of code generated by the compiler (although the linker can merge some methods if their machine code is exactly the same). Consider an example:

[Listing: the function template calcSum<T> and the class template Bar<T, N> with its method run]

Suppose that the program has only one source file. Then the function calcSum is compiled once for each type used as the template parameter T. The method Bar::run is compiled as many times as there are <T, N> pairs for which run is called. This particular case can be quite disastrous because the size N can vary a lot. There are techniques to diminish the problem (e.g. using a type-erased base class), but they are not simple.
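The situation can be sketched as follows. The names calcSum, Bar, and run follow the text, while the bodies, the values member, and the SlimBarBase/SlimBar pair are assumptions for illustration. The second pair shows one possible, simplified form of the mitigation: the real work is hoisted into a base class that depends only on T, so it is generated once per T instead of once per <T, N> pair. A full type-erased base class would go further and remove the dependency on T as well.

```cpp
#include <cstddef>

// One copy of this function is generated for every distinct T used.
template <class T>
T calcSum(const T* data, std::size_t count) {
    T sum = T();
    for (std::size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

// Straightforward version: the loop in run() is generated once for
// every <T, N> pair for which run() is called.
template <class T, std::size_t N>
struct Bar {
    T values[N];

    T run() const {
        T sum = T();
        for (std::size_t i = 0; i < N; ++i)
            sum += values[i];
        return sum;
    }
};

// Hypothetical mitigation: the loop lives in a base class that does not
// depend on N, so it is generated once per T.
template <class T>
struct SlimBarBase {
protected:
    static T runImpl(const T* data, std::size_t count) {
        return calcSum(data, count);
    }
};

// The part that still depends on N is reduced to a trivial forwarding call.
template <class T, std::size_t N>
struct SlimBar : SlimBarBase<T> {
    T values[N];

    T run() const { return SlimBarBase<T>::runImpl(values, N); }
};
```

With Bar<int, 3>, Bar<int, 4>, and Bar<int, 100> in use, the summing loop exists three times; with the SlimBar variants it exists once, and only the one-line wrappers are repeated.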

The aforementioned issue with headers also applies to templates, because templates are almost always defined in header files. In fact, there are only two ways to avoid duplication of template functions across translation units:

  1. Define template functions in a source file and explicitly instantiate them there. Unfortunately, this is possible only if you can enumerate all the wanted instances of the template in that source file. Surely, this approach cannot be applied to container classes like std::vector.
  2. (C++11) Use extern template declarations in the header, combined with explicit template instantiations in source files. This can work for class templates only if the methods are defined outside of the class body. Needless to say, today everyone has a habit of defining methods inside the class body. In particular, this is how the STL is usually implemented, so extern templates won't help with the STL.

Although both causes of template code bloat can be addressed to greatly reduce the issue, almost nobody does so, because code written without such concerns works perfectly well! It is smaller, simpler, faster to write, easier to read, and easier to maintain. Why bother wasting time on something when it is not necessary? And when the project grows large enough that build times start causing pain, it may already be too late to change your mind.
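The two approaches listed above can be sketched in a single file, with comments marking what would go into the header and what into the source file. The file names sum.h and sum.cpp and the function calcMax are hypothetical; calcSum is reused from the earlier discussion with an assumed body.

```cpp
#include <cstddef>

// ---- sum.h (hypothetical header) ----

// Approach 1: only the declaration is visible to users of the header.
template <class T>
T calcSum(const T* data, std::size_t count);

// Approach 2: the definition stays in the header, but the extern template
// declaration below asks including translation units not to instantiate
// calcMax<int> implicitly.
template <class T>
T calcMax(const T* data, std::size_t count) {
    T best = data[0];  // assumes count >= 1
    for (std::size_t i = 1; i < count; ++i)
        if (best < data[i])
            best = data[i];
    return best;
}
extern template int calcMax<int>(const int*, std::size_t);

// ---- sum.cpp (hypothetical source file) ----

// Approach 1: the definition lives here, followed by an explicit
// instantiation for every type the rest of the program needs.
template <class T>
T calcSum(const T* data, std::size_t count) {
    T sum = T();
    for (std::size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}
template int calcSum<int>(const int*, std::size_t);
template double calcSum<double>(const double*, std::size_t);

// Approach 2: the single explicit instantiation definition that matches
// the extern template declaration in the header.
template int calcMax<int>(const int*, std::size_t);
```

With approach 1, calling calcSum for a type that was not explicitly instantiated (say, float) compiles but fails at link time, which is exactly the "enumerate all the wanted instances" limitation. With approach 2, other types are still instantiated implicitly as usual, and only the listed specializations are deduplicated.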

Questionable solutions

Some people think that precompiled headers are a way to reduce code bloat. This is not the case: precompiled headers reduce the cost of parsing the same header files many times. That is yet another inefficiency of the C++ linking model: each header has to be parsed again for every translation unit that includes it. A precompiled header may remove some code bloat in MSVC, but it definitely won't help with functions that are used only after the PCH ends.

There is some hope that the not-yet-standardized C++ modules will fix the problem. As it seems now, the upcoming C++ modules are simply modular precompiled headers. They will definitely solve the problems with parsing, but I'm not so sure about code bloat. The latest draft (N4465, section 4.10) has some rules for instantiations of exported templates, which look like an improvement over what we have now. Time will tell, I guess.

The most widely used solution to fight code bloat (across translation units) is unity builds, i.e. merging all source files into one. In other words, instead of horrendously abusing the linking model, people simply throw it away. Of course, this solution backfires on the speed of incremental builds: changing a single source file means recompiling the whole merged unit.

Binary size

Often when people hear "code bloat", they think only about the size of the resulting binary. Things are much better in this regard: the linker merges all occurrences of the same symbol across the whole module, so at most one copy of a function's machine code remains, even if the function is defined in a header. Moreover, as mentioned above, different functions with bytewise equal machine code are often merged together too. So a trashpile of a gigabyte of object files often links into a 10 MB binary, and it looks like there is no problem at all. However, things are not so perfect.

First of all, modern programs are often split into many dynamic libraries. The funniest reason for doing this is that sometimes the linker simply cannot swallow all the object files in one batch due to their insane size. Each DLL is a separate module and is linked separately, so the merging of symbols does not work across modules. Therefore, in a program with 10 DLLs you can easily find 10 duplicates of the std::vector<int> methods in total.

The second problem is the inlining optimization itself. When inlining, the compiler takes the code of a function and copies it into the call site. In the worst case, the size of the resulting binary grows by the size of the inlined function. Compilers are very careful about this, since careless inlining could lead to insane code size growth. All compilers carefully weigh the potential performance benefits against the code size increase. But while many C++ programmers prefer to think of the optimizer as an omnipotent being which will always do everything in the best way possible, the optimizer has very little information and has to resort to simple heuristics when it comes to inlining decisions. In a world where half of the code lives in headers and is inlinable, the compiler is likely to inline more than necessary. For instance, look at the code of this function:

[Listing: a short function that uses a standard library container]

Among three popular compilers, MSVC 2017 generates 286 bytes of code, GCC 8.1.0 generates 182 bytes, and only Clang 8.0.1 generates 43 bytes and calls the non-inlined _Try_emplace method directly (all checked on Windows with optimization enabled). This does not sound too bad, just a hundred or two extra bytes, but when such a situation happens at every step, who knows how much bloat is generated in total.

(section added in March 2020)


Conclusion

The problem of C++ code bloat is so severe today that the vast majority of build time is wasted on unnecessary duplicates instead of any useful compiling. The duplication is caused by the C++ linking model not being respected. Code that lives in a source file won't cause problems, but every function defined in a header is a candidate for unlimited duplication. Templates are especially harmful, since they generate code bloat along two dimensions: across template parameters and across translation units.

In my opinion, the overuse of templates today is caused by modern trends in C++. Moreover, these trends have grown organically from the Standard Template Library itself, which by the way typically causes a lot of code bloat in any project. Given that few projects are brave enough to throw the STL away, code bloat will almost certainly stay with us. Gamedev is perhaps one of the largest areas with enough distrust towards the STL; let's hope it does not succumb to the trend.

As of now, we can only learn how compilation works and try to avoid too much code bloat. Define stuff in source files as much as possible, and prefer virtual methods or type erasure over templates when performance is not critical. If you are still not convinced that the issue is serious, please review the C++ recommendations of the Chromium project, the first half of which is dedicated solely to the "don't define stuff in headers" mantra.
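As a rough sketch of this advice, the example below keeps only declarations in the header, defines everything (including the otherwise implicit constructor and destructors) in the source file, and uses a virtual interface where a template parameter might otherwise be tempting. All names here (scene.h, scene.cpp, Shape, Scene, Square, makeSquare) are hypothetical and chosen only for illustration.

```cpp
#include <memory>
#include <utility>
#include <vector>

// ---- scene.h (hypothetical header): declarations only ----

// Abstract interface used instead of a template parameter.
class Shape {
public:
    virtual ~Shape();  // declared here, defined once in the source file
    virtual double area() const = 0;
};

class Scene {
public:
    Scene();   // declared explicitly so that the compiler does not
    ~Scene();  // generate them in every including translation unit
    void add(std::unique_ptr<Shape> shape);
    double totalArea() const;

private:
    std::vector<std::unique_ptr<Shape>> shapes;
};

// Factory function: users of the header never see the concrete class.
std::unique_ptr<Shape> makeSquare(double side);

// ---- scene.cpp (hypothetical source file): all definitions ----

Shape::~Shape() = default;
Scene::Scene() = default;
Scene::~Scene() = default;

void Scene::add(std::unique_ptr<Shape> shape) {
    shapes.push_back(std::move(shape));
}

double Scene::totalArea() const {
    double total = 0.0;
    for (const auto& shape : shapes)
        total += shape->area();
    return total;
}

namespace {
// The concrete shape is private to this source file.
class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }

private:
    double side;
};
}  // namespace

std::unique_ptr<Shape> makeSquare(double side) {
    return std::unique_ptr<Shape>(new Square(side));
}
```

The price is a virtual call per shape and no inlining of area(), which is the trade-off the paragraph above accepts for code that is not performance-critical.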


Read more:
Fast Debug in Visual C++
How to Hack C++ with Templates and Friends
Which Missing C++ Features Are Needed Most?