C++

This section has notes on working on C++ projects, where you will typically interact with the following tooling:

  • a compiler (and linker)

  • a build configuration tool

  • a package manager

Compilers

How They Work

C++ is a compiled language, meaning its source files are transformed into executable machine code in advance of the program launching, rather than being interpreted while it runs.

When building a smaller C++ project, the project source files (often with a .cpp or .cc extension), with the support of header (.h) files, are converted to machine code, which is stored in a binary ‘executable’ file. On Linux and most other Unix-like systems this binary file is in the Executable and Linkable Format (ELF); macOS uses the Mach-O format instead.

C++ programs typically depend on functionality in external libraries, including the C++ Standard Library. When compiling C++ code, the headers from the external library give information on the interfaces, types and their sizes that will be consumed from the library. When linking C++ to produce a binary, either the machine code from the external library is copied into the binary being produced (static linking), or only the name of the external library and the symbols needed from it are recorded in the binary, optionally with a search path (shared, or dynamic, linking), so the executable code can be found and loaded at runtime.

In larger C++ projects, compiled C++ code is typically stored in one object file (.o) per translation unit (often a .cpp file) as an intermediate build product. If a library is being produced to allow static linking then the object files are collected together in an archive (.a file), which is also known as a static library. If a library is being produced to allow shared linking then the object files are linked together, removing duplication, and the result is stored in a shared library file (.so).

C++ programs can thus be made up of an executable containing code pulled from external static libraries, code from its own compiled source files, and references to code in external shared libraries. The generation of the program is handled by the compiler and linker, which are often driven through a single command (such as g++).

There are many ways to go from C++ source code to executable machine code. Modern compilers can optimize the code during machine code generation, for example by unrolling loops or avoiding copies of objects. This comes with the trade-off of longer compile times and of the executed code deviating from how it appears verbatim in the source file. Compilers offer different optimization levels (such as -O0 to -O3 in GCC and clang) to give the user control over this. In addition, many other compiler options are available to control memory safety and security, mathematical operations and code introspection.

When machine code is added to produced binaries some information from the original source is lost, since it is not needed for execution and would make the binary larger. However, this information may be useful for troubleshooting or debugging a program, so you can check what line an issue happens on, or what the value of a variable is. This is known as ‘debug information’ and can be kept in produced binaries through a compiler flag (-g in GCC and clang).

Available Compilers

The two most commonly used C++ compilers on Linux are g++ from the GNU Compiler Collection (GCC) and clang++. On Linux systems, c++ is often a symlink to the default compiler as well.

The architecture of the clang compiler differs greatly from that of GCC. Clang is a front end for LLVM (a name that originally stood for Low Level Virtual Machine): it translates C++ into the LLVM intermediate representation (IR), which LLVM then optimizes and lowers to machine code for a specific target architecture. Clang is also often regarded as more extensible than GCC, allowing a wide variety of tools and plugins to be used with it. It can be useful to be familiar with both the GCC and clang compilers.

Basic Use

To compile some C++ code with (for example) GCC you can do:

g++ my_app.cpp -o my_app

which will compile and link my_app.cpp into my_app, which can then be run as an executable. On most systems the compiler will automatically find standard library headers and link with the standard library as needed.

Often you will want to extend your application by linking to additional libraries. This is done by giving the directories containing the library headers with the -I flag, the directories containing the library binaries with the -L flag, and the names of the libraries to link with the -l flag (for example -lmylib to link libmylib.so or libmylib.a).

You will often want your users to be able to easily build your application with their chosen compiler and build settings. It would be tedious for them to remember ideal settings and include and library paths each time. For this reason most C++ projects include this information when they are distributed, in a format understood by various build tools.

System Setup

On macOS, Apple ship their own version of clang by default. They also alias it to g++/gcc, meaning calling them actually runs clang. This version of clang can have integration issues with tooling and may be outdated relative to your needs. To use an up-to-date vanilla clang you can do:

brew install llvm

Then add the following to your ~/.zshrc and refresh your shell:

export HOMEBREW_PREFIX="$(brew --prefix)" # find homebrew install location
export LLVM_PATH=$HOMEBREW_PREFIX/opt/llvm
export PATH=$LLVM_PATH/bin:$PATH # add LLVM bin to system path
export LDFLAGS="-L$LLVM_PATH/lib/c++ -Wl,-rpath,$LLVM_PATH/lib/c++ $LDFLAGS" # flags are separated by spaces, not colons
export CPPFLAGS="-I$LLVM_PATH/include $CPPFLAGS"
export CC=$LLVM_PATH/bin/clang # Set the default compilers to clang
export CXX=$LLVM_PATH/bin/clang++

Build Tools

A build tool is an application that takes some project configuration from a text file and a collection of user preferences and runs the compiler and linker (possibly multiple times) with suitable inputs to produce desired build outputs, such as an application executable or library.

make is an early and widely used build tool. Project and user configuration are specified in a makefile, with outputs defined as a collection of targets. Targets can depend on each other. Running the make command in a directory with a makefile will execute any steps needed to create the targets, in a suitable order. When building programs, which is make’s primary purpose, targets are usually executables or shared or static libraries. make can also handle installation: if the makefile defines an install target, running make install will install the built output into a pre-specified location. You can run make in parallel with the -j flag, e.g. -j 4 to run up to four jobs at once.

Different build tools with similar functionality to make exist, including ninja, which can give better parallel performance and more flexible handling of different build configurations.

Even though a makefile can include project and user configuration, the syntax can become quite verbose and build options can vary a lot across machines and platforms. As a result, another layer of tooling has been developed above make and similar tools, which can be regarded as ‘generators’ of makefiles and similar. The most common are:

  • GNU Autotools - which will often involve running a configure script

  • CMake - which is launched with a cmake binary using a CMakeLists.txt input

  • meson

These tools generate a makefile or ninja specification based on project and user configurations, often in a more user-friendly way than ‘raw’ make, with make or ninja still driving the build underneath. When working with C++ projects you are likely to encounter cmake and autotools at least.

CMake

CMake is a meta-build tool (it generates output for use by other build tools) that is commonly used to build C++ projects. It is a cross-platform tool, which can also generate build configurations on Windows, Mac and mobile platforms.

C++ projects that support cmake will have a CMakeLists.txt file near the top of their project-tree. It is typical to do an ‘out of source’ build with CMake, which involves creating a build directory outside of the source tree and then pointing back to the source directory containing the CMakeLists.txt file:

mkdir build
cd build
cmake ../src

On Linux a CMake project configuration will typically support at least make, so after running the cmake binary you can build the project by running make in the build directory.

You will often end up running CMake to build projects from source, either due to not having admin privileges or because you want to instrument or modify the code in some way. In these cases you don’t want to install the resulting make outputs into system locations. You can avoid this by creating an install directory:

mkdir install

and setting the CMake variable CMAKE_INSTALL_PREFIX to that directory. You can set CMake variables with the -D flag when calling cmake:

cmake -DCMAKE_INSTALL_PREFIX=/path/to/install ../src

or using the visual ccmake tool - which opens a UI in the terminal (TUI).

GNU Autotools

GNU Autotools, also known as the GNU Build System, are a set of tools for building portable C (and C++) applications. They consist of the autoconf, automake and libtool tools, and are used with make, pkg-config and GCC.

A project based on GNU Autotools will either come with a configure script in its repo, or a configure.ac file. The latter can be used to generate a configure script by running autoconf on it, or more commonly by running autoreconf --install, which also runs automake and the other tools as needed.

Running the configure script will generate the project makefiles, which can be run with make. The configure script takes options; a useful one is --prefix, which can be used to change the installation prefix for the make install command.

Package Managers

Unlike Python with PyPI and pip, C++ doesn’t have an established or centralized way of managing build dependencies. CMake provides some dependency management functionality through its ExternalProject and FetchContent features. Another popular dependency management tool is Conan, which has two distinct major versions, ‘1’ and ‘2’. You can install Conan on macOS with brew:

brew install conan

Conan 1 can be a bit slow to update its supported compiler versions. If you are using clang from brew as documented above you may need to modify your conan config in ~/.conan/settings.yml to add your compiler version to the clang section.