How I Learned to Get By with C++ Packaging: A 5-Minute CMake Survival Guide

No matter the gamut of high-level languages available, sometimes one just has to dig into the stack. Usually that means C++. CMake is the de-facto build tool for C++. Here's a survival guide on it.

I've been working on some programming challenges in C++ recently, and an important aspect of managing C++ projects is dependency management.

These days, we are spoiled with instant package managers:

  • npm in the Node.js/JavaScript ecosystem
  • cargo in Rust
  • pip in Python

In C++, while there are package managers such as Conan, if you're dealing with projects in the wild, you'll usually end up working with CMake.

Therefore, if you want to operate in the C++ ecosystem, it's not a matter of choice: you have to learn how to get by with CMake.

What Exactly Is CMake and Why Would One Care?

CMake is a cross-platform build system generator. The cross-platform aspect is crucial because CMake helps you abstract platform-specific differences to a certain extent.

For example, on Unix-based systems CMake generates Makefiles by default, which are then used to build the project. On Windows, it generates Visual Studio project files (when Visual Studio is installed), which are then used to build the project.

Note that different platforms usually have their own toolchains for compilation and debugging: Linux distributions typically use GCC (with gdb), macOS uses Clang (with lldb), Windows uses MSVC, and so on.

Another important aspect in the C++ ecosystem is the ability to handle both executables and libraries.

There are many types of executables possible, based on:

  1. Target CPU architecture
  2. Target operating system
  3. Other factors, such as build type (debug vs. release)

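Additionally, for libraries, there are different ways of linking. Linking is the step that combines your compiled code with compiled code from other codebases, such as libraries, so your program can call their functions without needing their source or knowing their implementation:

  1. Static Linking: the library's code is copied into your executable at build time
  2. Dynamic Linking: your executable refers to a shared library (.so on Linux, .dylib on macOS, .dll on Windows) that is loaded at run time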

I was working on some internal prototypes that needed to call underlying OS APIs to perform certain tasks. The only viable way to do this productively was by building on top of some C++ libraries.

How CMake Works: Three Stages In Any CMake Project

  1. Configuration: CMake reads all the CMakeLists.txt files and builds an internal model of the project (source files, libraries to link, compiler settings, and so on).
  2. Generation: Based on that model, CMake generates platform-specific build files (such as Makefiles on Unix or Visual Studio projects on Windows).
  3. Build: A native build tool, such as make or ninja, uses the generated build files to compile and link the executables or library files.

A Simple CMake-based Project (Hello World!)

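Let's say you have a C++ source file that computes the square root of a number. This is the starter file from the official CMake tutorial (see Resources), so it still contains the tutorial's TODO comments.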

tutorial.cxx

// A simple program that computes the square root of a number
#include <cmath>
#include <cstdlib> // TODO 5: Remove this line
#include <iostream>
#include <string>

// TODO 11: Include TutorialConfig.h

int main(int argc, char* argv[])
{
  if (argc < 2) {
    // TODO 12: Create a print statement using Tutorial_VERSION_MAJOR
    //          and Tutorial_VERSION_MINOR
    std::cout << "Usage: " << argv[0] << " number" << std::endl;
    return 1;
  }

  // convert input to double
  // TODO 4: Replace atof(argv[1]) with std::stod(argv[1])
  const double inputValue = atof(argv[1]);

  // calculate square root
  const double outputValue = sqrt(inputValue);
  std::cout << "The square root of " << inputValue << " is " << outputValue
            << std::endl;
  return 0;
}

CMakeLists.txt

project(Tutorial)
add_executable(tutorial tutorial.cxx)

These two lines are the minimum we need to get an executable.

Strictly speaking, we should also specify the minimum required CMake version with cmake_minimum_required. If we leave it out, CMake prints a warning. Let's skip it for now.

Technically, we don't need the project command either (CMake warns and assumes a default one), but we'll keep it.

So the most important line here is:

add_executable(tutorial tutorial.cxx)

It declares an executable target named tutorial, built from the source file tutorial.cxx.

How to build

Here is the full set of commands to build the project and test the binary. I'll explain each one below.

mkdir build
cd build/
cmake ..
ls -l # inspect generated build files
cmake --build .
./tutorial 10 # test the binary

That's the whole process: six commands, two of which (ls and running the binary) are only there for inspection and testing.

First, CMake projects conventionally keep build output separate from the source tree (an "out-of-source" build). So we create a build directory first:

mkdir build

Then we do all build-related work from within the build folder:

cd build

From this point onwards, we run the build-related tasks.

First, we configure the project and generate the build files, pointing CMake at the parent directory, which contains CMakeLists.txt:

cmake ..

In this step, CMake generates platform-specific build files. On Ubuntu, I get a Makefile, along with CMakeCache.txt and a CMakeFiles/ directory. These files are quite lengthy, but I don't need to worry about them now.

Next, I trigger a build based on the newly generated files:

cmake --build .

This step runs the native build tool (make, in my case) on the generated build files to produce the tutorial binary.

I can verify that the binary works as expected with:

./tutorial 16

I get the expected answer, 4, so the build works!

Injecting Variables Into Your C++ Project

CMake can generate a header file from a template (here, Config.h.in) using the configure_file command. Variables you set in CMakeLists.txt are substituted into the template, and your .cxx files can then include the generated header.

Here is an example where we define the project version in CMakeLists.txt and use it in the program.

Config.h.in

In this file, variables arriving from CMakeLists.txt will show up as @VAR_NAME@.

#pragma once

#define PROJECT_VERSION_MAJOR @PROJECT_VERSION_MAJOR@
#define PROJECT_VERSION_MINOR @PROJECT_VERSION_MINOR@
#define AUTHOR_NAME "@AUTHOR_NAME@"

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project(Tutorial)

# Define configuration variables
set(PROJECT_VERSION_MAJOR 1)
set(PROJECT_VERSION_MINOR 0)
set(AUTHOR_NAME "Jith")

# Configure the header file
configure_file(Config.h.in Config.h)

# Add the executable
add_executable(tutorial tutorial.cxx)

# Include the directory where the generated header file is located
target_include_directories(tutorial PRIVATE "${CMAKE_BINARY_DIR}")

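Note that we've added cmake_minimum_required to declare the minimum CMake version the project needs. It's good practice to call it first, at the top of the top-level CMakeLists.txt.

Then a few set() statements define the variables the template needs. (Alternatively, project(Tutorial VERSION 1.0) sets PROJECT_VERSION_MAJOR and PROJECT_VERSION_MINOR for you.)

configure_file() names the template (Config.h.in) and the output file (Config.h). A relative output path is placed in the current build directory.

When configure_file runs during the configuration stage, CMake writes Config.h with every placeholder filled in. Because this generated header lives in the build tree rather than the source tree, we must tell the compiler where to find it.

In our case, Config.h ends up in ${CMAKE_BINARY_DIR} (the top-level build directory), so that is the directory we pass to target_include_directories.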

The one new part of the following line is the PRIVATE keyword:

target_include_directories(tutorial PRIVATE "${CMAKE_BINARY_DIR}")

Two Key Ideas You Must Understand To Get CMake Going: Visibility Specifiers & Targets

There are three visibility specifiers: PRIVATE, PUBLIC, and INTERFACE.

Visibility specifiers are used in commands like target_include_directories, target_link_libraries, etc.

They are specified in the context of targets. A target in CMake is an abstract representation of some output:

  1. Executable target (via add_executable) produces a binary
  2. Library target (via add_library) produces a library file
  3. Custom target (via add_custom_target) runs arbitrary commands, such as scripts that generate files

Executables and regular libraries produce concrete files as output. A special case of the library target is the interface library, which is declared like this:

add_library(my_interface_lib INTERFACE)
target_include_directories(my_interface_lib INTERFACE include/)

Here, my_interface_lib never produces a file of its own. Instead, when a concrete target links against it (via target_link_libraries), that target automatically inherits the include directories, and any other usage requirements, declared on my_interface_lib. This makes INTERFACE libraries a natural fit for header-only libraries. More generally, they are a convenient way to bundle usage requirements into a dependency tree.

With targets and dependencies in mind, we can return to visibility specifiers.

PRIVATE visibility looks like this:

target_include_directories(tutorial PRIVATE "${CMAKE_BINARY_DIR}")

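PRIVATE means the tutorial target itself uses the include directory when it is compiled. If another target later links against tutorial, the directory does not propagate to it. (Linking against an executable is unusual. These specifiers matter most on library targets, but the rules are the same.)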

PUBLIC visibility looks like this:

target_include_directories(tutorial PUBLIC "${CMAKE_BINARY_DIR}")

With PUBLIC, the tutorial target uses the include directory itself, and the directory also propagates to any target that depends on tutorial.

INTERFACE visibility looks like this:

target_include_directories(tutorial INTERFACE "${CMAKE_BINARY_DIR}")

With INTERFACE, the tutorial target does not use the include directory itself, but any target that depends on tutorial gets it propagated.

So, in summary, this is how the visibility specifiers work:

  1. PRIVATE - the requirement applies only to the target itself
  2. PUBLIC - the requirement applies to the target and to targets that depend on it
  3. INTERFACE - the requirement does not apply to the target, only to targets that depend on it

Dividing the Project Build into Libraries and Directories

As projects grow, we usually need modules to organize the project & to manage complexity.

In CMake, each sub-directory can have its own CMakeLists.txt describing an independent module and how to build it.

One can have a main CMake configuration, which can trigger many library (sub-directory) builds and finally link all the modules together.

This is a slightly simplified and modified version of the example from the official CMake tutorial. We are going to create a module (library) called MathFunctions, which will be built as a static library (libMathFunctions.a on Unix). Finally, we will link it into our main program.

Here are the source files first (they're fairly straightforward):

MathFunctions.h

#pragma once

namespace mathfunctions {
double sqrt(double x);
}

MathFunctions.cxx

#include "MathFunctions.h"
#include "mysqrt.h"

namespace mathfunctions {
double sqrt(double x)
{
  return detail::mysqrt(x);
}
}

mysqrt.h

#pragma once

namespace mathfunctions {
namespace detail {
double mysqrt(double x);
}
}

mysqrt.cxx

#include "mysqrt.h"

#include <iostream>

namespace mathfunctions {
namespace detail {
// a hack square root calculation using simple operations
double mysqrt(double x)
{
  if (x <= 0) {
    return 0;
  }

  double result = x;

  // do ten iterations
  for (int i = 0; i < 10; ++i) {
    if (result <= 0) {
      result = 0.1;
    }
    double delta = x - (result * result);
    result = result + 0.5 * delta / result;
    std::cout << "Computing sqrt of " << x << " to be " << result << std::endl;
  }
  return result;
}
}
}

To summarize these pieces of code, we introduce the following:

  1. A namespace called mathfunctions. A namespace groups related code under a common name; think of it as a container for functions, variables, and other elements. It serves as the "public API" for the main consumer (which we'll see later).

  2. Inside this namespace, we define our own sqrt (square root) function. Because it lives in mathfunctions, it does not conflict with any other sqrt in the program, such as std::sqrt from <cmath>.

  3. A nested detail namespace holds the actual implementation, mysqrt. It runs ten iterations of Newton's method, each one updating the estimate to result + (x - result * result) / (2 * result), i.e. the average of result and x / result.

Now, how do we build this folder into a library? It gets its own CMake sub-configuration:

MathFunctions/CMakeLists.txt

add_library(MathFunctions MathFunctions.cxx mysqrt.cxx)

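A single add_library call is enough: it names the library target and lists the relevant .cxx files. Since we pass neither STATIC nor SHARED, CMake builds a static library by default (unless the BUILD_SHARED_LIBS variable is set to ON).

But we're not done yet. The interesting part is how the main project uses this library, starting with an updated tutorial.cxx: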

tutorial.cxx (using the library/module version)

#include "Config.h"
#include "MathFunctions.h"
#include <cmath>
#include <cstdlib> 
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
  std::cout << "Project Version: " << PROJECT_VERSION_MAJOR << "." << PROJECT_VERSION_MINOR << std::endl;
  std::cout << "Author: " << AUTHOR_NAME << std::endl;

  if (argc < 2) {
    std::cout << "Usage: " << argv[0] << " number" << std::endl;
    return 1;
  }

  const double inputValue = atof(argv[1]);

  // use library function
  const double outputValue = mathfunctions::sqrt(inputValue);
  std::cout << "The square root of " << inputValue << " is " << outputValue
            << std::endl;
  return 0;
}

We include MathFunctions.h at the top and then call the library's sqrt function through the mathfunctions namespace.

MathFunctions.h lives in a sub-directory, yet we include it as though it were in the root directory. How does that work?

The answer lies in the revised top-level CMakeLists.txt:

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project(Tutorial)

# Define configuration variables
set(PROJECT_VERSION_MAJOR 1)
set(PROJECT_VERSION_MINOR 0)
set(AUTHOR_NAME "Jith")

# Configure the header file
configure_file(Config.h.in Config.h)

add_subdirectory(MathFunctions)

add_executable(tutorial tutorial.cxx)


target_include_directories(tutorial PUBLIC "${PROJECT_BINARY_DIR}" "${PROJECT_SOURCE_DIR}/MathFunctions")

target_link_libraries(tutorial PUBLIC MathFunctions)

There are a few new lines here:

  1. add_subdirectory tells CMake to also process MathFunctions/CMakeLists.txt, which defines the MathFunctions library target
  2. target_include_directories adds the MathFunctions folder to the compiler's include path. This answers our earlier question: tutorial.cxx can include MathFunctions.h directly because that folder is searched for headers
  3. Finally, target_link_libraries links the MathFunctions library into the main target tutorial

When I build this on Linux, I see a new artifact at build/MathFunctions/libMathFunctions.a. This is a static library: an archive of object files whose code the linker copies into the final binary.

I also get the tutorial binary, which already includes the library's code. This means I can move the tutorial binary anywhere I want and run it as usual. It no longer needs libMathFunctions.a at run time, though it still depends on system shared libraries such as the C++ standard library.

There are ways to make this setup more modular. For example, the MathFunctions target can declare its own include directory so that anything linking against it picks up the path automatically. I won't go into those details here; see the Resources section of this article for more authoritative sources.

What Next?

It was fun learning a bit about how CMake works and how to get basic things done with it. It addresses most of the problems I have with C++ packaging for now. But I'm also interested in exploring Conan and vcpkg to simplify dependency management in C++, and I'll be looking out for opportunities to do that in the future. Have you worked with CMake/C++ projects? Do share your thoughts on package management in the comments below.

Resources

CMake Official Tutorial

CMake has a friendly tutorial to get started with the tool.

To get the tutorial source files for each step, clone the repository first:

 git clone --depth=1 https://github.com/Kitware/CMake.git
 cd CMake/Help/guide/tutorial/

Run ls to see a directory of code for each stage of the tutorial.

CMake Book

Mastering CMake is a freely available, book-length guide to using CMake. Do check it out.