Compiling and linking

Compiling and linking C programs is a little weird at first. Here are some tips:

Useful flags

I find it useful to always set the following flags when using gcc:

-Wall -Werror

It is pretty strict (any warning fails the build), but it makes for warning-free compilation. I begin all my Makefiles like this:

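CC := gcc
CFLAGS := -Wall -Werror

# Neither target names a real file, so mark both as phony.
.PHONY: all debug

all: my-prog

# Target-specific variable: add debug symbols only for `make debug`.
debug: CFLAGS += -ggdb
debug: my-prog

# The recipe line below must start with a tab character.
my-prog: my_prog.c
	$(CC) $(CFLAGS) -o my-prog my_prog.c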

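Then I can run make to build the program, or make debug to compile debugging symbols into it for a later gdb my-prog session. Note that make only compares timestamps, so if my-prog is already up to date, make debug will not rebuild it until my_prog.c changes or my-prog is deleted.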

Simple program with no dependencies

This will compile the source my_prog.c into a binary program my-prog.

gcc -Wall -Werror -o my-prog my_prog.c

Behind the scenes gcc first generates an intermediate "object file" (a temporary file, probably in /tmp) containing the compiled machine code. It then runs the linker to turn that object file into the my-prog ELF executable.

Program with components compiled as objects

You can pick parts of your code and group them up into components - groups of functions, structure and type definitions that make sense together. You could simply include the raw source files as part of the compilation:

gcc -Wall -Werror -o my-prog time_functions.c utils.c my_prog.c

However you can also compile each component separately into an object file:

gcc -Wall -Werror -c -o time_functions.o time_functions.c
gcc -Wall -Werror -c -o utils.o utils.c

Now you can compile the program together with these new objects:

gcc -Wall -Werror -o my-prog my_prog.c time_functions.o utils.o

However, there is less of a difference than it might seem. Even when several source files are given in one command, gcc compiles each of them separately into its own temporary object file and then links the results. The compiler only ever sees one source file at a time (a "translation unit"), so nothing defined in time_functions.c or utils.c is visible while my_prog.c is being compiled.

With pre-built object files this is easier to notice, because they are passed straight to the linker, not the compiler. Either way, when the compiler sees a call to a utils function in my_prog.c it doesn't know the function signature, because it can't see the source code in utils.c, and utils.o records symbol names but not C signatures or type definitions.

This is the purpose of header files. You make a time_functions.h and utils.h and you #include them into your my_prog.c. These headers contain "forward declarations" of function signatures, plus type definitions. If you include them with angle brackets, as in #include <utils.h>, the compiler needs to be told where to find them, so add -I. to your gcc flags (with the quoted form, #include "utils.h", gcc searches the directory of the including file first):

gcc -Wall -Werror -I. -o my-prog my_prog.c time_functions.o utils.o

Now when gcc encounters #include <time_functions.h> it knows to look for it in the current directory (.). It then knows the forward declarations for the functions that have already been compiled into time_functions.o.
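As a rough sketch of what such a header and its source file might contain, here is a hypothetical utils component shown as one listing. The names range_t and utils_clamp are made up for illustration; they are not from any real utils.h.

```c
/* ---- utils.h (hypothetical contents): declarations only ---- */
#ifndef UTILS_H
#define UTILS_H

/* A type definition shared by the component and its callers. */
typedef struct {
    int min;
    int max;
} range_t;

/* Forward declaration: the signature without the body. */
int utils_clamp(int value, range_t range);

#endif /* UTILS_H */

/* ---- utils.c (hypothetical contents): the definitions ---- */
/* In a separate utils.c this would start with: #include "utils.h" */

/* Restrict value to the inclusive interval [range.min, range.max]. */
int utils_clamp(int value, range_t range)
{
    if (value < range.min) {
        return range.min;
    }
    if (value > range.max) {
        return range.max;
    }
    return value;
}
```

Compiled with -c, the second half becomes utils.o. Any source file that includes the first half can then call utils_clamp, and the compiler can check the arguments even though it never sees the function body.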

Note: sometimes people distribute code as headers only (called header-only libraries). These contain not only forward declarations of functions but also the function bodies. This means there are no object files to compile and link, which greatly simplifies distribution of free software libraries.
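A minimal sketch of the header-only idea, assuming a hypothetical time_functions.h with a single made-up helper, seconds_to_millis. Using static inline is one common way to do this, not the only one.

```c
/* ---- time_functions.h as a hypothetical header-only library ---- */
#ifndef TIME_FUNCTIONS_H
#define TIME_FUNCTIONS_H

/*
 * The body lives in the header itself. `static inline` gives every
 * source file that includes this header its own private copy, so
 * including it from several files does not cause duplicate-symbol
 * errors at link time.
 */
static inline long seconds_to_millis(long seconds)
{
    return seconds * 1000L;
}

#endif /* TIME_FUNCTIONS_H */
```

A program using this only needs the #include; there is no time_functions.o to build or pass to the linker.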

Libraries

Often (hopefully) you want to use someone else's software libraries. Libraries are pre-compiled (and thus architecture-specific) chunks of code. They come in two flavors: 1) static libraries, 2) shared object libraries. Static libraries are linked into the main program like object files. Shared object libraries are linked to the program at run time, which means two different programs can use the same library without each having a copy of all that code included in its compiled executable.

Shared libraries

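When you build a program that uses a shared library you need to tell gcc which library to link to. Put the -l flag after the source or object files that use the library, because the linker resolves symbols from left to right:

gcc -Wall -Werror -o my-prog my_prog.c -lGL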

The linker takes -lGL, strips the -l from the front, and replaces it with lib. It then appends .so, resulting in libGL.so, and looks for this file in the standard library locations on your system:

/usr/lib/x86_64-linux-gnu/libGL.so

Interestingly this is actually a symbolic link and lives alongside other symbolic links plus the real file (in this case libGL.so.1.2.0):

libGL.so -> libGL.so.1.2.0
libGL.so.1 -> libGL.so.1.2.0
libGL.so.1.2.0

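The versioned links (such as libGL.so.1) are managed by a program called ldconfig, which also maintains the cache used by the run-time loader; the unversioned libGL.so link used at build time is typically installed by the library's development package. ldconfig has a specific list of file system locations where libraries are installed - see the files in /etc/ld.so.conf.d/. Sometimes installers put shared libraries outside these directories, and the admin can either add the path to one of these config files or move the file.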

When compiling an executable that uses a shared library you also need to #include a header in your source so the compiler knows the forward declarations, as these are not included in the shared library itself. Interestingly the header name and library name can be quite different. For example -lm links to libm.so but you #include <math.h> for its declarations. Even more interesting are the contents of libm.so:

$ file /usr/lib/x86_64-linux-gnu/libm.so
/usr/lib/x86_64-linux-gnu/libm.so: ASCII text

What, "ASCII text"? This is actually a "linker script", which I don't know anything about (although reading that file, it's self-explanatory to some extent).

Static libraries

Like shared libraries, static libraries are linked using a flag to the compiler, again placed after the files that use the library:

gcc -Wall -Werror -o my-prog my_prog.c -lfoo

If only a libfoo.a exists (the linker prefers libfoo.so when both are present), the object code in that library will be copied into the executable. This means no linking at run time but a larger executable. My Linux system contains no static libraries at all, indicating distributions (or Debian at least) are trying to move away from them. Golang however has embraced static linking, where it's common to have basic executables megabytes in size. Static linking does help with distributing software to some extent. Perhaps Golang was designed in the era of containers, where you are ideally only running a single executable in an environment where the benefits of shared libraries make less sense. Just a guess.

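To build a static library of your own, bundle the object files into an archive with ar (c creates the archive, r inserts the objects, s writes a symbol index):

ar -crs libawesome.a time_functions.o utils.o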