C++ Linkers: From Object Files to Executables

Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques


1. What Is a Linker?

The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:

  • Resolving symbol references — matching function/variable uses to their definitions across translation units
  • Relocating — assigning final memory addresses to all symbols
  • Stripping or keeping debug info, depending on build flags

Source Files (.cpp)
      ↓  [compiler]
Object Files (.o)
      ↓  [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)

2. The Compilation Pipeline

Each .cpp file is compiled independently into a relocatable object file.

// foo.cpp — defines the symbol
int add(int a, int b) {
    return a + b;
}

// main.cpp — references the symbol
extern int add(int, int);   // declaration only
int main() {
    int r = add(3, 4);      // unresolved reference until link time
    return r;
}

g++ -c foo.cpp  -o foo.o    # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app     # link step

The four stages: Preprocessing → Compilation → Assembly → Linking


3. Symbol Resolution

The linker maintains a symbol table and matches every UNDEF reference to a GLOBAL definition.

// math.cpp — defines the symbol
double square(double x) { return x * x; }

// main.cpp — references the symbol
double square(double);        // extern declaration
int main() {
    return (int)square(5.0);  // unresolved until link
}

Symbol                  Type    Binding
_Z6squared              FUNC    GLOBAL
__gxx_personality_v0    UNDEF   GLOBAL

Strong symbols (definitions) must be unique. Weak symbols can be overridden. UNDEF means referenced but not yet defined.
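
The strong/weak distinction can be sketched in a few lines. This is a minimal example that assumes GCC or Clang, because `__attribute__((weak))` is a compiler extension rather than standard C++. The file name and the function names `log_level` and `effective_log_level` are hypothetical.

```cpp
// defaults.cpp (hypothetical file name)

// Weak definition: acts as a default. If another object file in the link
// provides a strong definition of log_level(), the linker picks that one
// and discards this one without reporting a duplicate.
__attribute__((weak)) int log_level() {
    return 1;   // default verbosity
}

// Callers do not need to know which definition they end up with.
int effective_log_level() {
    return log_level();
}
```

Running nm on the resulting object should list log_level with a W (weak) marker instead of T. When no strong definition is linked in, the weak default is the one that runs.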

nm -C -g math.o     # list exported symbols (demangled)
nm -u main.o        # show unresolved (UNDEF) symbols
objdump -t main.o   # full symbol table dump

4. Name Mangling

C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.

// C++ overloaded functions → different mangled names
int  process(int x);           // _Z7processi
int  process(double x);        // _Z7processd
int  process(int x, double y); // _Z7processid

namespace Math {
    double sqrt(double x);     // _ZN4Math4sqrtEd
}

class Vector {
    double dot(const Vector& v); // _ZN6Vector3dotERKS_
};

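Mangling schemes are defined by the platform ABI, not by the language: GCC and Clang follow the Itanium C++ ABI on Linux/macOS, while MSVC uses its own scheme on Windows. Objects built under different ABIs cannot be linked together through C++ interfaces.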

// Disable mangling for C interoperability
extern "C" {
    int  legacy_init(void);    // symbol stays: legacy_init
    void legacy_free(void*);   // symbol stays: legacy_free
}

# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd        # → Math::sqrt(double)
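
The same demangling is available from inside a program. The sketch below assumes a GCC or Clang toolchain, whose runtime exposes `abi::__cxa_demangle` in `<cxxabi.h>`; it does not apply to MSVC. The helper name `demangle` is made up for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium ABI toolchains only)
#include <cstdlib>    // std::free
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged if the runtime cannot demangle it.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);   // the buffer is malloc-allocated by the runtime
    return result;
}
```

For example, demangle("_Z7processid") yields process(int, double), matching the overload list above. This is handy for logging symbol names obtained from backtraces or typeid.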

5. Static Linking

The linker copies the needed object files from .a archives directly into the executable.

# Build a static library
ar rcs libmymath.a  vec.o mat.o quat.o

# Link statically — no runtime dependencies
g++ main.o -L. -lmymath -static -o app_static

# Verify: no shared lib deps
ldd app_static    # → "not a dynamic executable" (some versions print "statically linked")

Pros: single self-contained binary, no runtime dependency issues, faster startup.
Cons: larger binary, security patches require a full rebuild, code is duplicated across binaries.

Note: Link order matters — list object files before libraries: g++ main.o -lmymath, not g++ -lmymath main.o.
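
Why the order matters is easier to see with a toy model of a single left-to-right pass, which is roughly how GNU ld treats archives (other linkers can be more forgiving). Everything here is a simplification for illustration: the `Input` struct, the `unresolved_after_link` function, and treating a whole archive as one member are assumptions of this sketch, not how a real linker is structured.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy description of one item on the link command line.
struct Input {
    std::string name;
    bool is_archive;                    // archives are only pulled in on demand
    std::set<std::string> defines;      // symbols this input defines
    std::set<std::string> references;   // symbols this input needs
};

// Scans the inputs once, left to right, and returns the symbols
// that are still undefined when the pass ends.
std::set<std::string> unresolved_after_link(const std::vector<Input>& command_line) {
    std::set<std::string> defined;
    std::set<std::string> undefined;

    for (const Input& in : command_line) {
        if (in.is_archive) {
            // An archive is used only if it defines something that is
            // undefined at this point in the scan.
            bool needed = false;
            for (const std::string& sym : in.defines) {
                if (undefined.count(sym)) { needed = true; break; }
            }
            if (!needed) continue;      // skipped, and never revisited
        }
        for (const std::string& sym : in.defines) {
            defined.insert(sym);
            undefined.erase(sym);
        }
        for (const std::string& sym : in.references) {
            if (!defined.count(sym)) undefined.insert(sym);
        }
    }
    return undefined;
}
```

With main.o referencing a hypothetical vec_dot that libmymath.a defines, the order {main.o, libmymath.a} leaves nothing unresolved. The order {libmymath.a, main.o} skips the archive before anything needs it, so vec_dot stays undefined.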


6. Dynamic Linking

Shared libraries (.so / .dll / .dylib) are loaded at runtime by the dynamic linker (ld.so).

# Build a shared library (the objects must be compiled with -fPIC)
g++ -fPIC -c vec.cpp mat.cpp
g++ -shared vec.o mat.o -o libmymath.so

# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app

# Inspect runtime dependencies
ldd app
# libmymath.so => ./libmymath.so
# libc.so.6    => /lib/x86_64-linux-gnu/libc.so.6

# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app

PLT/GOT mechanism: external calls go through the Procedure Linkage Table (PLT); the Global Offset Table (GOT) holds the resolved addresses, which are filled in lazily on the first call. Link with -Wl,-z,now or set LD_BIND_NOW=1 in the environment to resolve all symbols at startup instead.
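
Lazy binding can be pictured as a function pointer that initially points at a resolver and is overwritten on first use. The sketch below is only a conceptual model in plain C++: real PLT entries are small machine-code stubs and the GOT is a data table patched by the dynamic loader. All names here (`got_square`, `square_stub`, `call_square`, `resolver_calls`) are invented for the illustration.

```cpp
// Conceptual model of lazy binding, not the loader's implementation.
using Fn = int (*)(int);

int real_square(int x) { return x * x; }   // stands in for a library function

int resolver_calls = 0;                    // how often the "loader" had to run

int square_stub(int x);                    // resolver stub, defined below

Fn got_square = square_stub;               // the "GOT slot": starts at the stub

int square_stub(int x) {
    ++resolver_calls;                      // the costly symbol lookup happens here
    got_square = real_square;              // patch the slot with the real address
    return got_square(x);                  // then complete the original call
}

// The "PLT entry": every call site jumps through the slot.
int call_square(int x) { return got_square(x); }
```

The first call_square pays for the lookup and later calls go straight to real_square. Binding everything at startup corresponds to filling every slot before main runs.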

Pros: shared memory between processes, hot-patching by replacing .so, smaller binaries.
Cons: dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.


7. Static vs. Dynamic: At a Glance

Aspect             Static (.a)                           Dynamic (.so / .dll)
Resolution         Link time                             Load / runtime
Binary size        Larger (code embedded)                Smaller (references only)
Memory sharing     No (each process has its own copy)    Yes (single copy in RAM)
Deployment         One self-contained file               Must ship .so alongside
Hot patching       Full relink required                  Replace .so and restart
Startup overhead   Minimal                               Dynamic loader adds ~ms
Security updates   Manual rebuild                        OS-level update propagates

8. Linking with C Libraries

Use extern "C" to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).

// wrapper.h — expose C++ code to C callers
#pragma once
#ifdef __cplusplus
extern "C" {           // disables mangling for these symbols
#endif
    void   vec_create(void** out);
    void   vec_destroy(void* vec);
    void   vec_push(void* vec, double val);
    double vec_get(void* vec, int idx);
#ifdef __cplusplus
}
#endif

// Calling a C library from C++ (normally done by including <sqlite3.h>)
struct sqlite3;
extern "C" int sqlite3_open(const char* filename, sqlite3** db);

g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -l<name>   → links libname.so or libname.a
# -L<path>   → add directory to library search path
# -Wl,--as-needed  → skip libs that aren't actually used

Common pitfalls:

  • Forgetting extern "C" → mangled name doesn't match the C header
  • C struct padding may differ across compilers, breaking ABI
  • C++ exceptions must not propagate into C code: catch them inside the wrapper and mark boundary functions noexcept
  • Link order still matters: objects first, then libraries
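
One possible wrapper.cpp for the declarations in wrapper.h is sketched below. It assumes the opaque handle is a heap-allocated `std::vector<double>` and that failures are reported as a null handle or a 0.0 result; both conventions are assumptions of this example. The functions catch exceptions at the boundary and leave noexcept off so the definitions match the unadorned declarations in the header.

```cpp
#include <cstddef>
#include <vector>

// Implementation sketch for the C-callable API declared in wrapper.h.
extern "C" {

void vec_create(void** out) {
    if (!out) return;
    try {
        *out = new std::vector<double>();
    } catch (...) {
        *out = nullptr;                  // allocation failed: report via null handle
    }
}

void vec_destroy(void* vec) {
    delete static_cast<std::vector<double>*>(vec);   // deleting nullptr is a no-op
}

void vec_push(void* vec, double val) {
    if (!vec) return;
    try {
        static_cast<std::vector<double>*>(vec)->push_back(val);
    } catch (...) {
        // out of memory: drop the value rather than unwind into C code
    }
}

double vec_get(void* vec, int idx) {
    if (!vec) return 0.0;
    const auto& v = *static_cast<std::vector<double>*>(vec);
    if (idx < 0 || static_cast<std::size_t>(idx) >= v.size()) return 0.0;
    return v[static_cast<std::size_t>(idx)];
}

}  // extern "C"
```

A C caller only ever sees void* and plain function names, so nm on the object shows vec_create, vec_push and so on without mangling.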

9. Common Linker Errors

undefined reference to 'add(int, int)'

Cause: definition is missing or the library wasn't linked.
Fix: add -lmylib or include the .cpp that defines it.

multiple definition of 'globalVar'

Cause: variable defined (not just declared) in a header included by multiple TUs.
Fix: mark the variable inline (C++17), or put an extern declaration in the header and exactly one definition in a .cpp.
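
Both fixes are shown side by side in the sketch below, collapsed into one listing for brevity. The file names in the comments and the names `legacyVar` and `bump_both` are hypothetical, and the inline variable needs -std=c++17 or later.

```cpp
// ---- counter.h (hypothetical header, included from several .cpp files) ----

// C++17 inline variable: every TU sees a definition, the linker keeps one.
inline int globalVar = 0;

// Pre-C++17 alternative: only declare the variable in the header.
extern int legacyVar;

// ---- counter.cpp (hypothetical): the single definition of legacyVar ----
int legacyVar = 0;

// ---- any other TU: both variables are usable without duplicate symbols ----
int bump_both() {
    ++globalVar;
    ++legacyVar;
    return globalVar + legacyVar;
}
```

Writing a plain `int globalVar = 0;` in the header instead would give every including TU its own strong definition, which is exactly what triggers the error above.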

cannot find -lmylib

Cause: linker can't locate libmylib.so or libmylib.a.
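Fix: add -L/path/to/lib, or set LIBRARY_PATH, which GCC searches at link time. LD_LIBRARY_PATH only affects the runtime loader, and PKG_CONFIG_PATH only affects where pkg-config looks for .pc files.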


10. Debugging Linker Issues

# Inspect symbols
nm -C -g libmath.a                        # demangled, global symbols only
nm -u main.o                              # undefined (unresolved) symbols
objdump -d math_lib.o                     # disassembly
readelf -s math_lib.o                     # ELF symbol table

# Trace linker decisions
g++ main.o -L. -lmath -Wl,--verbose 2>&1 | grep "attempt"
g++ main.o -L. -lmath -Wl,--trace         # shows each file the linker actually loads

# Check shared lib deps
ldd ./app
chrpath -l ./app                          # show embedded RPATH
# fallback if chrpath is not installed:
readelf -d ./app | grep -E 'RPATH|RUNPATH'

# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd                   # → Math::sqrt(double)

# Pipe nm output through c++filt to demangle all symbols at once
nm -g libmath.a | grep " T " | awk '{print $NF}' | c++filt

Useful flags:

  • -Wl,--no-undefined — catch unresolved symbols at build time
  • -Wl,--as-needed — skip unused shared libraries
  • -Wl,--start-group ... --end-group — resolve circular dependencies between archives

Rule of thumb: "When in doubt, nm it out."


11. Link-Time Optimization (LTO)

Without LTO, the compiler can only optimize within a single translation unit. With LTO, it embeds IR (Intermediate Representation) in .o files and performs whole-program optimization at link time.

# Enable LTO
g++ -flto -O2 -c foo.cpp  -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto

# Thin LTO — faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app

LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization, typically yielding a 10–25% speedup on real codebases.

Gotcha: only TUs compiled with -flto take part in the optimization. Third-party archives compiled without it will still link, but that code won't be optimized across boundaries.


Summary & Quick Reference

Key concepts

  • Linker: resolves symbols, relocates addresses, produces the binary
  • Symbol resolution: one strong definition wins over any weak ones; every UNDEF reference must find a definition
  • Name mangling encodes C++ type info into flat symbol names
  • extern "C" disables mangling for C interoperability

Essential commands

Command             Purpose
nm -C -g lib.a      List exported symbols (demangled)
c++filt <sym>       Demangle a symbol
ldd app             Show shared library dependencies
objdump -d obj      Disassemble object file
ar rcs lib.a *.o    Create a static library
readelf -s obj      ELF symbol table

Build flags

Flag                  Effect
-static               Link everything statically
-fPIC -shared         Build a position-independent shared library
-Wl,--no-undefined    Fail at link time on unresolved symbols
-Wl,-rpath,...        Embed library search path in binary
-flto                 Enable link-time optimization
-Wl,--as-needed       Only link libraries that are actually used