# C++ Linkers: From Object Files to Executables

*Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques*

---

## 1. What Is a Linker?

The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:

- **Resolving symbol references**: matching function/variable uses to their definitions across translation units
- **Relocating**: assigning final memory addresses to all symbols
- **Stripping or keeping** debug info, depending on build flags

```
Source Files (.cpp)
      ↓  [compiler]
Object Files (.o)
      ↓  [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)
```

---

## 2. The Compilation Pipeline

Each `.cpp` file is compiled independently into a relocatable object file.

```cpp
// foo.cpp: defines the symbol
int add(int a, int b) {
    return a + b;
}

// main.cpp: references the symbol
extern int add(int, int);   // declaration only

int main() {
    int r = add(3, 4);      // unresolved reference until link time
    return r;
}
```

```bash
g++ -c foo.cpp -o foo.o     # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app     # link step
```

The four stages: **Preprocessing** → **Compilation** → **Assembly** → **Linking**

---

## 3. Symbol Resolution

The linker maintains a symbol table and matches every `UNDEF` reference to a `GLOBAL` definition.

```cpp
// math.cpp: defines the symbol
double square(double x) {
    return x * x;
}

// main.cpp: references the symbol
double square(double);      // extern declaration

int main() {
    return (int)square(5.0);   // unresolved until link
}
```

| Object | Symbol | Type | Binding | Section |
|--------|--------|------|---------|---------|
| `math.o` | `_Z6squared` | FUNC | GLOBAL | `.text` |
| `main.o` | `_Z6squared` | NOTYPE | GLOBAL | UNDEF |
| `main.o` | `main` | FUNC | GLOBAL | `.text` |

**Strong symbols** (ordinary definitions) must be unique across the link. **Weak symbols** are definitions that a strong definition of the same name overrides. `UNDEF` marks a symbol that an object file references but does not define (`readelf` prints it as `UND`, `nm` as `U`).

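
The strong/weak distinction can be sketched with a weak default definition. This is a minimal illustration, assuming GCC or Clang on an ELF platform; `__attribute__((weak))` is a compiler extension rather than standard C++, and the names `app_log_level` and `effective_log_level` are hypothetical.

```cpp
// weak_default.cpp: a weak definition that another object file may override.
// Assumption: GCC or Clang targeting ELF (e.g. Linux).

// Weak default; used only if no strong definition of the same name is linked.
// extern "C" keeps the symbol name unmangled so it is easy to spot in nm output.
extern "C" __attribute__((weak)) int app_log_level() {
    return 0;
}

// Calls whichever definition the linker selected.
int effective_log_level() {
    return app_log_level();
}
```

If a second object file on the link line defines `extern "C" int app_log_level() { return 3; }` without the attribute, the linker should pick that strong definition and report no "multiple definition" error. In `nm` output the weak definition appears as `W` and a strong one as `T`.
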
```bash
nm -C -g math.o      # list exported symbols (demangled)
nm -u main.o         # show unresolved (UNDEF) symbols
objdump -t main.o    # full symbol table dump
```

---

## 4. Name Mangling

C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.

```cpp
// C++ overloaded functions → different mangled names
int process(int x);              // _Z7processi
int process(double x);           // _Z7processd
int process(int x, double y);    // _Z7processid

namespace Math {
    double sqrt(double x);       // _ZN4Math4sqrtEd
}

class Vector {
    double dot(const Vector& v); // _ZN6Vector3dotERKS_
};
```

Mangling schemes are ABI-specific: GCC and Clang follow the Itanium C++ ABI on Linux and macOS, while MSVC uses its own scheme on Windows. Objects built by compilers that follow different ABIs generally cannot be linked together at the C++ level.

```cpp
// Disable mangling for C interoperability
extern "C" {
    int  legacy_init(void);    // symbol stays: legacy_init
    void legacy_free(void*);   // symbol stays: legacy_free
}
```

```bash
# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd   # → Math::sqrt(double)
```

---

## 5. Static Linking

The linker copies the needed object files from `.a` archives directly into the executable.

```bash
# Build a static library
ar rcs libmymath.a vec.o mat.o quat.o

# Link statically: no runtime dependencies
g++ main.o -L. -lmymath -static -o app_static

# Verify: no shared lib deps
ldd app_static   # → typically prints "not a dynamic executable"
```

**Pros:** single self-contained binary, no runtime dependency issues, faster startup.

**Cons:** larger binary, security patches require a full rebuild, code is duplicated across binaries.

> **Note:** Link order matters. List object files before libraries: `g++ main.o -lmymath`, not `g++ -lmymath main.o`. The linker scans archives left to right and pulls in only the members that satisfy symbols still undefined at that point.

---

## 6. Dynamic Linking

Shared libraries (`.so` / `.dll` / `.dylib`) are loaded at runtime by the dynamic linker (`ld.so` on Linux).

```bash
# Build a shared library (vec.o and mat.o must have been compiled with -fPIC)
g++ -fPIC -shared vec.o mat.o -o libmymath.so

# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app

# Inspect runtime dependencies
ldd app
#   libmymath.so => ./libmymath.so
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6

# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app
```

**PLT/GOT mechanism:** external calls go through the *Procedure Linkage Table* (PLT); the *Global Offset Table* (GOT) holds the resolved addresses, which are filled in lazily on first call. To resolve all symbols at startup instead, link with `-Wl,-z,now` or set `LD_BIND_NOW=1` in the environment.

**Pros:** shared memory between processes, patching by replacing the `.so`, smaller binaries.

**Cons:** dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.

---

## 7. Static vs. Dynamic: At a Glance

| Aspect | Static (`.a`) | Dynamic (`.so` / `.dll`) |
|---|---|---|
| Resolution | Link time | Load time / runtime |
| Binary size | Larger (code embedded) | Smaller (references only) |
| Memory sharing | No: each process has its own copy | Yes: code pages shared between processes |
| Deployment | One self-contained file | Must ship the `.so` alongside |
| Hot patching | Full relink required | Replace the `.so` and restart |
| Startup overhead | Minimal | Dynamic loader adds symbol lookup and relocation time |
| Security updates | Manual rebuild | OS-level library update propagates |

---

## 8. Linking with C Libraries

Use `extern "C"` to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).

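
For the second direction (exposing C++ to C callers), one common pattern is an opaque handle behind C-linkage functions. The sketch below is illustrative only: the `dvec_*` names are hypothetical, and error handling is reduced to return codes so that no exception can cross into C code.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical C-callable API around std::vector<double>.
// extern "C" gives these functions unmangled names (dvec_create, ...).
extern "C" {

// Returns an opaque handle, or nullptr if allocation fails.
void* dvec_create() noexcept {
    return new (std::nothrow) std::vector<double>();
}

// Appends a value; returns 0 on success, -1 on failure.
int dvec_push(void* handle, double value) noexcept {
    if (handle == nullptr) return -1;
    try {
        static_cast<std::vector<double>*>(handle)->push_back(value);
        return 0;
    } catch (...) {
        return -1;  // never let an exception escape into C code
    }
}

// Returns the number of stored elements (0 for a null handle).
std::size_t dvec_size(const void* handle) noexcept {
    if (handle == nullptr) return 0;
    return static_cast<const std::vector<double>*>(handle)->size();
}

// Returns the element at index, or 0.0 if the handle or index is invalid.
double dvec_get(const void* handle, std::size_t index) noexcept {
    if (handle == nullptr) return 0.0;
    const auto* vec = static_cast<const std::vector<double>*>(handle);
    return index < vec->size() ? (*vec)[index] : 0.0;
}

// Releases the vector; passing nullptr is a no-op.
void dvec_destroy(void* handle) noexcept {
    delete static_cast<std::vector<double>*>(handle);
}

}  // extern "C"
```

A C caller would see only `void*` and plain functions; running `nm` on the resulting object file should list `dvec_create` and friends without any `_Z` prefix.
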
```cpp
// wrapper.h: expose C++ code to C callers
#pragma once

#ifdef __cplusplus
extern "C" {   // disables mangling for these symbols
#endif

void   vec_create(void** out);
void   vec_destroy(void* vec);
void   vec_push(void* vec, double val);
double vec_get(void* vec, int idx);

#ifdef __cplusplus
}
#endif
```

```cpp
// Calling a C library from C++
// (in practice, #include <sqlite3.h>; the header carries its own extern "C" guard)
struct sqlite3;   // opaque type defined by the library
extern "C" int sqlite3_open(const char* filename, sqlite3** ppDb);
```

```bash
g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -lname          → links libname.so or libname.a
# -Ldir           → add directory to library search path
# -Wl,--as-needed → skip libs that aren't actually used
```

**Common pitfalls:**

- Forgetting `extern "C"` → the mangled name doesn't match the symbol the C library exports
- C struct padding may differ across compilers or packing settings, breaking the ABI
- Exceptions must not propagate through C frames: catch them at the boundary and mark boundary functions `noexcept`
- Link order still matters: objects first, then libraries

---

## 9. Common Linker Errors

### `undefined reference to 'add(int, int)'`

**Cause:** the definition is missing or the library that contains it wasn't linked.

**Fix:** add `-lmylib`, or add the `.cpp` / `.o` file that defines the symbol to the link command.

### `multiple definition of 'globalVar'`

**Cause:** variable defined (not just declared) in a header included by multiple TUs.

**Fix:** make it an `inline` variable (C++17), or put an `extern` declaration in the header and exactly one definition in a `.cpp`.

### `cannot find -lmylib`

**Cause:** the linker can't locate `libmylib.so` or `libmylib.a`.

**Fix:** add `-L/path/to/lib` or set `LIBRARY_PATH` (read by the GCC driver at link time). If the build uses `pkg-config`, point `PKG_CONFIG_PATH` at the directory holding the library's `.pc` file. Note that `LD_LIBRARY_PATH` affects the runtime loader, not the link-time search.

---

## 10. Debugging Linker Issues

```bash
# Inspect symbols
nm -C -g libmath.a        # demangled, global symbols only
nm -u main.o              # undefined (unresolved) symbols
objdump -d math_lib.o     # disassembly
readelf -s math_lib.o     # ELF symbol table

# Trace linker decisions
g++ main.o -L. -lmath -Wl,--verbose 2>&1 | grep "attempt"
g++ main.o -L. -lmath -Wl,--trace    # prints each input file the linker processes

# Check shared lib deps
ldd ./app
chrpath -l ./app          # show embedded RPATH
# fallback if chrpath is not installed:
readelf -d ./app | grep -E 'RPATH|RUNPATH'

# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd   # → Math::sqrt(double)

# Pipe nm output through c++filt to demangle all symbols at once
nm -g libmath.a | grep " T " | awk '{print $NF}' | c++filt
```

**Useful flags:**

- `-Wl,--no-undefined`: fail at link time on unresolved symbols when building a shared library
- `-Wl,--as-needed`: skip unused shared libraries
- `-Wl,--start-group ... -Wl,--end-group`: resolve circular dependencies between archives

> *Rule of thumb: "When in doubt, `nm` it out."*

---

## 11. Bonus: Link-Time Optimization (LTO)

Without LTO, the compiler can only optimize within a single translation unit. With LTO, it embeds IR (Intermediate Representation) in `.o` files and performs whole-program optimization at link time.

```bash
# Enable LTO
g++ -flto -O2 -c foo.cpp -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto

# Thin LTO: faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app
```

LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization. Speedups are workload-dependent (figures in the 10–25% range are sometimes quoted), so measure on your own codebase.

**Gotcha:** all TUs must be compiled with `-flto`. Third-party archives compiled without it will still link, but that code won't be optimized across boundaries.

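
To make "whole-program devirtualization" concrete, here is a sketch of the code shape it targets. The `Shape` and `Square` types are hypothetical. Imagine `total_area` living in a different translation unit from `Square`: without LTO the compiler sees only the abstract interface when compiling the loop, whereas with LTO the optimizer can see every override in the program and may turn the indirect call into a direct, inlinable one. Whether that actually happens depends on the compiler, its version, and the flags used; the program's result is the same either way.

```cpp
#include <cstddef>

// Hypothetical interface, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// The only implementation in this (imagined) program.
struct Square : Shape {
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }

private:
    double side_;
};

// Each iteration makes an indirect call through the vtable unless the
// optimizer can prove which override is the target.
double total_area(const Shape* const* shapes, std::size_t count) {
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += shapes[i]->area();
    }
    return sum;
}
```

Comparing `objdump -d` output for builds with and without `-flto` is one way to check whether the call was actually devirtualized on a given toolchain.
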
---

## Summary & Quick Reference

**Key concepts**

- Linker: resolves symbols, relocates addresses, produces the binary
- Symbol precedence: a strong definition overrides a weak one; an `UNDEF` reference with no definition at all is a link error
- Name mangling encodes C++ type info into flat symbol names
- `extern "C"` disables mangling for C interoperability

**Essential commands**

| Command | Purpose |
|---|---|
| `nm -C -g lib.a` | List exported symbols (demangled) |
| `c++filt <symbol>` | Demangle a symbol |
| `ldd app` | Show shared library dependencies |
| `objdump -d obj` | Disassemble object file |
| `ar rcs lib.a *.o` | Create a static library |
| `readelf -s obj` | ELF symbol table |

**Build flags**

| Flag | Effect |
|---|---|
| `-static` | Link everything statically |
| `-fPIC -shared` | Build a position-independent shared library |
| `-Wl,--no-undefined` | Fail at link time on unresolved symbols |
| `-Wl,-rpath,...` | Embed library search path in binary |
| `-flto` | Enable link-time optimization |
| `-Wl,--as-needed` | Only link libraries that are actually used |