C++ linker presentation

2026-02-24 10:21:15 +01:00
parent 0fda0d75fb
commit 6f3d98f388
32 changed files with 788 additions and 0 deletions

cpplinker/cpp_linkers.md Normal file

@@ -0,0 +1,316 @@
# C++ Linkers: From Object Files to Executables
*Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques*
---
## 1. What Is a Linker?
The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:
- **Resolving symbol references** — matching function/variable uses to their definitions across translation units
- **Relocating** — assigning final memory addresses to all symbols
- **Stripping or keeping** debug info, depending on build flags
```
Source Files (.cpp)
↓ [compiler]
Object Files (.o)
↓ [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)
```
---
## 2. The Compilation Pipeline
Each `.cpp` file is compiled independently into a relocatable object file.
```cpp
// foo.cpp: defines the symbol
int add(int a, int b) {
    return a + b;
}

// main.cpp: references the symbol
extern int add(int, int);   // declaration only
int main() {
    int r = add(3, 4);      // unresolved reference until link time
    return r;
}
```
```bash
g++ -c foo.cpp -o foo.o # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app # link step
```
The four stages: **Preprocessing** → **Compilation** → **Assembly** → **Linking**
---
## 3. Symbol Resolution
The linker maintains a symbol table and matches every `UNDEF` reference to a `GLOBAL` definition.
```cpp
// math.cpp: defines the symbol
double square(double x) { return x * x; }

// main.cpp: references the symbol
double square(double);          // extern declaration
int main() {
    return (int)square(5.0);    // unresolved until link
}
```
| Symbol | Type | Binding |
|--------|------|---------|
| `_Z6squared` | FUNC | GLOBAL |
| `__gxx_personality_v0` | UNDEF | GLOBAL |
**Strong symbols** (definitions) must be unique. **Weak symbols** can be overridden. `UNDEF` means referenced but not yet defined.
```bash
nm -C -g math.o # list exported symbols (demangled)
nm -u main.o # show unresolved (UNDEF) symbols
objdump -t main.o # full symbol table dump
```
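The strong, weak, and local cases can be seen side by side in one translation unit. The sketch below is illustrative only: the file name `symbols.cpp` and all function names are hypothetical, and `__attribute__((weak))` is a GCC/Clang extension rather than standard C++. The `nm` letters in the comments are what an unoptimized build typically shows.

```cpp
// symbols.cpp: a hypothetical translation unit; inspect it with `nm -C symbols.o`.
#include <cassert>

// External linkage: a strong GLOBAL definition (nm type 'T').
int scale(int x) { return 2 * x; }

// Internal linkage: LOCAL binding, invisible to other object files (nm type 't').
namespace {
int clamp_to_byte(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }
}

// Weak definition (GCC/Clang extension): nm type 'W'.
// A strong definition of default_gain() in another object file would replace it.
__attribute__((weak)) int default_gain() { return 1; }

// Uses all three functions, so none of them is dropped as unused.
int process_sample(int x) {
    const int y = clamp_to_byte(scale(x) * default_gain());
    assert(y >= 0 && y <= 255);  // postcondition of clamp_to_byte
    return y;
}
```

Linking this object with another one that defines a non-weak `default_gain()` should succeed without a "multiple definition" error, because the strong definition wins.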
---
## 4. Name Mangling
C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.
```cpp
// C++ overloaded functions → different mangled names
int process(int x);                 // _Z7processi
int process(double x);              // _Z7processd
int process(int x, double y);       // _Z7processid

namespace Math {
    double sqrt(double x);          // _ZN4Math4sqrtEd
}

class Vector {
    double dot(const Vector& v);    // _ZN6Vector3dotERKS_
};
```
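The mangling scheme is fixed by the platform's C++ ABI, not by the language: GCC and Clang both follow the Itanium C++ ABI on Linux and macOS, while MSVC uses its own scheme on Windows. Objects built for different ABIs cannot be linked together, and even within one ABI, mixing compilers or standard-library versions requires caution.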
```cpp
// Disable mangling for C interoperability
extern "C" {
    int legacy_init(void);      // symbol stays: legacy_init
    void legacy_free(void*);    // symbol stays: legacy_free
}
```
```bash
# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd   # → Math::sqrt(double)
```
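Demangling can also be done from inside a program, which is handy for logging or stack traces. The sketch below assumes a GCC or Clang toolchain, where `<cxxabi.h>` provides `abi::__cxa_demangle`; it is not available with MSVC. The helper name `demangle` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// __cxa_demangle returns a malloc()-allocated buffer, so it must be released with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Returns the demangled form of an Itanium-ABI symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    std::unique_ptr<char, FreeDeleter> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    return (status == 0 && text) ? std::string(text.get()) : std::string(mangled);
}
```

For example, `demangle("_ZN4Math4sqrtEd")` yields `Math::sqrt(double)`, matching the `c++filt` output.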
---
## 5. Static Linking
The linker copies the needed object files from `.a` archives directly into the executable.
```bash
# Build a static library
ar rcs libmymath.a vec.o mat.o quat.o
# Link statically — no runtime dependencies
g++ main.o -L. -lmymath -static -o app_static
# Verify: no shared lib deps
ldd app_static   # → "not a dynamic executable" (some versions print "statically linked")
```
**Pros:** single self-contained binary, no runtime dependency issues, faster startup.
**Cons:** larger binary, security patches require a full rebuild, code is duplicated across binaries.
> **Note:** Link order matters — list object files before libraries: `g++ main.o -lmymath`, not `g++ -lmymath main.o`.
---
## 6. Dynamic Linking
Shared libraries (`.so` / `.dll` / `.dylib`) are loaded at runtime by the dynamic linker (`ld.so`).
```bash
# Build a shared library (the objects themselves must be compiled with -fPIC)
g++ -fPIC -c vec.cpp mat.cpp
g++ -shared vec.o mat.o -o libmymath.so
# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app
# Inspect runtime dependencies
ldd app
# libmymath.so => ./libmymath.so
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app
```
**PLT/GOT mechanism:** external calls go through the *Procedure Linkage Table* (PLT); the *Global Offset Table* (GOT) holds resolved addresses, filled in lazily on first call. To resolve all symbols at startup instead, link with `-Wl,-z,now` or set `LD_BIND_NOW=1` in the environment.
**Pros:** shared memory between processes, hot-patching by replacing `.so`, smaller binaries.
**Cons:** dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.
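A related but distinct mechanism is explicit runtime loading, where the program asks the dynamic linker for a library and a symbol by name instead of relying on load-time resolution. The sketch below assumes Linux with glibc (the soname `libm.so.6` is platform-specific) and uses the POSIX `dlopen`/`dlsym`/`dlclose` calls; the function name `call_cos_dynamically` is hypothetical. On older glibc versions the program may need to be linked with `-ldl`.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <optional>

// Loads libm at runtime and calls cos() through a pointer obtained from dlsym().
// Returns std::nullopt if the library or the symbol cannot be found.
std::optional<double> call_cos_dynamically(double x) {
    // RTLD_NOW: resolve all of the library's symbols immediately (no lazy binding).
    void* handle = dlopen("libm.so.6", RTLD_NOW);
    if (!handle) return std::nullopt;

    using cos_fn = double (*)(double);
    auto fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));

    std::optional<double> result;
    if (fn) result = fn(x);

    const int rc = dlclose(handle);  // drop our reference to the library
    assert(rc == 0);
    (void)rc;
    return result;
}
```

Because the symbol is looked up by its unmangled name, this approach works most easily with functions that have C linkage.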
---
## 7. Static vs. Dynamic: At a Glance
| Aspect | Static (`.a`) | Dynamic (`.so` / `.dll`) |
|---|---|---|
| Resolution | Link time | Load / runtime |
| Binary size | Larger (code embedded) | Smaller (references only) |
| Memory sharing | No — each process has its own copy | Yes — single copy in RAM |
| Deployment | One self-contained file | Must ship `.so` alongside |
| Hot patching | Full relink required | Replace `.so` and restart |
| Startup overhead | Minimal | Dynamic loader adds ~ms |
| Security updates | Manual rebuild | OS-level update propagates |
---
## 8. Linking with C Libraries
Use `extern "C"` to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).
```cpp
// wrapper.h — expose C++ code to C callers
#pragma once
#ifdef __cplusplus
extern "C" { // disables mangling for these symbols
#endif
void vec_create(void** out);
void vec_destroy(void* vec);
void vec_push(void* vec, double val);
double vec_get(void* vec, int idx);
#ifdef __cplusplus
}
#endif
```
```cpp
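// Calling a C library from C++: the declaration needs C linkage.
// In practice, include <sqlite3.h>, which already wraps its declarations in extern "C".
struct sqlite3;  // opaque database handle
extern "C" int sqlite3_open(const char* filename, sqlite3** ppDb);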
```
```bash
g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -l<name> → links libname.so or libname.a
# -L<path> → add directory to library search path
# -Wl,--as-needed → skip libs that aren't actually used
```
**Common pitfalls:**
- Forgetting `extern "C"` → mangled name doesn't match the C header
- C struct padding may differ across compilers, breaking ABI
- C code cannot propagate C++ exceptions: catch them inside the wrapper and mark boundary functions `noexcept`
- Link order still matters: objects first, then libraries
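One possible implementation of the `wrapper.h` functions is sketched below. It assumes the opaque `void*` handle points to a `std::vector<double>`, which the original header does not specify, and it picks arbitrary error behavior (a null handle on allocation failure, `0.0` for an out-of-range index). Exceptions are caught before they can reach a C caller.

```cpp
// wrapper.cpp: hypothetical implementation of the functions declared in wrapper.h
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

extern "C" {

// Allocates the vector; *out is set to nullptr if allocation fails.
void vec_create(void** out) noexcept {
    assert(out != nullptr);
    *out = new (std::nothrow) std::vector<double>();
}

void vec_destroy(void* vec) noexcept {
    delete static_cast<std::vector<double>*>(vec);
}

// push_back may throw std::bad_alloc, so the exception is stopped at the boundary.
void vec_push(void* vec, double val) noexcept {
    try {
        static_cast<std::vector<double>*>(vec)->push_back(val);
    } catch (...) {
        // Assumption: failures are silently ignored; a real API would report them.
    }
}

// Returns 0.0 for an out-of-range index instead of invoking undefined behavior.
double vec_get(void* vec, int idx) noexcept {
    const auto* v = static_cast<const std::vector<double>*>(vec);
    if (idx < 0 || static_cast<std::size_t>(idx) >= v->size()) return 0.0;
    return (*v)[static_cast<std::size_t>(idx)];
}

}  // extern "C"
```

A C caller would use it as `void* v; vec_create(&v); vec_push(v, 1.5); vec_get(v, 0); vec_destroy(v);`.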
---
## 9. Common Linker Errors
### `undefined reference to 'add(int, int)'`
**Cause:** definition is missing or the library wasn't linked.
**Fix:** add `-lmylib` or include the `.cpp` that defines it.
### `multiple definition of 'globalVar'`
**Cause:** variable defined (not just declared) in a header included by multiple TUs.
**Fix:** use `inline` (C++17), or `extern` declaration in the header + one definition in a `.cpp`.
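Both fixes are sketched below in a single block for brevity. The file names `config.h` and `config.cpp`, the variable `requestCount`, and the function `record_request` are hypothetical; in a real project the two parts would live in separate files. The `inline` variable requires C++17 or later.

```cpp
#include <cassert>

// --- config.h (hypothetical header included by several .cpp files) ---

// Fix 1 (C++17): an inline variable may be defined in every TU that includes
// the header; the linker keeps a single copy.
inline int globalVar = 42;

// Fix 2 (any standard): the header only declares the variable...
extern int requestCount;

// --- config.cpp (hypothetical) ---

// ...and exactly one .cpp file provides the definition.
int requestCount = 0;

// Both variables are usable from any TU that includes the header.
int record_request() {
    assert(requestCount >= 0);
    return ++requestCount + globalVar;
}
```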
### `cannot find -lmylib`
**Cause:** linker can't locate `libmylib.so` or `libmylib.a`.
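**Fix:** add `-L/path/to/lib`, or set `LIBRARY_PATH` (searched by GCC at link time). `LD_LIBRARY_PATH` only affects the runtime loader, and `PKG_CONFIG_PATH` only helps when the flags come from `pkg-config`.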
---
## 10. Debugging Linker Issues
```bash
# Inspect symbols
nm -C -g lib.a # demangled, global symbols only
nm -u main.o # undefined (unresolved) symbols
objdump -d my.o # disassembly
readelf -s my.o # ELF symbol table
# Trace linker decisions
g++ main.o -lmylib -Wl,--verbose 2>&1 | grep "attempt"
ld --trace my.o # shows each file the linker considers
# Check shared lib deps
ldd ./app
chrpath -l ./app # show embedded RPATH
# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd # → Math::sqrt(double)
```
**Useful flags:**
- `-Wl,--no-undefined` — catch unresolved symbols at build time
- `-Wl,--as-needed` — skip unused shared libraries
- `-Wl,--start-group ... --end-group` — resolve circular dependencies between archives
> *Rule of thumb: "When in doubt, `nm` it out."*
---
## 11. Bonus: Link-Time Optimization (LTO)
Without LTO, the compiler can only optimize within a single translation unit. With LTO, it embeds IR (Intermediate Representation) in `.o` files and performs whole-program optimization at link time.
```bash
# Enable LTO
g++ -flto -O2 -c foo.cpp -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto
# Thin LTO — faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app
```
LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization, typically yielding a 10–25% speedup on real codebases.
**Gotcha:** only TUs compiled with `-flto` take part in the optimization. Third-party archives compiled without it will still link, but their code won't be optimized across TU boundaries.
---
## Summary & Quick Reference
**Key concepts**
- Linker: resolves symbols, relocates addresses, produces the binary
- Symbol resolution order: strong > weak > UNDEF
- Name mangling encodes C++ type info into flat symbol names
- `extern "C"` disables mangling for C interoperability
**Essential commands**
| Command | Purpose |
|---|---|
| `nm -C -g lib.a` | List exported symbols (demangled) |
| `c++filt <sym>` | Demangle a symbol |
| `ldd app` | Show shared library dependencies |
| `objdump -d obj` | Disassemble object file |
| `ar rcs lib.a *.o` | Create a static library |
| `readelf -s obj` | ELF symbol table |
**Build flags**
| Flag | Effect |
|---|---|
| `-static` | Link everything statically |
| `-fPIC -shared` | Build a position-independent shared library |
| `-Wl,--no-undefined` | Fail at link time on unresolved symbols |
| `-Wl,-rpath,...` | Embed library search path in binary |
| `-flto` | Enable link-time optimization |
| `-Wl,--as-needed` | Only link libraries that are actually used |