# C++ Linkers: From Object Files to Executables
*Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques*
---
## 1. What Is a Linker?
The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:
- **Resolving symbol references** — matching function/variable uses to their definitions across translation units
- **Relocating** — assigning final memory addresses to all symbols
- **Stripping or keeping** debug info, depending on build flags
```
Source Files (.cpp)
↓ [compiler]
Object Files (.o)
↓ [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)
```
---
## 2. The Compilation Pipeline
Each `.cpp` file is compiled independently into a relocatable object file.
```cpp
// foo.cpp: defines the symbol
int add(int a, int b) {
    return a + b;
}

// main.cpp: references the symbol
extern int add(int, int); // declaration only
int main() {
    int r = add(3, 4); // unresolved reference until link time
    return r;
}
```
```bash
g++ -c foo.cpp -o foo.o # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app # link step
```
The four stages: **Preprocessing** → **Compilation** → **Assembly** → **Linking**
---
## 3. Symbol Resolution
The linker maintains a symbol table and matches every `UNDEF` reference to a `GLOBAL` definition.
```cpp
// math.cpp: defines the symbol
double square(double x) { return x * x; }

// main.cpp: references the symbol
double square(double); // extern declaration
int main() {
    return (int)square(5.0); // unresolved until link
}
```
| Symbol | Type | Binding |
|--------|------|---------|
| `_Z6squared` (that is, `square(double)`) | FUNC | GLOBAL |
| `__gxx_personality_v0` | NOTYPE (section `UND`, i.e. UNDEF) | GLOBAL |
**Strong symbols** (definitions) must be unique. **Weak symbols** can be overridden. `UNDEF` means referenced but not yet defined.
```bash
nm -C -g math.o # list exported symbols (demangled)
nm -u main.o # show unresolved (UNDEF) symbols
objdump -t main.o # full symbol table dump
```
---
## 4. Name Mangling
C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.
```cpp
// C++ overloaded functions → different mangled names
int process(int x); // _Z7processi
int process(double x); // _Z7processd
int process(int x, double y); // _Z7processid
namespace Math {
    double sqrt(double x); // _ZN4Math4sqrtEd
}
class Vector {
    double dot(const Vector& v); // _ZN6Vector3dotERKS_
};
```
Each compiler uses its own ABI scheme (Itanium ABI on Linux/macOS, MSVC on Windows), so mixing compiler-built objects requires caution.
```cpp
// Disable mangling for C interoperability
extern "C" {
    int legacy_init(void); // symbol stays: legacy_init
    void legacy_free(void*); // symbol stays: legacy_free
}
```
```bash
# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd # → Math::sqrt(double)
```
---
## 5. Static Linking
The linker copies the needed object files from `.a` archives directly into the executable.
```bash
# Build a static library
ar rcs libmymath.a vec.o mat.o quat.o
# Link statically — no runtime dependencies
g++ main.o -L. -lmymath -static -o app_static
# Verify: no shared lib deps
ldd app_static # → "not a dynamic executable"
```
**Pros:** single self-contained binary, no runtime dependency issues, faster startup.
**Cons:** larger binary, security patches require a full rebuild, code is duplicated across binaries.
> **Note:** Link order matters — list object files before libraries: `g++ main.o -lmymath`, not `g++ -lmymath main.o`.
---
## 6. Dynamic Linking
Shared libraries (`.so` / `.dll` / `.dylib`) are loaded at runtime by the dynamic linker (`ld.so`).
```bash
# Build a shared library (vec.o and mat.o must themselves be compiled with -fPIC)
g++ -fPIC -shared vec.o mat.o -o libmymath.so
# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app
# Inspect runtime dependencies
ldd app
# libmymath.so => ./libmymath.so
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app
```
**PLT/GOT mechanism:** external calls go through the *Procedure Linkage Table* (PLT); the *Global Offset Table* (GOT) holds the resolved addresses, which are filled in lazily on the first call. Link with `-Wl,-z,now`, or set `LD_BIND_NOW=1` in the environment, to resolve all symbols at startup instead.
**Pros:** code pages shared between processes, hot-patching by replacing `.so`, smaller binaries.
**Cons:** dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.
---
## 7. Static vs. Dynamic: At a Glance
| Aspect | Static (`.a`) | Dynamic (`.so` / `.dll`) |
|---|---|---|
| Resolution | Link time | Load / runtime |
| Binary size | Larger (code embedded) | Smaller (references only) |
| Memory sharing | No — each process has its own copy | Yes — single copy in RAM |
| Deployment | One self-contained file | Must ship `.so` alongside |
| Hot patching | Full relink required | Replace `.so` and restart |
| Startup overhead | Minimal | Dynamic loader adds ~ms |
| Security updates | Manual rebuild | OS-level update propagates |
---
## 8. Linking with C Libraries
Use `extern "C"` to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).
```cpp
// wrapper.h — expose C++ code to C callers
#pragma once
#ifdef __cplusplus
extern "C" { // disables mangling for these symbols
#endif
void vec_create(void** out);
void vec_destroy(void* vec);
void vec_push(void* vec, double val);
double vec_get(void* vec, int idx);
#ifdef __cplusplus
}
#endif
```
```cpp
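// Calling a C library from C++ (in practice, include <sqlite3.h>, which already wraps its declarations in extern "C")
struct sqlite3; // opaque handle type defined by SQLite
extern "C" int sqlite3_open(const char* filename, sqlite3** ppDb);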
```
```bash
g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -l<name> → links libname.so or libname.a
# -L<path> → add directory to library search path
# -Wl,--as-needed → skip libs that aren't actually used
```
**Common pitfalls:**
- Forgetting `extern "C"` → mangled name doesn't match the C header
- C struct padding may differ across compilers, breaking ABI
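- C++ exceptions must not propagate through C frames: catch them inside the `extern "C"` function, or mark it `noexcept` so an escaping exception terminates the program instead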
- Link order still matters: objects first, then libraries
---
## 9. Common Linker Errors
### `undefined reference to 'add(int, int)'`
**Cause:** definition is missing or the library wasn't linked.
**Fix:** add `-lmylib` or include the `.cpp` that defines it.
### `multiple definition of 'globalVar'`
**Cause:** variable defined (not just declared) in a header included by multiple TUs.
**Fix:** use `inline` (C++17), or `extern` declaration in the header + one definition in a `.cpp`.
### `cannot find -lmylib`
**Cause:** linker can't locate `libmylib.so` or `libmylib.a`.
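**Fix:** add `-L/path/to/lib`, or set `LIBRARY_PATH` (GCC's link-time search path). If the library ships a `.pc` file, `pkg-config --libs` can supply the flags. Note that `LD_LIBRARY_PATH` only affects where the runtime loader looks, not where `-l` searches.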
---
## 10. Debugging Linker Issues
```bash
# Inspect symbols
nm -C -g libmath.a # demangled, global symbols only
nm -u main.o # undefined (unresolved) symbols
objdump -d math_lib.o # disassembly
readelf -s math_lib.o # ELF symbol table
# Trace linker decisions
g++ main.o -L. -lmath -Wl,--verbose 2>&1 | grep "attempt"
ld --trace math_lib.o # shows each file the linker considers
# Check shared lib deps
ldd ./app
chrpath -l ./app # show embedded RPATH
# fallback if chrpath is not installed:
readelf -d ./app | grep -E 'RPATH|RUNPATH'
# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd # → Math::sqrt(double)
# Pipe nm output through c++filt to demangle all symbols at once
nm -g libmath.a | grep " T " | awk '{print $NF}' | c++filt
```
**Useful flags:**
- `-Wl,--no-undefined` — catch unresolved symbols at build time
- `-Wl,--as-needed` — skip unused shared libraries
- `-Wl,--start-group ... --end-group` — resolve circular dependencies between archives
> *Rule of thumb: "When in doubt, `nm` it out."*
---
## 11. Bonus: Link-Time Optimization (LTO)
Without LTO, the compiler can only optimize within a single translation unit. With LTO, it embeds IR (Intermediate Representation) in `.o` files and performs whole-program optimization at link time.
```bash
# Enable LTO
g++ -flto -O2 -c foo.cpp -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto
# Thin LTO — faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app
```
LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization, typically a 10-25% speedup on real codebases.
**Gotcha:** all TUs must be compiled with `-flto`. Third-party archives compiled without it will still link, but that code won't be optimized across boundaries.
---
## Summary & Quick Reference
**Key concepts**
- Linker: resolves symbols, relocates addresses, produces the binary
- Symbol resolution: a strong definition overrides a weak one, and every UNDEF reference must be matched by one of them
- Name mangling encodes C++ type info into flat symbol names
- `extern "C"` disables mangling for C interoperability
**Essential commands**
| Command | Purpose |
|---|---|
| `nm -C -g lib.a` | List exported symbols (demangled) |
| `c++filt <sym>` | Demangle a symbol |
| `ldd app` | Show shared library dependencies |
| `objdump -d obj` | Disassemble object file |
| `ar rcs lib.a *.o` | Create a static library |
| `readelf -s obj` | ELF symbol table |
**Build flags**
| Flag | Effect |
|---|---|
| `-static` | Link everything statically |
| `-fPIC -shared` | Build a position-independent shared library |
| `-Wl,--no-undefined` | Fail at link time on unresolved symbols |
| `-Wl,-rpath,...` | Embed library search path in binary |
| `-flto` | Enable link-time optimization |
| `-Wl,--as-needed` | Only link libraries that are actually used |