Getting Started with LLVM Core Libraries

Understanding LLVM today

Nowadays, the LLVM project has grown and holds a huge collection of compiler-related tools. In fact, the name LLVM might refer to any of the following:

  • The LLVM project/infrastructure: This is an umbrella for several projects that, together, form a complete compiler: frontends, backends, optimizers, assemblers, linkers, libc++, compiler-rt, and a JIT engine. The word "LLVM" has this meaning, for example, in the following sentence: "LLVM is comprised of several projects".
  • An LLVM-based compiler: This is a compiler built partially or completely with the LLVM infrastructure. For example, a compiler might use LLVM for the frontend and backend but use GCC and GNU system libraries to perform the final link. LLVM has this meaning in the following sentence, for example: "I used LLVM to compile C programs to a MIPS platform".
  • LLVM libraries: This is the reusable code portion of the LLVM infrastructure. For example, LLVM has this meaning in the sentence: "My project uses LLVM to generate code through its Just-in-Time compilation framework".
  • LLVM core: The optimizations that happen at the intermediate language level and the backend algorithms form the LLVM core where the project started. LLVM has this meaning in the following sentence: "LLVM and Clang are two different projects".
  • The LLVM IR: This is the LLVM compiler intermediate representation. LLVM has this meaning when used in sentences such as "I built a frontend that translates my own language to LLVM".

To understand the LLVM project, you need to be aware of the most important parts of the infrastructure:

  • Frontend: This is the compiler step that translates computer-programming languages, such as C, C++, and Objective-C, into the LLVM compiler IR. This includes a lexical analyzer, a syntax parser, a semantic analyzer, and the LLVM IR code generator. The Clang project implements all frontend-related steps while providing a plugin interface and a separate static analyzer tool to allow deep analyses. For details, you can go through Chapter 4, The Frontend, Chapter 9, The Clang Static Analyzer, and Chapter 10, Clang Tools with LibTooling.
  • IR: The LLVM IR has both human-readable and binary-encoded representations. Tools and libraries provide interfaces for constructing, assembling, and disassembling the IR. The LLVM optimizer also operates on the IR, which is where most optimizations are applied. We explain the IR in detail in Chapter 5, The LLVM Intermediate Representation.
  • Backend: This is the step that is responsible for code generation. It converts LLVM IR to target-specific assembly code or object code binaries. Register allocation, loop transformations, peephole optimizers, and target-specific optimizations/transformations belong to the backend. We analyze this in depth in Chapter 6, The Backend.

The following diagram illustrates the components and gives us an overview of the entire infrastructure when used in a specific configuration. Notice that we can reorganize the components and utilize them in a different arrangement, for example, not using the LLVM IR linker if we do not want to explore link-time optimizations.

The interaction between each of these compiler parts can happen in the following two ways:

  • In memory: A single supervisor tool, such as Clang, uses each LLVM component as a library and relies on data structures allocated in memory to feed the output of one stage to the input of the next.
  • Through files: A user launches smaller standalone tools, each of which writes the result of a particular component to a file on disk. It is then up to the user to launch the next tool with this file as the input.

Hence, higher-level tools, such as Clang, can incorporate the work of several smaller tools by linking together the libraries that implement their functionality. This is possible because LLVM's design emphasizes code reuse: as much code as possible lives in libraries. Moreover, standalone tools that wrap a smaller number of libraries are useful because they allow a user to interact directly with a specific LLVM component via the command line.
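To make the two interaction styles concrete, here is a minimal toy model in Python. It is only a sketch: the functions optimize and generate are hypothetical stand-ins for libraries, the "IR" is just a list of strings, and nothing here is an LLVM API. The point is that the standalone tools and the driver reuse the same library code, so the file-based route and the in-memory route should agree.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins for library entry points; these are NOT LLVM APIs.
def optimize(ir: list[str]) -> list[str]:
    """Toy 'optimizer library': drop no-op lines from a toy IR."""
    return [line for line in ir if line != "nop"]

def generate(ir: list[str]) -> list[str]:
    """Toy 'code generator library': turn each IR line into toy assembly."""
    return [f"\t{line.upper()}" for line in ir]

# Standalone tools: thin wrappers that read a file, call one library,
# and write a file for the next tool to pick up.
def opt_tool(src: Path, dst: Path) -> None:
    dst.write_text("\n".join(optimize(src.read_text().splitlines())))

def llc_tool(src: Path, dst: Path) -> None:
    dst.write_text("\n".join(generate(src.read_text().splitlines())))

# Driver: uses both libraries and hands data from stage to stage in memory.
def driver(ir: list[str]) -> list[str]:
    return generate(optimize(ir))

program = ["load a", "nop", "add a, 1", "ret"]

# Route 1: through files, one tool at a time.
with tempfile.TemporaryDirectory() as tmp:
    ir_file, opt_file, asm_file = (
        Path(tmp) / name for name in ("in.ir", "opt.ir", "out.s")
    )
    ir_file.write_text("\n".join(program))
    opt_tool(ir_file, opt_file)
    llc_tool(opt_file, asm_file)
    via_files = asm_file.read_text().splitlines()

# Route 2: in memory, through the driver.
via_memory = driver(program)

print(via_memory == via_files)
```

Because each tool is only a wrapper around a library function, adding a new driver or a new standalone tool in this model requires no change to the library code itself.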

For example, consider the following diagram. Tool names appear in boldface boxes, and the libraries that implement their functionality appear in separate boxes in regular font. In this example, the LLVM backend tool, llc, uses the libLLVMCodeGen library to implement part of its functionality, while the opt command, which launches only the LLVM IR-level optimizer, uses another library, libLLVMipa, to implement target-independent interprocedural optimizations. Finally, we see clang, a larger tool that uses both libraries to take over the work of llc and opt and present a simpler interface to the user. Therefore, any task performed by such higher-level tools can be decomposed into a chain of lower-level tools while yielding the same results. The next sections illustrate this concept. In practice, Clang is able to carry out the entire compilation and not just the work of opt and llc. That explains why, in a static build, the Clang binary is often the largest, since it links with and exercises the entire LLVM ecosystem.
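As a rough illustration of this decomposition, the Python sketch below builds (but does not run) the command lines for both routes: a single clang invocation, and a chain of clang, opt, and llc that communicates through files. The file name hello.c and the helper names driver_command and tool_chain are hypothetical. The flags shown (-S, -emit-llvm, -O2, -o) are commonly documented ones, but treat them as assumptions that may need adjusting for your LLVM release.

```python
import shlex

def driver_command(source: str) -> list[str]:
    """High-level route: clang goes from C source to assembly in one step."""
    stem = source.rsplit(".", 1)[0]
    return ["clang", "-O2", "-S", source, "-o", f"{stem}.s"]

def tool_chain(source: str) -> list[list[str]]:
    """Low-level route: one standalone tool per stage, linked by files."""
    stem = source.rsplit(".", 1)[0]
    return [
        # Frontend only: C source -> human-readable LLVM IR.
        ["clang", "-S", "-emit-llvm", source, "-o", f"{stem}.ll"],
        # IR-level optimizer: LLVM IR -> optimized LLVM IR.
        ["opt", "-O2", "-S", f"{stem}.ll", "-o", f"{stem}.opt.ll"],
        # Backend: LLVM IR -> target assembly.
        ["llc", f"{stem}.opt.ll", "-o", f"{stem}.s"],
    ]

print(shlex.join(driver_command("hello.c")))
for command in tool_chain("hello.c"):
    print(shlex.join(command))
```

Each command in the chain consumes the file written by the previous one, which is the "through files" interaction described earlier. One caveat: recent Clang releases mark functions compiled without optimization as optnone, so the frontend step may need additional flags before opt has any visible effect on the IR.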