How Swift and Clang Use LLVM to Read Files into Memory

Last updated 358 days ago by Brian Gesiak


The prior article in this series explained how the Swift and Clang compilers used llvm::SourceMgr to emit diagnostics for source locations in memory buffers, represented by the class llvm::MemoryBuffer. This article focuses on llvm::MemoryBuffer, the primary abstraction for reading files and streams into memory. Since it's used by Swift, Clang, and LLVM tools like llvm-tblgen, I found it valuable to understand how it works.

Reading a file into memory using C++

The documentation for libLLVMSupport's llvm::MemoryBuffer class says it "provides simple read-only access to a block of memory, and provides simple methods for reading files and standard input into a memory buffer." To better understand how it does that, I tried writing a simple C++ program, called read.cpp, that reads a file – itself, in this case – into memory. For simplicity's sake my program is only meant to operate on Unix systems.

My read.cpp program reads a file into memory by using various system calls. These are requests made to the operating system for things like "open a file and give me its file descriptor," or "read 8 bytes from the file with this file descriptor." Julia Evans has a wonderful comic that explains them further.

My read.cpp program uses four system calls:

  1. open(2) to get a file descriptor for the file.
  2. fstat(2), which returns information about the file behind a file descriptor. Specifically, read.cpp uses the file's size to decide how much memory to allocate.
  3. read(2), which reads a given number of bytes from a file into a pre-allocated block of memory.
  4. close(2) to close a file descriptor once I'm done using it.

Once the read.cpp program allocates memory and reads its own source file into that memory, it increments a char * pointer through the memory and prints out the first line of the file.
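
The full read.cpp listing isn't reproduced in this excerpt, so what follows is only a minimal sketch of the four steps described above, not the original program. The helper names readFileIntoMemory and firstLine are hypothetical, and using std::string as the "pre-allocated block of memory" is my own simplification. Like read.cpp, it assumes a Unix system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

#include <fcntl.h>     // open(2)
#include <sys/stat.h>  // fstat(2)
#include <unistd.h>    // read(2), close(2)

// Hypothetical helper (not from the original read.cpp): reads the entire
// file at `path` into `out`. Returns false if any system call fails.
bool readFileIntoMemory(const char *path, std::string &out) {
  assert(path != nullptr);

  // 1. open(2): ask the operating system for a file descriptor.
  int fd = open(path, O_RDONLY);
  if (fd == -1)
    return false;

  // 2. fstat(2): look up the file's size so we know how much to allocate.
  struct stat info;
  if (fstat(fd, &info) == -1) {
    close(fd);
    return false;
  }
  std::string buffer(static_cast<std::size_t>(info.st_size), '\0');

  // 3. read(2): may return fewer bytes than requested, so keep reading
  //    until the buffer is full or the file ends.
  std::size_t total = 0;
  while (total < buffer.size()) {
    ssize_t n = read(fd, &buffer[total], buffer.size() - total);
    if (n == -1) {
      close(fd);
      return false;
    }
    if (n == 0)
      break;  // End of file arrived earlier than fstat(2) suggested.
    total += static_cast<std::size_t>(n);
  }
  buffer.resize(total);

  // 4. close(2): release the file descriptor now that we're done with it.
  close(fd);
  out = std::move(buffer);
  return true;
}

// Hypothetical helper: advances a char pointer through the buffer until it
// reaches a newline (or the end), then returns everything before that point.
std::string firstLine(const std::string &contents) {
  const char *begin = contents.data();
  const char *end = begin + contents.size();
  const char *cursor = begin;
  while (cursor != end && *cursor != '\n')
    ++cursor;
  return std::string(begin, cursor);
}
```

A main function could call readFileIntoMemory("read.cpp", contents) and then print firstLine(contents). The sketch treats every read(2) failure as fatal, including an interrupted call, which a more careful program would retry.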
