How Swift and Clang Use LLVM to Read Files into Memory

Last updated 1 year ago by Brian Gesiak

swift

The prior article in this series explained how the Swift and Clang compilers used llvm::SourceMgr to emit diagnostics for source locations in memory buffers, represented by the class llvm::MemoryBuffer. This article focuses on llvm::MemoryBuffer, the primary abstraction for reading files and streams into memory. Since it's used by Swift, Clang, and LLVM tools like llvm-tblgen, I found it valuable to understand how it works.

Reading a file into memory using C++

The documentation for libLLVMSupport's llvm::MemoryBuffer class says it "provides simple read-only access to a block of memory, and provides simple methods for reading files and standard input into a memory buffer." To better understand how it does that, I tried writing a simple C++ program, called read.cpp, that reads a file – itself, in this case – into memory. For simplicity's sake my program is only meant to operate on Unix systems.

My read.cpp program reads a file into memory by using various system calls. These are requests made to the operating system for things like "open a file and give me its file descriptor," or "read 8 bytes from the file with this file descriptor." Julia Evans has a wonderful comic that explains them further:

My read.cpp program uses four system calls:

  1. open(2) to get a file descriptor for the file.
  2. fstat, which returns information about a file descriptor. Specifically, read.cpp allocates memory based on the file's size.
  3. read(2), which reads a given number of bytes from a file into a pre-allocated block of memory.
  4. close(2) to close a file descriptor once I'm done using it.

Once the read.cpp program allocates memory and reads its own source file into that memory, it increments the char * pointer into the memory and prints out the first line of the file: .

Read full Article