Visualizing PostgreSQL Vacuum Progress

Last updated 78 days ago by Dap Dap

sql

As heavy users of PostgreSQL since 2012, we’ve learned quite a bit about operating PostgreSQL at scale. Our Manta object storage system uses a large fleet of sharded, highly-available, replicated PostgreSQL clusters at the heart of the metadata tier. When an end user requests their object, say http://us-east.manta.joyent.com/dap/public/kartlytics/videos/2012-09-06_0000-00.mov, Manta winds up looking in this PostgreSQL cluster for the metadata for that object in order to find the storage servers hosting copies of the object (along with the size, checksum, and other useful information).

From 2012 right up through the present, one of our biggest challenges has been managing PostgreSQL’s vacuum process. I won’t rehash the details here, as there’s plenty of useful information about it. Indeed, managing vacuum is such well-known challenge in operating PostgreSQL that major efforts are underway to fundamentally change the on-disk storage system to eliminate the underlying problem. Since these efforts appear years away (especially for deployments where major PostgreSQL version upgrades will take time to test and orchestrate), it remains important to understand this problem today.

This post assumes that you’re familiar enough with PostgreSQL to understand:

  • The basics of a PostgreSQL database: tables, rows, indexes, and basic SQL statements.
  • The terms database “page” or “block” (which I believe are used interchangeably to refer to 8KiB chunks of files making up the table’s on-disk representation), “heap” (the sequence of pages, potentially stored across several files, that store the actual table data), “tuple” (internal representation of a row), and “index” (an on-disk data structured used for efficient access to rows in the table). For more on these structures, see “Introduction to PostgreSQL physical storage”.
  • Internally, row updates in PostgreSQL are processed similarly to a delete of the original row plus an insert of a new row. Since we’re not concerned about inserts in this post, we can generally just think about deletes.
  • When rows are deleted in PostgreSQL tables, they are not cleaned up right away because doing so while providing desirable ACID semantics (when other transactions might still be able to see the updated or deleted rows) would make the system much more complex (and possibly slow). Deleted or updated rows (tuples) are called “dead tuples”. For more on this, see “Routine Vacuuming” from PostgreSQL documentation.
  • The vacuum process is a long-running database operation that scans the heap and removes dead tuples (i.e., those invalidated by previous “update” or “delete” operations) from both the heap and indexes. Vacuum is also used to address other issues like XID wraparound, which has certainly been a major challenge for us and other providers, but it’s largely unrelated to this post.

In our production deployments, vacuum operations often take at least 5-10 days. For various reasons, it’s not uncommon for them to take 15-20 days, and we’ve seen some take upwards of 40 days. Vacuum has historically been a somewhat opaque process: it runs for a long time until it finishes, at which point it logs some basic information about what happened. Fortunately, recent versions of PostgreSQL include a view describing the progress of ongoing vacuum operations. This is fantastic! The docs are a bit light on interpreting this view, and while you can find posts explaining a bit about how to use the new view, I was left with a lot of questions. Which of the several sets of counters should we be looking at to determine overall progress? Are there even counters for all the parts we care about? Are all the counters expected to reach 100%? How do we know when certain phases will repeat and what can we do about this? This post documents how we’ve visualized the data in the view and what we’ve learned about the underlying process.

Read full Article