Wednesday, March 29, 2017

Change autovacuum_freeze_max_age without a restart (sort of…)

This blog post is kind of involved, so I'm giving a short version at the top, with some background for beginners at the bottom. The middle section explains the motivation for using this hack in the first place.

Short version

I came up with a useful and/or terrible hack the other day: setting autovacuum_freeze_max_age as a storage parameter. I definitely don't recommend doing this routinely, but it unblocked us during a critical maintenance window.

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

Don't forget to set it back when you're done! Otherwise you will incur an even longer autovacuum freeze, probably when you least expect it.

Medium-length version

My colleague Kacey Holston was in the midst of upgrading a client from PostgreSQL 9.4 to 9.6, using Slony for minimal downtime. As planned, the client took a few minutes of downtime so Kacey could do the switchover. She was ready to reverse the direction of replication (so the 9.6 server was replicating to the 9.4 server, in case our client needed to fall back to it). But there was an autovacuum freeze (a.k.a. "autovacuum (to prevent wraparound)") that was keeping Slony from getting the brief ExclusiveLock it needed.

She knew from experience that this table takes three hours to freeze. But the client had only minutes of downtime scheduled – that was the whole point of using Slony!

If only it were possible to change autovacuum_freeze_max_age on the fly; then we could bump it up to stop that autovacuum. Unfortunately, you have to restart the database in order to change it. Except…

You can set it on a per-table basis, as follows. This took effect immediately:

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

If you do this, don't forget to set it back to the normal value (by default, 200000000) once you're done! Otherwise autovacuum freezes on this table will come around less often and take even longer.
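If the table had no per-table setting before you started, one way to "set it back" is to remove the override entirely, so the table follows the server-wide value again. Here is a minimal sketch, using the same placeholder table name my_table as above; the second statement is just a sanity check that the override is gone:

    -- Remove the per-table override; the table falls back to the server-wide setting
    ALTER TABLE my_table RESET (autovacuum_freeze_max_age);

    -- Confirm: reloptions should no longer list autovacuum_freeze_max_age
    SELECT relname, reloptions
    FROM pg_class
    WHERE oid = 'my_table'::regclass;

If the table already had its own autovacuum_freeze_max_age before the maintenance window, use SET with that earlier value rather than RESET.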

Background for beginners

When the oldest transaction ID on any row in a table is more than autovacuum_freeze_max_age transactions old (200 million, by default), an "autovacuum (to prevent wraparound)" process runs on the table to reclaim old transaction IDs. For large tables, this can be a problem, because it can generate a lot of CPU and I/O activity during busy hours. Also, as we saw here, it locks the table (in SHARE UPDATE EXCLUSIVE mode); this blocks DDL changes (a.k.a. migrations).
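To see this coming before a maintenance window, you can compare each table's transaction ID age against the threshold, and check whether a wraparound autovacuum is already running. The queries below are a rough sketch using only the standard catalogs (pg_class, pg_stat_activity); the column aliases and the LIMIT are my own choices, and for brevity the first query ignores TOAST tables, which have their own relfrozenxid:

    -- Tables closest to triggering an "autovacuum (to prevent wraparound)"
    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
    FROM pg_class c
    WHERE c.relkind IN ('r', 'm')   -- ordinary tables and materialized views
    ORDER BY xid_age DESC
    LIMIT 10;

    -- Wraparound autovacuums running right now
    SELECT pid, xact_start, query
    FROM pg_stat_activity
    WHERE query LIKE 'autovacuum:%(to prevent wraparound)%';

A table whose xid_age is approaching freeze_max_age is a candidate for a manual VACUUM FREEZE at a quiet time, before autovacuum picks the moment for you.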

For more-technical background, see the official PostgreSQL docs on how transaction IDs work, and for a friendlier intro, see this series of blog posts.