Integrating FreeBSD, ZFS, and Periodic; snapshots and scrubs

April 15th, 2009

Update: the scripts and process outlined here have been used as the foundation for zfs-periodic in the FreeBSD ports collection; you can check it out here.

ZFS on FreeBSD is powerful, especially when coupled with periodic to take hourly, daily, weekly, and monthly snapshots. In this post I’ll provide the scripts and config you need, and walk you step-by-step through setting up ZFS snapshots and scrubs with periodic on FreeBSD.

Periodic’s main advantage over the more traditional and obvious method of running a script from a cron job is integration with the notification emails and the standard configuration mechanism. That may not sound like much, but it means that you, a year down the road (or someone else who comes to the system), only have to look in the obvious place to figure out what’s going on or make changes.

This is going to be a long post, but there’s a decent amount of code & config to walk through. The files being discussed have been tar’d up and the latest version of them can be downloaded from here.

Configuration – the stuff you might actually want to muck with

We’ll start with the configuration (/etc/periodic.conf) as it’s the most relevant portion, or at least the most likely to be edited. Out of the box FreeBSD supports daily, weekly, and monthly periodic tasks; we’re going to add an hourly one so that we can do hourly snapshots. The first section of config sets who the output from the hourly scripts should go to, and whether it should be sent when everything succeeded, when something failed, or when something is mis-configured. Hourly emails seem a bit much, so we’ve disabled them when everything goes well. We do want messages about errors, and I’ve just left badconfig at the same value as all of the other time-frames.

# Hourly options
hourly_output="root"					# user or /file
hourly_show_success="NO"				# scripts returning 0
hourly_show_info="YES"					# scripts returning 1
hourly_show_badconfig="NO"				# scripts returning 2
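
As a rough illustration of how those three settings relate to a script's exit code, here is a small sketch. It paraphrases the documented behaviour (0 is success, 1 is info, 2 is bad config, anything higher is always reported) and is not the actual periodic source; the function name output_wanted is made up for this example.

```sh
# Settings from the hourly section above
hourly_show_success="NO"
hourly_show_info="YES"
hourly_show_badconfig="NO"

# Hypothetical helper: succeeds when output from a script that exited
# with status $1 should be included in the hourly report.
output_wanted()
{
    rc=$1
    case "$rc" in
        0) flag=$hourly_show_success ;;
        1) flag=$hourly_show_info ;;
        2) flag=$hourly_show_badconfig ;;
        *) flag="YES" ;;    # more serious failures are always shown
    esac

    case "$flag" in
        [Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the values above, output_wanted 1 succeeds while output_wanted 0 does not, which is the "only mail me when something is worth reading" behaviour described.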

The next section configures hourly snapshots. In this case we’re enabling them for the pool tank and keeping the 6 most recent around. There are defaults, which we’ll see later, for both pools and keep, so the only required value here is enable. To specify more than one pool, add them space-separated to the config string, e.g. “tank boat plane”.

# 000.zfs-snapshot
hourly_zfs_snapshot_enable="YES"
hourly_zfs_snapshot_pools="tank"
hourly_zfs_snapshot_keep=6

The daily section is almost identical, but we instead keep the last 7 days. We’re also enabling a daily zfs status script that is in the default setup, but disabled.

# Daily options

# 000.zfs-snapshot
daily_zfs_snapshot_enable="YES"
daily_zfs_snapshot_pools="tank"
daily_zfs_snapshot_keep=7

# 404.status-zfs
daily_status_zfs_enable="YES"

Weekly and monthly have the same configuration options; below we’re keeping the last 5 weeks and the last 2 months.

# Weekly options

# 000.zfs-snapshot
weekly_zfs_snapshot_enable="YES"
weekly_zfs_snapshot_pools="tank"
weekly_zfs_snapshot_keep=5

# Monthly options

# 000.zfs-snapshot
monthly_zfs_snapshot_enable="YES"
monthly_zfs_snapshot_pools="tank"
monthly_zfs_snapshot_keep=2

A final section configures the monthly scrubbing. Similarly to the snapshot config sections, there’s an enable line and a pools line. Here we’ve enabled the monthly scrub on the tank pool.

# 998.zfs-scrub
monthly_zfs_scrub_enable="YES"
monthly_zfs_scrub_pools="tank"
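
The scrub script itself isn't listed in this post, so the following is only a guess at the core of what a 998.zfs-scrub script could do with those two settings. The function name do_scrubs is hypothetical; zpool status and zpool scrub are the standard commands.

```sh
# Hypothetical sketch: start a scrub on each configured pool, unless a
# scrub is already running on it.
do_scrubs()
{
    pools=$1

    echo ""
    echo "Doing zfs scrubs:"
    for pool in $pools; do
        if zpool status "$pool" | grep "scrub in progress" > /dev/null; then
            echo "  skipping scrub of $pool, scrub in progress"
        else
            echo "  starting scrub of $pool"
            zpool scrub "$pool"
        fi
    done
}
```

A monthly script would call it as do_scrubs "$monthly_zfs_scrub_pools" after checking monthly_zfs_scrub_enable, the same way the snapshot scripts check their enable flag.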

periodic hourly – adding hourly script support to periodic

The next thing we need to do is add hourly support to periodic. While that might sound complicated it’s actually very straightforward. We start by creating a directory to house the hourly files.

# mkdir /etc/periodic/hourly

And then add the following line to /etc/crontab just before the line for ‘periodic daily’

1	*	*	*	*	root	periodic hourly

That’s it. You now have a place to put scripts that will be run every hour at :01.

hourly/daily/weekly/monthly scripts – wiring the snapshots into periodic

Now we’ll get to the scripts that make all of this configuration do something. It’s unlikely that you’ll ever have to muck with any of these, but in case you’re curious I’ll go ahead and walk through them. We’ll start with the hourly snapshot script (/etc/periodic/hourly/000.zfs-snapshot).

 1  #!/bin/sh
 2
 3  # If there is a global system configuration file, suck it in.
 4  #
 5  if [ -r /etc/defaults/periodic.conf ]
 6  then
 7      . /etc/defaults/periodic.conf
 8      source_periodic_confs
 9  fi
10
11  pools=$hourly_zfs_snapshot_pools
12  if [ -z "$pools" ]; then
13      pools='tank'
14  fi
15
16  keep=$hourly_zfs_snapshot_keep
17  if [ -z "$keep" ]; then
18      keep=6
19  fi
20
21  case "$hourly_zfs_snapshot_enable" in
22      [Yy][Ee][Ss])
23          . /etc/periodic/zfs-snapshot
24          do_snapshots "$pools" $keep 'hourly'
25          ;;
26      *)
27          ;;
28  esac

Lines 3-9 are standard periodic boilerplate. Lines 11-19 look for the values we configured earlier and fall back to defaults if they’re not specified. Lines 21, 22, 25, and 26 are a shell case statement borrowed from one of the other periodic scripts; it just makes sure that hourly_zfs_snapshot_enable is set to YES, ignoring case. Line 23 pulls in (think #include) some common zfs snapshotting code that we’ll get to next, and finally line 24 calls the snapshotting function with the configured pools, the keep count, and the type, hourly. The daily, weekly, and monthly scripts are identical, with hourly replaced by the appropriate value throughout.
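
Since the other three scripts differ only in that one word, they can be derived from the hourly one. This is a sketch rather than part of the original tarball: the function name generate_snapshot_scripts is made up, and it assumes the hourly script is already in place under the directory you pass in (normally /etc/periodic).

```sh
# Hypothetical helper: derive the daily, weekly, and monthly snapshot
# scripts from the hourly one.
# $1: the periodic directory, containing hourly/000.zfs-snapshot
generate_snapshot_scripts()
{
    base=$1
    src="$base/hourly/000.zfs-snapshot"

    for period in daily weekly monthly; do
        mkdir -p "$base/$period"
        # swap every occurrence of "hourly" for the new period
        sed "s/hourly/$period/g" "$src" > "$base/$period/000.zfs-snapshot"
        # periodic only runs files that are executable
        chmod 755 "$base/$period/000.zfs-snapshot"
    done
}
```

Note that the fallback keep=6 is copied along unchanged, so you may want to adjust that default by hand in each generated script.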

zfs-snapshot – the workhorse

There’s too much here to walk through in detail, so I’ll let you read through the code. I’ve tried to do a decent job of in-line commenting. If you have questions or want clarification, feel free to ask…

#!/bin/sh

# checks to see if there's a scrub in progress
scrub_in_progress()
{
  pool=$1

  if zpool status $pool | grep "scrub in progress" > /dev/null; then
    return 0
  else
    return 1
  fi
}

# take the appropriately named snapshot
create_snapshot()
{
    pool=$1
    type=$2

    case "$type" in
        hourly)
        now=`date +"$type-%Y-%m-%d-%H"`
        ;;
        daily)
        now=`date +"$type-%Y-%m-%d"`
        ;;
        weekly)
        now=`date +"$type-%Y-%U"`
        ;;
        monthly)
        now=`date +"$type-%Y-%m"`
        ;;
        *)
        echo "unknown snapshot type: $type"
        exit 1
    esac

    # create the now snapshot
    snapshot="$pool@$now"
    # look for a snapshot with this name (-t snapshot makes zfs list include
    # snapshots even where a plain zfs list hides them)
    if zfs list -H -t snapshot -o name | grep "^$snapshot\$" > /dev/null; then
        echo "	snapshot, $snapshot, already exists"
    else
        echo "	taking snapshot, $snapshot"
        zfs snapshot -r $snapshot
    fi
}

# delete the named snapshot
delete_snapshot()
{
    snapshot=$1
    echo "	destroying old snapshot, $snapshot"
    zfs destroy -r $snapshot
}

# take a type snapshot of pool, keeping keep old ones
do_pool()
{
    pool=$1
    keep=$2
    type=$3

    # create the regex matching the type of snapshots we're currently working
    # on
    case "$type" in
        hourly)
        # hourly-2009-01-01-00
        regex="$pool@$type-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]$"
        ;;
        daily)
        # daily-2009-01-01
        regex="$pool@$type-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$"
        ;;
        weekly)
        # weekly-2009-01
        regex="$pool@$type-[0-9][0-9][0-9][0-9]-[0-9][0-9]$"
        ;;
        monthly)
        # monthly-2009-01
        regex="$pool@$type-[0-9][0-9][0-9][0-9]-[0-9][0-9]$"
        ;;
        *)
        echo "unknown snapshot type: $type"
        exit 1
    esac

    create_snapshot $pool $type

    # get a list of all of the snapshots of this type sorted alpha, which
    # effectively is increasing date/time
    # (using sort as zfs's sort seems to have bugs)
    snapshots=`zfs list -H -t snapshot -o name | sort | grep "^$regex"`
    # count them
    count=`echo $snapshots | wc -w`
    # only prune when there are more snapshots than we want to keep
    if [ $count -gt $keep ]; then
        # how many items should we delete
        delete=`expr $count - $keep`
        count=0
        # walk through the snapshots, oldest first, deleting them until
        # only keep of them are left
        for snapshot in $snapshots; do
            if [ $count -ge $delete ]; then
                break
            fi
            delete_snapshot $snapshot
            count=`expr $count + 1`
        done
    fi
}

# take snapshots of type, for pools, keeping keep old ones,
do_snapshots()
{
    pools=$1
    keep=$2
    type=$3

    echo ""
    echo "Doing zfs $type snapshots:"
    for pool in $pools; do
        if scrub_in_progress $pool; then
          echo "	skipping snapshot of $pool, scrub in progress"
        else
          do_pool $pool $keep $type
        fi
    done
}
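
If you want to see the pruning logic on its own, without touching a real pool, here is a cut-down sketch of it. It reads snapshot names on stdin instead of calling zfs list, and uses a looser pattern than the script above (digits and dashes after the type); the function name old_snapshots is made up for this example.

```sh
# Hypothetical helper: print the snapshots of one type that fall outside
# the newest "keep". Reads snapshot names, one per line, on stdin.
old_snapshots()
{
    pool=$1
    type=$2
    keep=$3

    # keep only names like tank@daily-2009-01-01 and sort them, which
    # effectively orders them oldest first; grep exits non-zero when
    # nothing matches, which is not an error here
    matches=$(grep "^$pool@$type-[0-9][0-9-]*\$" | sort || true)
    if [ -z "$matches" ]; then
        return 0
    fi

    count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
    delete=$((count - keep))
    if [ "$delete" -gt 0 ]; then
        # the first "delete" lines are the oldest snapshots
        printf '%s\n' "$matches" | head -n "$delete"
    fi
}
```

Something like zfs list -H -t snapshot -o name | old_snapshots tank daily 7 would then list what a daily run is about to destroy, without destroying anything.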

ZFS snapshots — poor man’s backup — solution to the ‘rm *’ whoops problem

April 4th, 2009

Daily/Hourly Snapshots Script

First off, snapshots != backups, but they’re still really useful: a solution to the “whoops, I didn’t mean to do that” problem. Similar to the benefits of having a delayed slave with MySQL, if you accidentally do something to mess up your world/data you can go back in time a little bit and “undo” it. ZFS gives you great tools for doing file system snapshots and makes recovering from problems possible. I use the following simple script, which I coded up on my newly built FreeBSD 7.0 full ZFS system.

#!/bin/sh

ZFS=zfs
POOL=tank

# i run this in a cron tab with output redirected to a
# log file, this gives me a ran at time
date

# delete the last hourly snapshot, the old one that
# should go away, in this case 2 hours ago
LASTHOUR=`date -v-2H +"%Y-%m-%d-%H"`
echo "deleting hourly snapshot $LASTHOUR of $POOL"
$ZFS destroy -r $POOL@$LASTHOUR
# create the new snapshot, this hour
HOURLY=`date +"%Y-%m-%d-%H"`
echo "taking hourly snapshot $HOURLY of $POOL"
$ZFS snapshot -r $POOL@$HOURLY

# at 12:00 utc, ~ 4am local time, i run this server in utc
if [ `date +"%H"` = '12' ]; then
  # same as LASTHOUR, but 1 week ago
  LASTWEEK=`date -v-1w +"%Y-%m-%d"`
  echo "deleting daily snapshot $LASTWEEK of $POOL"
  $ZFS destroy -r $POOL@$LASTWEEK
  # todays
  DAILY=`date +"%Y-%m-%d"`
  echo "taking daily snapshot $DAILY of $POOL"
  $ZFS snapshot -r $POOL@$DAILY
fi

It takes hourly snapshots, keeping around the last 2, and daily snapshots, keeping around the last 7. That’s good enough to suit my purposes, but the script could easily be extended to take more frequent snapshots or to do longer term snapshotting: weekly, monthly, …

Disk Usage

One awesome thing is that if you have data that is only ever added to, never changed or removed, snapshots won’t take up any extra space. If you change data, the old blocks are kept for the snapshot alongside the new ones, so the changed portion is stored twice. Deleting a file won’t free up its space so long as a snapshot that references it lives. With my use-case I see about 3% overhead for snapshots on my frequently used pieces and nearly 0% everywhere else.

Viewing Current Usage

Useful commands for taking a peek at snapshots and space usage:

# zfs list -t filesystem
# zfs list

The first will show only filesystems, no snapshots etc. It’s similar in purpose to df. The second will show all filesystem and snapshot usage, telling you how much space each is taking up. (Newer ZFS versions hide snapshots from a plain zfs list by default; there, zfs list -t all shows them.) If you have evenly spaced snapshots over time it can give you an idea how much churn you’re seeing.

An example of the output from zfs list for one of my filesystems and its snapshots:

NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/media                54.7G  2.80T  47.9G  /media/media
tank/media@2009-03-29         0      -  46.3G  -
tank/media@2009-03-30         0      -  46.3G  -
tank/media@2009-03-31       43K      -  47.5G  -
tank/media@2009-04-01     1.20G      -  49.3G  -
tank/media@2009-04-02      350M      -  46.5G  -
tank/media@2009-04-03       52K      -  47.4G  -
tank/media@2009-04-04       48K      -  47.1G  -
tank/media@2009-04-05-04      0      -  47.9G  -
tank/media@2009-04-05-05      0      -  47.9G  -

Reading through the above, the overall usage is at 54.7G. Nothing has changed in the past 2 hours. Tiny little bits of data have changed in the last two days, and the two days before that saw decent size chunks changing at 350M and 1.2G. This is a pretty common pattern for this filesystem for me. The snapshots give me a place to go when I accidentally delete a file or make an unintended change.
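
If you'd rather have a single number than eyeball the list, something along these lines can add up the per-snapshot figures. It's a sketch that assumes a ZFS version newer than the one this post was written on, since it relies on the -p (exact byte values) option of zfs list; the function name snapshot_usage_bytes is made up.

```sh
# Hypothetical helper: add up the USED column, in bytes, of every
# snapshot under the given dataset.
snapshot_usage_bytes()
{
    zfs list -H -p -r -t snapshot -o used "$1" |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Keep in mind that a snapshot's USED only counts the space unique to that one snapshot, so blocks shared between several snapshots aren't included and the sum is a lower bound on what destroying all of them would free.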

Getting At Your Snapshot Data

Oh, one more thing… You can get at the snapshots using the .zfs directory. So if the tank/media filesystem was mounted at /media/ you’d find yesterday’s snapshot at /media/.zfs/snapshot/2009-04-03.
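
Getting a file back is then just a copy out of that directory. Here's a minimal sketch of that; the function name restore_from_snapshot and its argument order are made up for this example, and it only handles single files.

```sh
# Hypothetical helper: copy one file out of a snapshot back into the
# live filesystem.
# $1: mountpoint of the filesystem, $2: snapshot name, $3: path of the
# file relative to the mountpoint
restore_from_snapshot()
{
    mountpoint=$1
    snapname=$2
    relpath=$3

    src="$mountpoint/.zfs/snapshot/$snapname/$relpath"
    if [ ! -f "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # -p keeps the original modification time and permissions
    cp -p "$src" "$mountpoint/$relpath"
}
```

For example, restore_from_snapshot /media 2009-04-03 music/list.txt would put yesterday's copy of that (made-up) file back in place, overwriting whatever is there now.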