Introduction to ZFS

1 Summary

This document should enable the reader to get started with OpenZFS as it is shipped with illumos distributions and FreeBSD, and as an additionally installable kernel module for Linux, macOS and Windows (experimental).

2 What is ZFS?

ZFS is an alternative to traditional filesystems like UFS, ext2fs, etc. It is also a replacement for Logical Volume Managers; we will discuss the reasons later.

3 Feature Overview

ZFS uses 128-bit pointers to address all kinds of objects, be it data, files (inodes), volumes, filesystems, etc. It also checksums all data on your filesystem, not only metadata like traditional filesystems do.

  • 128-bit pointers
  • checksums on all objects
  • uses transactions instead of logs
  • all pool and filesystem configurations (including mountpoints) are stored within the pool, which makes it easier to hand over a pool to another system

4 Terms

4.1 Pools

ZFS Pools (short zpools, or just pools) are groups of disks. A disk can be a whole physical disk (recommended), a slice (partition), or a disk image. A zpool uses so-called VDEVs (Virtual Devices) to create various RAID levels.

4.2 VDEV (Virtual Devices)

Virtual Devices are a way of organizing physical devices into logical groups that support various features.

VDEV               Explanation
disk               a regular block device, a whole disk or a slice
file               an image file, use it only for testing
mirror             a mirror of two or more devices
raidz (raidz1..3)  a variation of RAID5 that distributes data and parity across the set;
                   raidz1 (or just raidz) is single, raidz2 is double and raidz3 is triple parity
spare              specifies hot-spare devices for this pool
log                specifies a device (or a mirror) for the ZFS Intent Log
cache              specifies one or more devices for the Level 2 ARC cache
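
As a sketch of how several vdev types can be combined in a single zpool create call (pool name and device names are made up for illustration):

# zpool create datapool raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
      log mirror c2t0d0 c2t1d0 \
      cache c2t2d0 \
      spare c1t4d0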

4.3 ARC

The Adaptive Replacement Cache (ARC) is an in-memory cache that ZFS uses for both data and metadata. While traditional filesystem caches can be invalidated (and hence freed) at any given time, the ARC cannot be freed immediately. On illumos systems the ARC size is dynamic, and its maximum is at roughly three quarters of the physical memory. The size of the ARC can be configured by tuning kernel settings (online).
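
On illumos the current and maximum ARC size can be inspected with kstat(1M), for example (the numbers are of course system specific):

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max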

4.4 Cache (L2ARC)

If your memory is not large enough, you can configure an l2arc - or cache - vdev, which provides an additional read cache for data that no longer fits into the ARC (it is filled asynchronously from the ARC). This cache should not be on media with the same speed as your pool data, but on faster disks: for example, put a cache vdev on an SSD while your actual pool data lives on spinning disks.
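
A cache device can also be added to an existing pool later on; a minimal sketch (the device name c5t2d0 is made up):

# zpool add testpool cache c5t2d0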

4.5 ZFS Intent Log (ZIL)

All IO operations on ZFS are written to disk asynchronously, which increases write performance. For applications (like databases) that demand synchronous writes, these are satisfied by writing into a special log called the ZFS Intent Log (ZIL). If no ZIL device was explicitly configured, some disk space of the ZPool will be used for it. The recommendation is to set up an explicit ZIL on mirrored SSDs (NVMe preferred at the time of this writing). After the data has been committed to the ZIL, the synchronous write returns to the application, because the data is persisted on disk. Later an asynchronous IO job picks up the data and transfers it to the pool devices.
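
Adding a dedicated, mirrored log device to an existing pool could look like this (again with made-up device names):

# zpool add testpool log mirror c5t0d0 c5t1d0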

5 Command line tools

Since ZFS is always consistent due to its transactional behaviour, there is no “fsck” for ZFS.
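
What you can do instead is let ZFS verify all checksums while the pool stays online, using a scrub, for example:

# zpool scrub testpool
# zpool status testpool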

5.1 zpool

The zpool(1M) command is used to create and manage ZPools, i.e. the “Volume Manager” side of ZFS.

5.2 zfs

The zfs(1M) command configures all filesystem-related objects, including snapshots, clones, etc.
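
For example, creating a snapshot of a filesystem is a single zfs call (filesystem and snapshot name are only placeholders here):

# zfs snapshot testpool/samplefs@monday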

5.3 zdb

zdb(1M) is the ZFS debugger; it does not modify any data, but lets you inspect on-disk and in-memory data.
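
A simple read-only use of zdb is printing the cached configuration of a pool, for example:

# zdb -C testpool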

6 Basic Usage

6.1 Single disk pool

We will start with a very simple task - we want to set up a single disk called /dev/dsk/c4t2d0 as a zpool, name it “testpool”, create a filesystem on it and mount it to /testpool.

# zpool create testpool /dev/dsk/c4t2d0

That’s it.

Or as a simple demo, we will use an image file (created by mkfile(1) or dd(1)):

(600) x230:/root# mkfile 128M testfile
(601) x230:/root# zpool create testpool /root/testfile
(602) x230:/root# zpool list testpool
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   112M   516K   111M        -         -     2%     0%  1.00x  ONLINE  -
(603) x230:/root# zpool status testpool
  pool: testpool
 state: ONLINE
  scan: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        testpool          ONLINE       0     0     0
          /root/testfile  ONLINE       0     0     0

errors: No known data errors
(604) x230:/root# df -h /testpool
Filesystem             Size   Used  Available Capacity  Mounted on
testpool              55.8M    23K      55.5M     1%    /testpool
(605) x230:/root# 

6.2 Create a simple mirror

We want to create a pool similar to the one in the first example, but it should survive a disk failure, and we prefer a mirror in this case. Again we use image files, this time of course two (or more!).

(605) x230:/root# mkfile 128M mirrfile1 mirrfile2
(607) x230:/root# zpool create mirrpool mirror /root/mirrfile1 /root/mirrfile2
(608) x230:/root# zpool list mirrpool
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
mirrpool   112M  90.5K   112M        -         -     2%     0%  1.00x  ONLINE  -
(609) x230:/root# zpool status mirrpool
  pool: mirrpool
 state: ONLINE
  scan: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        mirrpool             ONLINE       0     0     0
          mirror-0           ONLINE       0     0     0
            /root/mirrfile1  ONLINE       0     0     0
            /root/mirrfile2  ONLINE       0     0     0

errors: No known data errors
(610) x230:/root# df -h /mirrpool
Filesystem             Size   Used  Available Capacity  Mounted on
mirrpool              56.0M    23K      55.9M     1%    /mirrpool

6.3 create an additional filesystem on an existing pool

A pool bundles filesystems, but so far we have not explicitly created a filesystem. When we set up a new pool, the root filesystem of this pool is implicitly created and mounted.

So you can create a new filesystem by simply issuing:

(612) x230:/root# zfs create testpool/samplefs
(613) x230:/root# df -h | grep samplefs
testpool/samplefs     56.0M    23K      55.8M     1%    /testpool/samplefs

Creating filesystems looks a bit like just creating directories - but of course they are complete filesystems with their own inode number ranges, a size, etc.

But we can actually create complete structures of filesystems like mkdir -p creates sub-directory structures:

(614) x230:/root# zfs create -p testpool/struct/sub_a/sub_b/sub_c
(615) x230:/root# df -h | grep sub                               
testpool/struct/sub_a  56.0M    23K      55.7M     1%    /testpool/struct/sub_a
testpool/struct/sub_a/sub_b  56.0M    23K      55.7M     1%    /testpool/struct/sub_a/sub_b
testpool/struct/sub_a/sub_b/sub_c  56.0M    23K      55.7M     1%    /testpool/struct/sub_a/sub_b/sub_c

As we can see, a new filesystem has been created and mounted for every level. You can also see that all of them report the same size and the same amount of free space. This is because we have not set a filesystem size, or quota as we call it. Before we discuss how to change filesystem attributes, we will have a look at the zfs list and zfs get commands.

6.4 inspecting your filesystems

6.4.1 listing filesystems

If you run zfs list without any arguments, it will list all filesystems on your system.

(616) x230:/root# zfs list
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
mirrpool                                  77K  55.9M    23K  /mirrpool
rpool                                    124G   325G    33K  /rpool
rpool/ROOT                              37.9G   325G    23K  legacy
rpool/ROOT/openindiana-10               21.8M   325G  17.5G  /
[...]
rpool/swap                              8.34G   333G   175M  -
testpool                                 250K  55.7M    23K  /testpool
testpool/samplefs                         23K  55.7M    23K  /testpool/samplefs
testpool/struct                           92K  55.7M    23K  /testpool/struct
testpool/struct/sub_a                     69K  55.7M    23K  /testpool/struct/sub_a
testpool/struct/sub_a/sub_b               46K  55.7M    23K  /testpool/struct/sub_a/sub_b
testpool/struct/sub_a/sub_b/sub_c         23K  55.7M    23K  /testpool/struct/sub_a/sub_b/sub_c

You may specify individual filesystems as arguments to zfs list.

(617) x230:/root# zfs list testpool/samplefs testpool/struct/sub_a
NAME                    USED  AVAIL  REFER  MOUNTPOINT
testpool/samplefs        23K  55.7M    23K  /testpool/samplefs
testpool/struct/sub_a    69K  55.7M    23K  /testpool/struct/sub_a

Or you can specify a recursive listing from any filesystem:

(618) x230:/root# zfs list -r testpool/struct/sub_a               
NAME                                USED  AVAIL  REFER  MOUNTPOINT
testpool/struct/sub_a                69K  55.7M    23K  /testpool/struct/sub_a
testpool/struct/sub_a/sub_b          46K  55.7M    23K  /testpool/struct/sub_a/sub_b
testpool/struct/sub_a/sub_b/sub_c    23K  55.7M    23K  /testpool/struct/sub_a/sub_b/sub_c

In all cases the output has 5 columns (a different selection of columns can be requested, see the example after this list):

  • NAME: the filesystem name
  • USED: the space used by the filesystem, including its children and snapshots
  • AVAIL: the space still available to the filesystem
  • REFER: the space the dataset itself actually references
  • MOUNTPOINT: either the mountpoint, “legacy”, “none” or a “-”
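
If you are only interested in some of these columns, you can ask zfs list for a specific set of them with -o; a small sketch:

# zfs list -o name,used,mountpoint -r testpool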

6.4.2 inspecting and setting attributes for a filesystem

We now know how to retrieve a list of filesystems with some very basic attributes. But there are a whole lot more, which we can retrieve with the zfs get command. The basic syntax is zfs get attribute filesystem; a special attribute is “all”, which will show us all attributes of the specified filesystem:

(619) x230:/root# zfs get all testpool/struct/sub_a
NAME                   PROPERTY              VALUE                   SOURCE
testpool/struct/sub_a  type                  filesystem              -
testpool/struct/sub_a  creation              Wed Dec 12 20:16 2018   -
testpool/struct/sub_a  used                  69K                     -
testpool/struct/sub_a  available             55.7M                   -
testpool/struct/sub_a  referenced            23K                     -
testpool/struct/sub_a  compressratio         1.00x                   -
testpool/struct/sub_a  mounted               yes                     -
testpool/struct/sub_a  quota                 none                    default
testpool/struct/sub_a  reservation           none                    default
testpool/struct/sub_a  recordsize            128K                    default
testpool/struct/sub_a  mountpoint            /testpool/struct/sub_a  default
testpool/struct/sub_a  sharenfs              off                     default
testpool/struct/sub_a  checksum              on                      default
testpool/struct/sub_a  compression           off                     default
testpool/struct/sub_a  atime                 on                      default
testpool/struct/sub_a  devices               on                      default
testpool/struct/sub_a  exec                  on                      default
testpool/struct/sub_a  setuid                on                      default
testpool/struct/sub_a  readonly              off                     default
testpool/struct/sub_a  zoned                 off                     default
testpool/struct/sub_a  snapdir               hidden                  default
testpool/struct/sub_a  aclmode               discard                 default
testpool/struct/sub_a  aclinherit            restricted              default
testpool/struct/sub_a  createtxg             250                     -
testpool/struct/sub_a  canmount              on                      default
testpool/struct/sub_a  xattr                 on                      default
testpool/struct/sub_a  copies                1                       default
testpool/struct/sub_a  version               5                       -
testpool/struct/sub_a  utf8only              off                     -
testpool/struct/sub_a  normalization         none                    -
testpool/struct/sub_a  casesensitivity       sensitive               -
testpool/struct/sub_a  vscan                 off                     default
testpool/struct/sub_a  nbmand                off                     default
testpool/struct/sub_a  sharesmb              off                     default
testpool/struct/sub_a  refquota              none                    default
testpool/struct/sub_a  refreservation        none                    default
testpool/struct/sub_a  guid                  15442038872638995515    -
testpool/struct/sub_a  primarycache          all                     default
testpool/struct/sub_a  secondarycache        all                     default
testpool/struct/sub_a  usedbysnapshots       0                       -
testpool/struct/sub_a  usedbydataset         23K                     -
testpool/struct/sub_a  usedbychildren        46K                     -
testpool/struct/sub_a  usedbyrefreservation  0                       -
testpool/struct/sub_a  logbias               latency                 default
testpool/struct/sub_a  dedup                 off                     default
testpool/struct/sub_a  mlslabel              none                    default
testpool/struct/sub_a  sync                  standard                default
testpool/struct/sub_a  refcompressratio      1.00x                   -
testpool/struct/sub_a  written               23K                     -
testpool/struct/sub_a  logicalused           34.5K                   -
testpool/struct/sub_a  logicalreferenced     11.5K                   -
testpool/struct/sub_a  filesystem_limit      none                    default
testpool/struct/sub_a  snapshot_limit        none                    default
testpool/struct/sub_a  filesystem_count      none                    default
testpool/struct/sub_a  snapshot_count        none                    default
testpool/struct/sub_a  redundant_metadata    all                     default

As you can see, the “mountpoint” is an attribute, just like the “quota”. Not all of these attributes can be set, but we will now set the quota and have a look at the effect.

(620) x230:/root# zfs set quota=32M testpool/struct/sub_a
(623) x230:/root# zfs list -r testpool/struct      
NAME                                USED  AVAIL  REFER  MOUNTPOINT
testpool/struct                      92K  55.7M    23K  /testpool/struct
testpool/struct/sub_a                69K  31.9M    23K  /testpool/struct/sub_a
testpool/struct/sub_a/sub_b          46K  31.9M    23K  /testpool/struct/sub_a/sub_b
testpool/struct/sub_a/sub_b/sub_c    23K  31.9M    23K  /testpool/struct/sub_a/sub_b/sub_c
(624) x230:/root# df -h | grep struct        
testpool/struct       56.0M    23K      55.7M     1%    /testpool/struct
testpool/struct/sub_a    32M    23K      31.9M     1%    /testpool/struct/sub_a
testpool/struct/sub_a/sub_b    32M    23K      31.9M     1%    /testpool/struct/sub_a/sub_b
testpool/struct/sub_a/sub_b/sub_c    32M    23K      31.9M     1%    /testpool/struct/sub_a/sub_b/sub_c

This filesystem and its children now have an upper quota of 32M in total.
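
If the limit is not needed anymore, the quota can simply be reverted by setting it back to its default, for example:

# zfs set quota=none testpool/struct/sub_a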

Date: 2018-12-11

Author: Olaf Bohlen

Created: 2018-12-12 Wed 20:57
