Improving the Resilience of HDDs & Ext4

There are numerous guides for squeezing that little bit of extra performance out of HDDs/SSDs - sometimes at the increased risk of losing data in the unfortunate case of a power outage (Ext3/4 mount options data=writeback and barrier=0, I'm looking at you!). From what I've seen on Linux blogs and tutorials online, not much has been written about improving resilience, possibly because it's not as attractive as quicker I/O speeds.


By "resilience" I mean the combination of efforts taken to preserve a working HDD/SSD state, be it via backing up the GPT/MBR partition table, filesystem tuning, specific mount options, or S.M.A.R.T testing/reporting. My focus for this guide is on typical HDDs, as I feel that large (1TB+) dedicated backup drives would benefit most from improved resilience to data loss or filesystem corruption. Nonetheless, simply increasing resilience is not a surefire solution to data integrity problems - make sure your routine backups have backups!

If you'd like to skip the explanations and just want the commands then see the TL;DR section at the bottom of this page.


Packages used

gdisk: 0.8.5-1
util-linux: 2.20.1-5.3
smartmontools: 5.41+svn3365-1
e2fsprogs: 1.42.5-1.1

1. Backing up the GPT/MBR Partition Table

Although files are stored in the Ext4 partition it is important to have a backup of the overall disk structure (partition table) as there's very little use in having your data on some filesystem in one of several partitions when your OS can't even detect where that partition (and consequently filesystem) is located!

The manner in which you back up the partition table differs slightly between GPT and MBR.
If you aren't sure which partition table you are currently using then run: sudo fdisk -l /dev/sdX (where X is the physical disk you wish to check)
It will complain: (WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.) if it sees any GPT partition tables on the disk.
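If you'd rather not rely on spotting the fdisk warning, the table type can also be queried directly. The following is only a minimal sketch: it assumes your blkid (from util-linux) supports low-level probing with the PTTYPE tag, which reports "gpt" or "dos", and the function names describe_pttype and pttype_of are made up for this guide.

```shell
# Sketch only: function names are hypothetical, not part of util-linux.

# Translate blkid's partition-table tag into the names used in this guide.
describe_pttype() {
    case "$1" in
        gpt) echo "GPT" ;;
        dos) echo "MBR" ;;
        "")  echo "none detected" ;;
        *)   echo "other ($1)" ;;
    esac
}

# Ask blkid (low-level probe, needs root) which table type a disk carries.
pttype_of() {
    describe_pttype "$(sudo blkid -p -o value -s PTTYPE "$1" 2>/dev/null)"
}

# Usage: pttype_of /dev/sdb
```

The translation step is kept separate from the probe so that it can be checked without touching a real disk.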


$ sudo sgdisk --backup=/mnt/external/gpt_table_WD1TB.bin /dev/sdb

The backed up GPT partition file "gpt_table_WD1TB.bin" contains the first 34 LBAs (Logical Block Addresses) of the block device /dev/sdb as well as another copy of the GPT header. GPT does create a backup of its partition entries and its header at the very end of the disk as a redundancy measure; however, I would still recommend creating an external backup of the partition table in the unlikely event that the duplicate GPT structure is unusable.
For those interested, further details of the overall GPT structure can be found here.

For recovering the GPT partition table you can use the interactive gdisk utility in combination with the "recovery & transformation" option: 'r'. Within the recovery submenu you can then specify the GPT table file location in the "load partition data from a backup file" option: 'l'.
Note: The non-interactive sgdisk utility can also be employed to restore a GPT partition table from a file with the '--load-backup=file' argument.
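For the non-interactive route, a restore could be wrapped roughly as follows. This is a sketch under my own assumptions rather than a tested recovery procedure: restore_gpt is a hypothetical helper name, and the paths in the usage line are simply the ones used earlier in this guide.

```shell
# Sketch only: restore_gpt is a hypothetical helper, not part of gdisk.
# Usage: restore_gpt /mnt/external/gpt_table_WD1TB.bin /dev/sdb
restore_gpt() {
    backup=$1
    disk=$2
    # Refuse to continue unless the backup file exists and is non-empty.
    if [ ! -s "$backup" ]; then
        echo "backup file missing or empty: $backup" >&2
        return 1
    fi
    # Overwrites the GPT structures on $disk with those stored in $backup.
    sudo sgdisk --load-backup="$backup" "$disk" || return 1
    # Ask sgdisk to check the restored table for problems.
    sudo sgdisk --verify "$disk"
}
```

The file check comes first because loading a backup overwrites the on-disk table, so it is worth failing early if the backup path is wrong.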


$ sudo sfdisk --dump /dev/sda > /mnt/external/mbr_table_Seagate1TB.bin

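sfdisk is the MBR counterpart to sgdisk and provides similar functionality for backing up the partition table structure. Note that the dump is a plain text description of the partitions rather than a binary image, despite the .bin extension used here.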

Recovery is as simple as using the backed up MBR partition table file as STDIN for the sfdisk command:
sudo sfdisk /dev/sda < /mnt/external/mbr_table_Seagate1TB.bin
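Before relying on a dump for recovery it is worth a quick sanity check that it actually describes your partitions. Here is a small sketch, assuming the usual sfdisk dump layout where each partition sits on a line starting with its device node and unused slots have a size of 0; count_dumped_partitions is a hypothetical helper name.

```shell
# Sketch only: count_dumped_partitions is a hypothetical helper.
# Reads an sfdisk dump from stdin and prints how many partitions it
# describes, ignoring unused slots (those with a size of 0).
count_dumped_partitions() {
    awk '/^\/dev\// && $0 !~ /size= *0,/ { n++ } END { print n + 0 }'
}

# Usage: count_dumped_partitions < /mnt/external/mbr_table_Seagate1TB.bin
```

If the number printed doesn't match the number of partitions you expect on the disk, take the dump again before trusting it.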

2. Ext4 Filesystem Tuning

Before tuning the targeted Ext4 filesystem please ensure that it is not already mounted.

$ sudo tune2fs -c 5 -i 2W -e remount-ro -O mmp -o journal_data,nodelalloc /dev/sdb1

Flag & value meanings:
"-c 5 ": Perform a filesystem check every 5 mounts
"-i 2W": Perform a filesystem check every 2 weeks
"-e remount-ro": If the kernel detects filesystem errors then the filesystem is remounted read-only (this also causes e2fsck to be run automatically on the filesystem at next boot)
"-O mmp": Enables the "Multiple Mount Protection" feature, which prevents the filesystem from being mounted by more than one host at the same time
"-o journal_data": Both metadata and user data is committed to the journal prior to being written into the main filesystem
",nodelalloc": Disables the delayed allocation functionality in Ext4
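To confirm that the flags above actually took effect, the superblock can be listed with tune2fs -l and filtered down to the relevant fields. This is a sketch only: show_resilience_settings is a hypothetical helper name, and the field labels are the ones I'd expect from e2fsprogs, so adjust the pattern if your version prints them differently.

```shell
# Sketch only: show_resilience_settings is a hypothetical helper.
# Filters `tune2fs -l` output (read from stdin) down to the fields that
# the tune2fs flags used in this guide change.
show_resilience_settings() {
    grep -E '^(Filesystem features|Default mount options|Errors behavior|Maximum mount count|Check interval):'
}

# Usage: sudo tune2fs -l /dev/sdb1 | show_resilience_settings
```

After tuning, "mmp" should be listed among the filesystem features and "journal_data nodelalloc" among the default mount options.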

The values chosen for the filesystem check (using 'e2fsck') lend themselves more to a system that is rebooted on an every-other-day basis, as either limit will be reached quickly and a forced filesystem check incurred. The remount and mount protection options aren't too special in themselves; they just provide more of a protected environment for the disk.

The specific Ext4 mount options journal_data and nodelalloc are the real key tweaks here. Enabling journal_data ensures that all data is first stored in the journal before user data is committed to the filesystem. This allows the I/O operations as well as the data to be replayed in case of interruption (e.g. a power failure), thus keeping the filesystem in a consistent state!

By having the journal_data mount option enabled we implicitly trigger the nodelalloc option automatically - I simply wanted to add it explicitly to demonstrate what was occurring. The delayed allocation feature simply delays block allocation from a program's write() call, consequently allowing the block allocator to optimise where it finally places the blocks, thus reducing overall fragmentation (on some workloads). By disabling it we do incur the fragmentation performance costs but we do not run the risk of potential data loss.

I've omitted the block_validity mount option tweak because it's a debugging-focused feature which incurs a larger CPU and memory overhead for its metadata corruption prevention benefits.

3. S.M.A.R.T Configuration & Automation

The majority of modern-day hard disks have S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) firmware built in that is able to examine/report/test disk health at the hardware level. By configuring the S.M.A.R.T daemon you gain substantial insight into the disk's health over time, allowing you to take action in advance if the disk is failing.


$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdb

First off we enable S.M.A.R.T functionality on the disk and pass two more flags (as recommended by the 'smartctl' man page):
"--offlineauto=on": Scans the disk every 4 hours for disk defects and saves the scanned information into the S.M.A.R.T attributes of the disk. Captures information that "online" checks cannot at the cost of a slight performance reduction.
"--saveauto=on": Enables the automatic saving of device vendor-specific S.M.A.R.T attributes.
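Once enabled, it's worth confirming that the drive really reports S.M.A.R.T as switched on. Below is a minimal sketch, assuming the "SMART support is: Enabled" line that smartctl -i normally prints; smart_is_enabled is a hypothetical helper name.

```shell
# Sketch only: smart_is_enabled is a hypothetical helper.
# Reads `smartctl -i` output from stdin and succeeds if the drive reports
# that S.M.A.R.T support is enabled.
smart_is_enabled() {
    grep -q '^SMART support is: *Enabled'
}

# Usage:
#   sudo smartctl -i /dev/sdb | smart_is_enabled && echo "S.M.A.R.T is on"
#   sudo smartctl -H /dev/sdb    # overall health self-assessment
```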


# /etc/smartd.conf
/dev/sdb -o on -S on -l error -l selftest -C 197 -s L/../../7/01 -m [email protected] -t -I 194 -I 231

# Restart daemon
$ sudo service smartd restart

Secondly we configure and restart the S.M.A.R.T daemon. There are lots of cryptic flags in the daemon configuration file, so I'll explain what they do:
"-o on": Enables collection of offline checks and updates the device's S.M.A.R.T attributes (identical to the utility behaviour of smartctl but persists between reboots)
"-S on": Enables saving of device vendor-specific S.M.A.R.T attributes (identical to the utility behaviour of smartctl but persists between reboots)
"-l error": Report (via e-mail) if the number of ATA errors in the S.M.A.R.T summary error log has increased since the last check
"-l selftest": Report (via e-mail) if the number of failed tests in the S.M.A.R.T Self-Test log has increased since the last check
"-C ID": Report (via e-mail) if the number of pending sectors (unstable/bad sectors requiring reallocation) is non-zero. The directive takes the ID of the attribute that holds this count, which is commonly 197
"-s L/../../7/01": Start a long self test at 1AM on Sunday
"-m [email protected]": The address to send the warning e-mail to. Requires an executable named "mail" in the same PATH variable of the shell/environment (e.g. /bin/mail)
"-t": Tracks the changes of all device S.M.A.R.T attributes (listable with sudo smartctl -A /dev/sdb)
"-I 194 -I 231": Ignores device attribute numbers 194 & 231 when tracking changes in attribute values. These values correspond to the temperature of the disk which, despite only varying within a small, acceptable range, may change very often (i.e. get hotter when under sudden load) and consequently result in the sending of multiple worthless e-mails throughout the day.
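The "-s" schedule is the least obvious of the flags above: it is a regular expression matched against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday and 7 = Sunday, and hour). The sketch below only approximates that matching in the shell to show why L/../../7/01 means "Sunday at 1AM"; it is not how smartd itself is implemented, both function names are hypothetical, and it needs GNU date for the -d option.

```shell
# Sketch only: both function names are hypothetical.

# Build the T/MM/DD/d/HH string for a test type and a point in time.
# %u prints the weekday as 1 (Monday) to 7 (Sunday).
schedule_string() {
    test_type=$1   # L = long self-test, S = short self-test
    when=$2        # anything GNU `date -d` accepts
    date -d "$when" "+$test_type/%m/%d/%u/%H"
}

# Succeeds if the schedule regex ($1) matches the whole string ($2).
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Usage:
#   schedule_string L '2024-01-07 01:30'             # prints L/01/07/7/01
#   schedule_matches 'L/../../7/01' 'L/01/07/7/01'   # succeeds (a Sunday)
```

Each ".." matches any two-digit month or day, so only the weekday and hour fields restrict when the test runs.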

4. Mount options (fstab)

The two mount options mentioned in "2. Ext4 Filesystem Tuning" are now passed to both the manual invocation of mount and to the /etc/fstab file.

mount (command)

$ sudo mount -o data=journal,nodelalloc /dev/sdb1 /mnt/backup

Manually mount the disk with the resilient parameters passed.


UUID=8d31792e-1f04-4e8d-b7b3-cda75d2b21f8     /mnt/backup     ext4    defaults,data=journal,nodelalloc    0   2  

Using the UUID of the partition (sudo blkid /dev/sdb1) append the mount options to the /etc/fstab file so as to make the mount persistent between reboots. Notice the "2" value in the pass field, this means that the backup drive filesystem is checked after the root disk's.
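After mounting, you can check that the options were really applied by looking at the entry for the mount point in /proc/mounts. This is a small sketch, assuming the standard whitespace-separated /proc/mounts layout; mount_has_option is a hypothetical helper name.

```shell
# Sketch only: mount_has_option is a hypothetical helper.
# Reads /proc/mounts-style lines from stdin and succeeds if the given
# mount point lists the given option.
mount_has_option() {
    mountpoint=$1
    option=$2
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: mount_has_option /mnt/backup data=journal < /proc/mounts
```

The options are compared as whole words, so a check for data=journal will not be fooled by a different data= mode.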


TL;DR

GPT / MBR Backup

$ sudo sgdisk --backup=/mnt/external/gpt_table_WD1TB.bin /dev/sdb  # GPT
$ sudo sfdisk --dump /dev/sda > /mnt/external/mbr_table_Seagate1TB.bin    # MBR

Ext4 Tuning

$ sudo tune2fs -c 5 -i 2W -e remount-ro -O mmp -o journal_data,nodelalloc /dev/sdb1

S.M.A.R.T Utility Configuration

$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdb    # Live

S.M.A.R.T Daemon Configuration

# /etc/smartd.conf
/dev/sdb -o on -S on -l error -l selftest -C 197 -s L/../../7/01 -m [email protected] -t -I 194 -I 231

# Restart daemon
$ sudo service smartd restart

Mount Options

$ sudo mount -o data=journal,nodelalloc /dev/sdb1 /mnt/backup

# /etc/fstab
UUID=8d31792e-1f04-4e8d-b7b3-cda75d2b21f8     /mnt/backup     ext4    defaults,data=journal,nodelalloc    0   2

Final Words

Hopefully these configurations can help you with creating a more resilient albeit slower HDD. I'd be interested in running some benchmark tests in the future to see how big the performance penalty is against both default configurations and increased I/O performance configurations. I'm sure there are a few performance tweaks I could apply here (such as noatime in fstab) to improve performance whilst retaining the desired resilience.

If you've spotted any mistakes I've made or you'd like to comment/question any aspect feel free to leave a response in the comment box below.