Raspbian: running the system from RAID

January 18, 2017

Filed under: linux — SiKing @ 11:11 am
Tags: howto

SD card failures are a well-documented (and complained about) phenomenon on the Raspberry Pi. One of my clients has a product that runs on an RPi 24/7, and they tasked me to do something about this. I looked at several different things, and I am going to publish my findings here for (hopefully) the benefit of others.

So why do SD cards fail? To put it simply: SD cards are built from NAND flash, and each flash cell survives only a limited number of program/erase cycles. Every write wears the card a little; enough writes will wear it out.

In order to lower the chance of failure, there are certain things that can be done to the system. This discussion is a good starting point; this wiki is a little more in-depth.
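To check whether a given change actually reduces writes, it helps to measure how much is written to the card before and after. A minimal sketch, assuming the kernel's per-device statistics file /sys/block/mmcblk0/stat, whose seventh field counts sectors written since boot (always in 512-byte units); the helper name `bytes_written` is hypothetical.

```shell
# Hypothetical helper: print the number of bytes written to a block device
# since boot. Field 7 of /sys/block/<dev>/stat is "write sectors", which the
# kernel always counts in 512-byte units, whatever the card's real sector size.
bytes_written() {
  stat_file=${1:-/sys/block/mmcblk0/stat}
  # printf "%.0f" keeps awk from switching to exponent notation for big values.
  awk '{ printf "%.0f\n", $7 * 512 }' "$stat_file"
}
```

For example, run `bytes_written` twice, an hour apart, and subtract; the counter resets at every boot.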

Some things that I looked into, but do not have enough information to share here, are:

Industrial-grade SD cards: these usually come with a warranty, but cost more money.
Moving logging to memory (tmpfs): a tricky balance between how much memory you have and how much memory your app needs. Also, everything in memory is lost when things crash, including the logs you would need to figure out what happened.
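For reference, moving /var/log to memory usually comes down to a single /etc/fstab entry. This is a sketch only: the `size=50m` cap is an arbitrary example value, the helper name `add_tmpfs_log` is hypothetical, and some daemons may fail to start if their own subdirectory under /var/log is missing (the tmpfs starts out empty at every boot).

```shell
# Sketch only: mount /var/log as tmpfs from the next boot on.
# size=50m is an arbitrary example cap; contents are lost on reboot/power loss.
tmpfs_log_entry='tmpfs  /var/log  tmpfs  defaults,noatime,nosuid,mode=0755,size=50m  0  0'

add_tmpfs_log() {
  fstab=${1:-/etc/fstab}
  # Do nothing if /var/log already has an (uncommented) entry.
  if awk '$1 !~ /^#/ && $2 == "/var/log" { found = 1 } END { exit !found }' "$fstab"; then
    return 0
  fi
  printf '%s\n' "$tmpfs_log_entry" >> "$fstab"
}

# Usage, as root:  add_tmpfs_log /etc/fstab
```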

I configured two things on my system: 1) ext2 file-system, and 2) RAID.

Preliminaries

The default Raspbian OS image creates the following partitions:

$ sudo parted /dev/mmcblk0 print free
Model: SD SD16G (sd/mmc)
Disk /dev/mmcblk0: 15.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
        32.3kB  4194kB  4162kB           Free Space
 1      4194kB  70.3MB  66.1MB  primary  fat16        lba
 2      70.3MB  4019MB  3949MB  primary  ext4
        4019MB  15.5GB  11.5GB           Free Space

Note that in this example the logical and physical sector sizes (“Sector size”) match, which is desirable. However, some SD cards are manufactured with a physical block size of 1024 bytes or more. In that case, the partitions should be aligned: each partition should start on a multiple of the physical block size.
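A quick way to check alignment is to compare a partition's start offset with the card's physical block size, both of which the kernel exposes under /sys/block. A minimal sketch with a hypothetical helper name `is_aligned`; `parted /dev/mmcblk0 align-check optimal 1` is an alternative that uses parted's own notion of optimal alignment.

```shell
# Hypothetical helper: succeed if a partition that starts at sector $1
# (in 512-byte units, as in /sys/block/<disk>/<part>/start) begins on a
# multiple of $2 bytes (e.g. the card's physical block size).
is_aligned() {
  start_sector=$1 block_bytes=$2
  [ $(( start_sector * 512 % block_bytes )) -eq 0 ]
}

# Example: check partition 1 of the internal SD card, if it exists.
disk=mmcblk0 part=mmcblk0p1
if [ -r "/sys/block/$disk/$part/start" ]; then
  if is_aligned "$(cat "/sys/block/$disk/$part/start")" \
                "$(cat "/sys/block/$disk/queue/physical_block_size")"; then
    echo "$part is aligned"
  else
    echo "$part is NOT aligned"
  fi
fi
```

In the listing above, partition 1 starts at 4194kB, i.e. sector 8192, which is aligned for any power-of-two block size up to 4 MiB.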

boot partition

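The first block of free space (about 4 MB in front of partition 1) is unused. It is not entirely wasted, though: because of it, partition 1 starts at exactly 4 MiB (sector 8192), which keeps the partition aligned to typical flash block sizes.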

The second block (“Number 1”) is mounted as /boot and is a FAT partition. The Raspberry Pi is able to boot only from the internal SD card, and only from a FAT partition (citation). This is a limitation of the Raspberry Pi SoC and cannot be changed.

Even at run-time, this partition is accessed. I tried mounting the partition read-only, but then the system failed to boot. I still want to figure out what is written to this partition at run-time, and why.
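To find out what is written to /boot at run-time, one option is to watch it with `inotifywait` from the inotify-tools package. A sketch with a hypothetical function name; note that inotify reports which files change, but not which process changed them (the separate `fatrace` tool can show process names).

```shell
# Sketch: print a timestamped line for every change under /boot (or $1)
# until interrupted with Ctrl-C.
# Requires inotify-tools:  sudo apt-get install inotify-tools
watch_boot_writes() {
  inotifywait -m -r -e modify,create,delete,move \
    --timefmt '%F %T' --format '%T %w%f %e' "${1:-/boot}"
}
```

Run it from a root shell (`sudo -s`), for example as `watch_boot_writes | tee boot-writes.log`, and leave the system doing its normal work for a while.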

root partition

The next block (“Number 2”) is mounted as / (root) and holds the entire OS. This partition is mostly vanilla ext4.

The only restriction on the partition that holds the operating system is that the file-system must support user permissions; most Linux file-systems do. The default ext4 file-system uses a journal, which causes additional writes to the disk. The journal is intended to prevent file corruption in case of a catastrophe like power loss during a write. The SD card is fast enough that the chance of such a catastrophe is minimal, while the extra writes cause additional wear on the card. After the base image is created, the journal can be removed, or the partition can be reformatted as ext2, which does not have a journal. Note that if the journal is removed from an ext4 file-system, some tools will actually report that partition as ext2.
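Removing the journal can be done with `tune2fs` on the unmounted partition, for example with the card in a workstation's card reader. A minimal sketch with a hypothetical helper name `remove_journal`; this is not necessarily what the 03-tune_root_filesystem script does. Reformatting with `mkfs.ext2` instead would erase the partition.

```shell
# Hypothetical helper: remove the journal from an ext4 file system.
# $1 is an UNMOUNTED block device such as /dev/sdX2 (check with lsblk)
# or a file-system image.
remove_journal() {
  dev=$1
  # tune2fs only clears has_journal on an unmounted (or read-only) file system.
  if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/mounts; then
    echo "$dev is mounted; unmount it first" >&2
    return 1
  fi
  # tune2fs refuses to drop a journal that still needs recovery, so check
  # first. e2fsck exit codes 0 and 1 both mean the file system is now clean.
  e2fsck -f -p "$dev"
  rc=$?
  if [ "$rc" -gt 1 ]; then
    echo "e2fsck failed on $dev (exit code $rc)" >&2
    return 1
  fi
  tune2fs -O ^has_journal "$dev"
}
```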

free space

The last block of free space is simply the rest of the card, beyond the end of the OS image. When the RPi is booted the first time, a utility automatically expands the root partition into all of this free space.

This free space simply needs to be partitioned, formatted and added to /etc/fstab, so that it can be used as an /app partition. Once this partition exists, the first-time utility that expands the root partition will still run, and fail because there is no free space left after the root partition; the error can be safely ignored.
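The fstab step is worth making idempotent, so that re-running a setup script does not add duplicate entries. A minimal sketch with a hypothetical helper name `ensure_fstab_entry`; the device name and the 4019MB start (taken from the listing above) are illustrative, and when partitioning a card that is in use, the kernel may need a `partprobe` or a reboot before the new partition appears.

```shell
# Hypothetical helper: append an fstab entry unless its mount point is already
# listed (ignoring comment lines), so re-running a setup script is harmless.
ensure_fstab_entry() {
  fstab=$1 entry=$2
  mountpoint=$(printf '%s\n' "$entry" | awk '{ print $2 }')
  if awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
    echo "$mountpoint is already in $fstab; leaving it alone" >&2
  else
    printf '%s\n' "$entry" >> "$fstab"
  fi
}

# Example, as root (device names are illustrative):
#   parted -s /dev/mmcblk0 mkpart primary ext4 4019MB 100%
#   mkfs.ext4 -L app /dev/mmcblk0p3
#   mkdir -p /app
#   ensure_fstab_entry /etc/fstab '/dev/mmcblk0p3  /app  ext4  defaults,noatime  0  2'
```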

swap

A swap partition is conspicuously missing: the RPi uses a swap file instead. Unlike a swap partition, a swap file can easily be reconfigured at run-time: turned on or off, resized, or even removed so that the space is reclaimed by the system. The swap file is managed by dphys-swapfile. Note that by default the vm.swappiness parameter is set to 1, just above the lowest possible value (0), so the kernel swaps only when it really has to.
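As an example of that run-time flexibility, resizing the swap file means changing `CONF_SWAPSIZE` (in megabytes) in /etc/dphys-swapfile and letting dphys-swapfile recreate the file. A minimal sketch; the helper name `set_swapsize` is hypothetical.

```shell
# Hypothetical helper: set CONF_SWAPSIZE (megabytes) in a dphys-swapfile
# config file, replacing an active setting or appending a new one.
set_swapsize() {
  conf=$1 size=$2
  if grep -q '^CONF_SWAPSIZE=' "$conf"; then
    sed -i "s/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=$size/" "$conf"
  else
    printf 'CONF_SWAPSIZE=%s\n' "$size" >> "$conf"
  fi
}

# Example, as root on the RPi:
#   dphys-swapfile swapoff
#   set_swapsize /etc/dphys-swapfile 200
#   dphys-swapfile setup      # recreates the swap file at the new size
#   dphys-swapfile swapon
```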

I have given some consideration to completely turning off the swap and removing it. However, I am not convinced this is a good idea.

If the system does not need to swap, then the swap file is never written to. So just the presence of a swap file causes no additional wear on the SD card.
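And if the system does actually need to swap out and no swap is available, the kernel's out-of-memory killer will start killing processes (quite possibly your app); on an unattended device, that is as bad as a crash.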
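Whether the swap file is written to at all can be checked directly: the kernel counts pages swapped out since boot as `pswpout` in /proc/vmstat. A minimal sketch with a hypothetical helper name.

```shell
# Hypothetical helper: print how many pages have been swapped out since boot.
# If this stays at 0, the swap file has not been written to.
pages_swapped_out() {
  awk '$1 == "pswpout" { print $2 }' "${1:-/proc/vmstat}"
}
```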

RAID system

We can install one or more additional SD cards (using a USB to microSD adapter) on the RPi and configure them as a RAID system, where each additional card is an exact duplicate of the first. This is a real-time function and should not be confused with a backup! In other words: if a file is erased from the system, it will be erased from all copies. In case one of the cards fails, the system can still operate on the remaining card(s), ideally warning the user that the failed card should be replaced.
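The warning part can be driven by /proc/mdstat, where each array's member status reads like `[UU]` when healthy and `[U_]` when a member is missing. The usual mechanism is mdadm's own monitor mode, which on Debian-based systems e-mails the address set as `MAILADDR` in /etc/mdadm/mdadm.conf; the following is only a minimal do-it-yourself check, with a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every md array whose member status
# in /proc/mdstat contains "_" (a failed or missing member).
degraded_arrays() {
  awk '
    /^md[0-9]+ *:/ { name = $1 }
    match($0, /\[[U_]+]/) && substr($0, RSTART, RLENGTH) ~ /_/ { print name }
  ' "${1:-/proc/mdstat}"
}

# Example: warn on stderr for each degraded array.
#   for md in $(degraded_arrays); do echo "WARNING: $md is degraded" >&2; done
```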

The math to calculate the odds of complete system failure is extensive. With one additional SD card, the only option is mirroring the two cards (RAID1). With two additional cards, we can either mirror all three (still RAID1) or use a more advanced scheme (RAID5), where the data is striped across the cards together with parity information. RAID5 can also improve performance: because the data is spread over several cards, reads and writes can be almost twice as fast, at least in theory, since computing the parity takes some of that back. Since you never get something for free: with three cards in RAID5 you can lose only one card; with three cards in RAID1 you can lose up to two.
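As a rough illustration of that math, assume each card fails independently with probability p over some period, and that nothing is replaced in the meantime (both big simplifications, and the unmirrored boot partition is ignored). An n-way RAID1 is lost only if all n cards fail (p^n); an n-card RAID5 is lost as soon as two or more cards fail. A sketch with hypothetical helper names:

```shell
# P(array lost) for an n-way RAID1: every card has to fail.
raid1_loss() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.6f\n", p ^ n }'
}

# P(array lost) for an n-card RAID5: two or more cards fail, i.e.
# 1 - P(no failure) - P(exactly one failure).
raid5_loss() {
  awk -v p="$1" -v n="$2" \
    'BEGIN { printf "%.6f\n", 1 - (1 - p) ^ n - n * p * (1 - p) ^ (n - 1) }'
}
```

With p = 0.1 (a single card therefore has a 0.1 chance of failing), `raid1_loss 0.1 2` prints 0.010000, `raid1_loss 0.1 3` prints 0.001000 and `raid5_loss 0.1 3` prints 0.028000: the three-card mirror is the most robust, and RAID5 trades some of that robustness for capacity and speed.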

This setup still has one single point of failure: the primary boot partition. We can replicate the boot partition onto the other cards using scripts, but the Raspberry Pi SoC is only able to boot from the one card that is plugged into the SD slot on the motherboard. If this partition fails, the entire system is dead.
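Replicating the boot partition can be as simple as an `rsync` onto the spare card's FAT partition. A minimal sketch with a hypothetical helper name `sync_boot`; it is not necessarily what the sync_boot_partition script does, and `/dev/sda1` below is an assumed device name for the spare card's boot partition.

```shell
# Hypothetical helper: mirror directory $1 (normally /boot) onto $2, the
# spare card's FAT boot partition, which must already be mounted.
# FAT stores no owners or permissions, hence -rt instead of -a; FAT
# timestamps have 2-second resolution, hence --modify-window=1.
sync_boot() {
  src=$1 dst=$2
  if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
    echo "usage: sync_boot SRC DST (both must be existing directories)" >&2
    return 1
  fi
  rsync -rt --modify-window=1 --delete "$src/" "$dst/"
}

# Example, as root:
#   mkdir -p /mnt/boot2 && mount /dev/sda1 /mnt/boot2
#   sync_boot /boot /mnt/boot2
#   umount /mnt/boot2
```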

What you are going to need

Two (2) identically formatted SD cards; at least one of them has to hold the Raspbian OS. In order to ensure the cards are identical, I just installed Raspbian on both of them.
One (1) USB to microSD card adapter; you can pick this up almost anywhere, but be careful: the cheaper models make lower quality electrical connections, and any slight bump might “dislodge” the card and degrade the array.

Procedure

I am a big believer in predictable and reproducible. The individual scripts are numbered (in the filename) and are intended to be run in numerical order. Each script accomplishes one specific task, which makes the entire procedure modular and easy to modify and experiment with. Each script checks its assumptions, and everything is documented inline. All the scripts have to be run with elevated privileges: sudo ./<scriptname>.
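The kind of assumption checks meant here can be expressed as small guard functions at the top of each script. A sketch with hypothetical helper names; the actual scripts may check different things.

```shell
# Hypothetical guard helpers: each prints a reason and fails if an
# assumption does not hold, before the script does anything destructive.
require_root() {
  [ "$(id -u)" -eq 0 ] || { echo "run this with sudo" >&2; return 1; }
}
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}
require_block_device() {
  [ -b "$1" ] || { echo "not a block device: $1" >&2; return 1; }
}

# Typical top of a script:
#   require_root && require_cmd mdadm && require_block_device /dev/sda2 || exit 1
```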

The first three scripts must be run on a workstation.

01-write_image_to_card [source]
02-create_app_partition [source] – If you are not interested in having a separate /app partition, then you will need to boot both cards separately so that the free space is expanded identically on both.
03-tune_root_filesystem [source] – If you are building a RAID system, this script can safely be skipped.

If you are building a RAID array, then rerun the above scripts against the second SD card.

To continue, you now need to boot from one card on the RPi. The Raspbian OS has a one-time startup utility that automatically expands the root partition into all available space. If the /app partition was created, this utility will fail; the error can be safely ignored.

It is probably a good idea to upgrade the OS at this point. One of the later scripts (04) will tie the RPi to the current running version of the kernel.

$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ sudo reboot

To continue, you now need to plug the second card into the USB to microSD adapter. The next several scripts will configure the RAID system on the RPi. The second card must have the exact same partitioning as the primary. Currently I do not have a script that reproduces this; I just cheated by installing Raspbian on both cards – the first three scripts above.
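To verify rather than assume that the two cards came out identical, the partition tables can be dumped with `sfdisk -d` and compared once the lines that always differ (device names, disk identifier) are stripped. A sketch with hypothetical helper names, assuming sfdisk's dump format: a few header lines such as `label-id:` and `device:` (or a `# partition table of ...` comment in older versions), then one `/dev/... : start=..., size=...` line per partition.

```shell
# Hypothetical helper: turn an `sfdisk -d` dump into a device-independent
# form by dropping header lines that name the disk and stripping the
# device name from each partition line.
normalize_layout() {
  sed -e '/^#/d' -e '/^device:/d' -e '/^label-id:/d' -e 's|^/dev/[^ ]* *: *||'
}

# Succeed if two disks have the same partition starts, sizes and types.
same_layout() {
  [ "$(sfdisk -d "$1" | normalize_layout)" = "$(sfdisk -d "$2" | normalize_layout)" ]
}

# Example, as root on the RPi with the second card in the USB adapter:
#   same_layout /dev/mmcblk0 /dev/sda && echo "partition tables match"
```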

04-configure_initramfs [source] – This script ties the RPi to the currently running kernel. After this, every kernel upgrade also requires regenerating the initramfs.
05-configure_root_raid_pass1 [source] – Configuring the root partition RAID requires a restart in the middle of the process; hence the “pass1” and “pass2”.
05-configure_root_raid_pass2 [source] – After this script, you can monitor the RAID build with watch -n 5 cat /proc/mdstat. It’s a good idea to let this complete before continuing.
06-configure_app_raid [source] – If you are not interested in having an /app partition, then just skip this.
07-setup_maintenance [source] – The RAID should be monitored periodically. Related scripts: sync_boot_partition [source] and raid_check [source].
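For reference, the Raspberry Pi firmware only loads an initramfs if /boot/config.txt names it with an `initramfs` line, and `update-initramfs` names the image after the kernel version; that is why a kernel upgrade means regenerating the image and updating that line. A sketch of the config.txt half, with a hypothetical helper name `set_initramfs`; this is not necessarily what 04-configure_initramfs does.

```shell
# Hypothetical helper: point the firmware at the initramfs for kernel
# version $2 by setting (or adding) the "initramfs ... followkernel" line
# in config file $1.
set_initramfs() {
  config=$1 version=$2
  line="initramfs initrd.img-$version followkernel"
  if grep -q '^initramfs ' "$config"; then
    sed -i "s|^initramfs .*|$line|" "$config"
  else
    printf '%s\n' "$line" >> "$config"
  fi
}

# Example, as root, after installing a new kernel and BEFORE rebooting
# (VERSION is the newly installed directory name under /lib/modules):
#   update-initramfs -c -k VERSION        # writes /boot/initrd.img-VERSION
#   set_initramfs /boot/config.txt VERSION
```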
