Using dm-cache with a RAM disk
Mark Elvers
3 min read

Categories

  • dm-cache, Ubuntu

Tags

  • tunbury.org

I have written about dm-cache previously, when I used it with LVM for SSD/HDD caching. In this post, I will explore using dm-cache with RAM as the cache layer over a spinning disk.

I have a CI workload that I could almost fit entirely in tmpfs, but then I would not have any data persistence across reboots. I also have existing data on disk, which I’d rather not regenerate.

To use any cache we need a block store and a metadata store. As I mentioned in the previous post, the metadata is typically 1% of the size of the block store. Since brd’s RAM disks only consume memory as blocks are actually written, there is no cost in creating them generously sized: I’ll create two 100 GiB RAM disks (rd_size is given in KiB), one for metadata and one for the block store. Equally, I could have partitioned a single RAM disk.

modprobe brd rd_size=104857600 rd_nr=2 max_part=1
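Both devices appear immediately, but because brd only allocates pages as they are written, creating them costs essentially nothing up front; free should barely move until the cache starts filling:

ls -l /dev/ram0 /dev/ram1
free -h   # memory usage hardly changes until blocks are written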

Let’s configure these with dmsetup, with the sizes given in 512-byte sectors: 2097152 sectors (1 GiB) for the metadata and 209715200 sectors (100 GiB) for the data.

dmsetup create cache-meta --table "0 2097152 linear /dev/ram0 0"
dmsetup create cache-data --table "0 209715200 linear /dev/ram1 0"
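
As a sanity check, the new mappings should report exactly those sizes back in 512-byte sectors:

blockdev --getsz /dev/mapper/cache-meta   # expect 2097152 (1 GiB)
blockdev --getsz /dev/mapper/cache-data   # expect 209715200 (100 GiB)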

There is a lot of outdated information online about the cache settings. Firstly, there are references to a default policy, a Stochastic Multiqueue (SMQ) policy and a Multiqueue (MQ) policy. However, the kernel logs show that the “mq policy is now an alias for smq”. Many of the configuration options, such as write_promote_adjustment and read_promote_adjustment, have been removed: “tunable ‘write_promote_adjustment’ no longer has any effect”.
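
To check what your own kernel reports, grepping the kernel log for device-mapper messages should surface these lines, although the exact wording varies by kernel version:

dmesg | grep -i device-mapper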

This leaves migration_threshold as about the only tunable setting. It controls the minimum activity level a block needs before SMQ considers promoting it to cache. I’ve picked 100 to move blocks into the cache aggressively rather than the conservative default of 2048.

There is the choice between writeback and writethrough, but since I want performance over data integrity, I have selected writeback, which acknowledges writes as soon as they hit the cache and writes the data back to the disk asynchronously. I can easily regenerate the data if it is lost.

The final question is the block size. I initially selected 8 sectors (4 KB blocks), but the kernel rejected this with an “Invalid data block size” message. The smallest size I could use was 64 sectors (32 KB), and even then the kernel warns about excess memory usage. Larger blocks reduce memory overhead and potentially improve performance, but reduce granularity for small random writes. 256 sectors (128 KB) does not trigger the warning, so I have selected that.

Below is my final command. The 1 before writeback is the number of feature arguments, and the 2 after smq is the number of policy arguments that follow (migration_threshold and its value).

dmsetup create fast-sdd --table "0 $(blockdev --getsz /dev/sdd) cache /dev/mapper/cache-meta /dev/mapper/cache-data /dev/sdd 256 1 writeback smq 2 migration_threshold 100"
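
The migration threshold can also be adjusted later on the live device via the message interface described in the kernel documentation, so something like this should work without reloading the table:

dmsetup message fast-sdd 0 migration_threshold 2048   # back to the default, for example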

Finally, mount the new device. This assumes /dev/sdd already had a filesystem on it; if not, make one in the usual way with mkfs /dev/mapper/fast-sdd.

mount /dev/mapper/fast-sdd /mnt
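
As a rough smoke test (the file name and size here are arbitrary), a large write should now complete at RAM speed rather than spinning-disk speed, since writeback only needs the data to reach the cache device:

dd if=/dev/zero of=/mnt/ddtest bs=1M count=4096 conv=fsync
rm /mnt/ddtest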

We can view the statistics from dmsetup status, but its raw string of numbers takes some deciphering!

# dmsetup status fields: $5 = metadata used/total, $7 = cache blocks used/total,
# $8/$9 = read hits/misses, $10/$11 = write hits/misses, $12 = demotions,
# $13 = promotions, $14 = dirty blocks
while true; do
  dmsetup status fast-sdd | awk '{
    split($7, cache, "/")
    printf "Cache Usage: %d/%d blocks (%.1f%%)\n", cache[1], cache[2], (cache[1]/cache[2])*100
    printf "Read Hits: %d, Misses: %d (%.1f%% hit rate)\n", $8, $9, ($8/($8+$9))*100  
    printf "Write Hits: %d, Misses: %d (%.1f%% hit rate)\n", $10, $11, ($10/($10+$11))*100
    printf "Dirty blocks: %d\n", $14
    printf "Metadata usage: %s\n", $5
    printf "Promotions: %d, Demotions: %d\n\n", $13, $12
  }';
  sleep 2;
done

The hit rates are impressive, although it is worth remembering that my dataset fits comfortably within the cache.

Cache Usage: 922060/6553600 blocks (14.1%)
Read Hits: 98508, Misses: 864 (99.1% hit rate)
Write Hits: 3515392, Misses: 116691 (96.8% hit rate)
Dirty blocks: 897985
Metadata usage: 19416/262144
Promotions: 922045, Demotions: 0

When it is time to shut down the machine, special care needs to be taken to write the dirty blocks out to disk. The process requires switching the device to the cleaner policy, which in turn requires a suspend/reload/resume cycle.

After dmsetup suspend and dmsetup reload, the table shown with dmsetup table remains unchanged. The new table does not take effect until dmsetup resume.

dmsetup wait pauses until all the dirty blocks have been written.

umount /dev/mapper/fast-sdd
dmsetup suspend fast-sdd
dmsetup reload fast-sdd --table "0 $(blockdev --getsz /dev/sdd) cache /dev/mapper/cache-meta /dev/mapper/cache-data /dev/sdd 256 0 cleaner 0"
dmsetup resume fast-sdd
dmsetup wait fast-sdd
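# optional: before removing the device, confirm the dirty count (field 14 of
# the status line, as used in the monitoring script above) is back to zero
dmsetup status fast-sdd | awk '{ print "dirty blocks:", $14 }'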
dmsetup remove fast-sdd
dmsetup remove cache-meta
dmsetup remove cache-data
rmmod brd