There are Ansible playbooks available at ceph/cephadm-ansible to configure CephFS; however, I decided to set it up manually on some test VMs to gain a better understanding of the process.
I used Vagrant to create a couple of VMs. One with 3 x 500GB disks and one with 11 x 1TB disks.
Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu2204"
  config.vm.provider "libvirt" do |v|
    v.memory = 8192
    v.cpus = 4
    (1..3).each do |i|
      v.storage :file, :size => '500G'
    end
  end
  config.vm.network :public_network, :dev => 'br0', :type => 'bridge'
end
After vagrant up, I SSHed to the 3-disk node, which I will use to bootstrap the cluster. Install the cephadm tool, which pulls in docker.io and the other packages needed.
apt install cephadm
Set the hostname and run cephadm:
hostnamectl set-hostname host226.ocl.cl.cam.ac.uk
cephadm bootstrap --mon-ip 128.232.124.226 --allow-fqdn-hostname
After that completes, the admin interface is available on port 8443 and the initial password is printed; it needs to be changed on first login.
Ceph Dashboard is now available at:
URL: https://host226.ocl.cl.cam.ac.uk:8443/
User: admin
Password: 6n2knvhka0
Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/8c498470-b01f-11f0-8941-1baf58a32558/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
sudo /usr/sbin/cephadm shell --fsid 8c498470-b01f-11f0-8941-1baf58a32558 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Or, if you are only running a single cluster on this host:
sudo /usr/sbin/cephadm shell
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
To run ceph commands, either run cephadm shell -- ceph -s, or run them interactively after first entering cephadm shell.
On the other machine, the one with 11 disks, install Docker:
apt install docker.io
Copy /etc/ceph/ceph.pub from the bootstrap node and append it to ~/.ssh/authorized_keys on this node.
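If root password login is temporarily enabled on the new node, ssh-copy-id from the bootstrap node will do this in one step (a sketch, assuming cephadm is using its default root SSH user):
ssh-copy-id -f -i /etc/ceph/ceph.pub root@host190.ocl.cl.cam.ac.uk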
Then, from the bootstrap node, add the other machine to the cluster:
ceph orch host add host190.ocl.cl.cam.ac.uk 128.232.124.190
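You can confirm the host joined with:
ceph orch host ls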
The disks should now appear as available.
# ceph orch device ls
HOST                      PATH      TYPE  DEVICE ID                              SIZE   AVAILABLE  REFRESHED  REJECT REASONS
host190.ocl.cl.cam.ac.uk  /dev/vdb  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdc  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdd  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vde  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdf  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdg  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdh  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdi  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdj  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdk  hdd                                          1000G  Yes        21s ago
host190.ocl.cl.cam.ac.uk  /dev/vdl  hdd                                          1000G  Yes        21s ago
host226.ocl.cl.cam.ac.uk  /dev/sda  hdd   QEMU_HARDDISK_drive-ua-disk-volume-0   500G   Yes        2m ago
host226.ocl.cl.cam.ac.uk  /dev/sdb  hdd   QEMU_HARDDISK_drive-ua-disk-volume-1   500G   Yes        2m ago
host226.ocl.cl.cam.ac.uk  /dev/sdc  hdd   QEMU_HARDDISK_drive-ua-disk-volume-2   500G   Yes        2m ago
Each Object Storage Daemon (OSD) backs a single disk. Add all the available devices:
ceph orch apply osd --all-available-devices
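The OSDs take a minute or two to be created; from a cephadm shell, ceph -s and ceph orch ps show them coming up:
ceph -s
ceph orch ps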
Since these are virtual disks, they are all detected as hdd, so we need to mark some as ssd. Check the OSD IDs with ceph osd tree:
ID  CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         12.20741  root default
-5         10.74252      host host190
 1    hdd   0.97659          osd.1         up   1.00000  1.00000
 3    hdd   0.97659          osd.3         up   1.00000  1.00000
 5    hdd   0.97659          osd.5         up   1.00000  1.00000
 6    hdd   0.97659          osd.6         up   1.00000  1.00000
 7    hdd   0.97659          osd.7         up   1.00000  1.00000
 8    hdd   0.97659          osd.8         up   1.00000  1.00000
 9    hdd   0.97659          osd.9         up   1.00000  1.00000
10    hdd   0.97659          osd.10        up   1.00000  1.00000
11    hdd   0.97659          osd.11        up   1.00000  1.00000
12    hdd   0.97659          osd.12        up   1.00000  1.00000
13    hdd   0.97659          osd.13        up   1.00000  1.00000
-3          1.46489      host host226
 0    hdd   0.48830          osd.0         up   1.00000  1.00000
 2    hdd   0.48830          osd.2         up   1.00000  1.00000
 4    hdd   0.48830          osd.4         up   1.00000  1.00000
We want to target pools at specific devices, separating the fast and slow disks. First, reclassify the three 500GB OSDs as ssd; the CRUSH rules that use the device classes come later.
ceph osd crush rm-device-class osd.0 osd.2 osd.4
ceph osd crush set-device-class ssd osd.0 osd.2 osd.4
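To check the reclassification took effect, list the device classes and the OSDs now tagged as ssd:
ceph osd crush class ls
ceph osd crush class ls-osd ssd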
Create a metadata pool (replicated, should be on fast disks)
ceph osd pool create cephfs_metadata 32 replicated
Fast data pool (replicated, for root filesystem)
ceph osd pool create cephfs_data_fast 64 replicated
Archive pool (erasure coded 8+3, for the slow disks). With only two hosts, the failure domain has to be osd rather than host, since an 8+3 profile needs eleven failure domains.
ceph osd erasure-code-profile set ec83profile k=8 m=3 crush-failure-domain=osd
ceph osd pool create cephfs_data_archive 128 erasure ec83profile
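The pools should now exist; their replication and erasure-code settings can be inspected with:
ceph osd pool ls detail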
Set up pool properties. There are only two hosts in this test setup; ideally, the size would be three or more.
ceph osd pool set cephfs_metadata size 2
ceph osd pool set cephfs_data_fast size 2
Create CRUSH rules to allocate the data correctly and apply them to the pools.
ceph osd crush rule create-erasure cephfs_data_archive_osd ec83profile
ceph osd crush rule create-replicated fast_ssd_rule default osd ssd
ceph osd pool set cephfs_data_fast crush_rule fast_ssd_rule
ceph osd pool set cephfs_metadata crush_rule fast_ssd_rule
ceph osd pool set cephfs_data_archive crush_rule cephfs_data_archive_osd
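To confirm which rule each pool ended up with:
ceph osd pool get cephfs_metadata crush_rule
ceph osd pool get cephfs_data_fast crush_rule
ceph osd pool get cephfs_data_archive crush_rule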
Allow CephFS to use the pools
ceph osd pool application enable cephfs_metadata cephfs
ceph osd pool application enable cephfs_data_fast cephfs
ceph osd pool application enable cephfs_data_archive cephfs
Create the filesystem
ceph fs new cephfs cephfs_metadata cephfs_data_fast
Add the EC pool as an additional data pool
ceph osd pool set cephfs_data_archive allow_ec_overwrites true
ceph fs add_data_pool cephfs cephfs_data_archive
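ceph fs ls should now list the filesystem with cephfs_metadata as the metadata pool and both data pools attached:
ceph fs ls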
Create an MDS for the file system
ceph fs set cephfs max_mds 1
ceph orch apply mds cephfs
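Once the orchestrator has scheduled the MDS daemons, ceph fs status should report an active MDS for the filesystem:
ceph fs status cephfs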
CephFS warns against putting the root of the filesystem on an erasure-coded pool, hence we use the fast replicated pool as the default (root) data pool and map the EC pool to a specific directory.
Get your admin key
ceph auth get-key client.admin
On a client machine, mount CephFS
mkdir -p /mnt/cephfs
mount -t ceph host226:6789,host190:6789:/ /mnt/cephfs -o name=admin,secret=YOUR_KEY_HERE
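Passing the key on the command line leaves it in shell history; an alternative sketch is to store it in a file (the path here is arbitrary) and use the secretfile mount option:
echo "YOUR_KEY_HERE" > /etc/ceph/admin.secret
chmod 600 /etc/ceph/admin.secret
mount -t ceph host226:6789,host190:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret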
Create and configure the archive directory
mkdir /mnt/cephfs/archive
setfattr -n ceph.dir.layout.pool -v cephfs_data_archive /mnt/cephfs/archive
Verify it worked
getfattr -n ceph.dir.layout.pool /mnt/cephfs/archive
This ensures new files in /archive use the erasure-coded pool on the large disks, while the root uses the replicated fast pool.
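As a quick sanity check, write some data under the archive directory and confirm with ceph df (run on a cluster node) that the usage lands in cephfs_data_archive rather than cephfs_data_fast; the file name below is just an example:
dd if=/dev/zero of=/mnt/cephfs/archive/testfile bs=1M count=100
ceph df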