
Replacing an OSD in Nautilus

16/07/2019

Now that you’ve upgraded Ceph from Luminous to Nautilus, what happens if a disk fails or the administrator needs to convert from filestore to bluestore? The OSD needs to be replaced.

The OSD to be replaced was created by ceph-disk in Luminous, but things have changed in Nautilus: the ceph-disk command has been removed and replaced by ceph-volume, which by default deploys OSDs on logical volumes. We'll largely follow the official instructions. In this example, we are going to replace OSD 20.

On a MON node, check whether the OSD is safe to destroy:


[root@mon-1 ~]# ceph osd safe-to-destroy osd.20
OSD(s) 20 are safe to destroy without reducing data durability.
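
If the OSD is not yet safe to destroy (for example because data is still being moved off it), the check can be repeated until it passes. The following is a minimal sketch of such a loop, assuming bash; the function name wait_safe_to_destroy and the CEPH override variable are illustrative names and not part of Ceph.

```shell
#!/bin/bash
# CEPH names the ceph CLI to call. The override is an assumption of this
# sketch so the loop can be dry-run against a stub.
CEPH="${CEPH:-ceph}"

# wait_safe_to_destroy OSD_ID [INTERVAL_SECONDS]
# Re-runs "ceph osd safe-to-destroy" until it exits successfully.
wait_safe_to_destroy() {
    local osd_id="$1"
    local interval="${2:-60}"
    until "$CEPH" osd safe-to-destroy "osd.${osd_id}"; do
        sleep "$interval"
    done
}

# Example (hypothetical invocation): wait_safe_to_destroy 20 60
```

The loop relies on safe-to-destroy returning a non-zero exit status while the OSD is still needed; it has no timeout, so interrupt it manually if the cluster never reaches that state.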

Once the OSD is reported as safe, destroy it from the MON node:


[root@mon-1 ~]# ceph osd destroy 20 --yes-i-really-mean-it
destroyed osd.20

The OSD will be shown as destroyed:


[root@mon-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 66.17834 root default
-7 22.05945 host compute-1
......
19 hdd 1.83829 osd.19 up 1.00000 1.00000
20 hdd 1.83829 osd.20 destroyed 0 1.00000
22 hdd 1.83829 osd.22 up 1.00000 1.00000
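
Before touching the hardware, it can be worth confirming the status from a script rather than by eye. Below is a small sketch that reads the plain-text tree on stdin and prints the STATUS column for one OSD. The function name osd_status is hypothetical, and the column positions assume the row layout shown above (ID, CLASS, WEIGHT, NAME, STATUS), so an OSD without a device class would need a different field index.

```shell
#!/bin/bash
# osd_status OSD_ID
# Reads "ceph osd tree" plain output on stdin and prints the STATUS
# field of the row whose NAME is osd.<OSD_ID>.
# Assumes OSD rows look like: "20 hdd 1.83829 osd.20 destroyed 0 1.00000".
osd_status() {
    awk -v name="osd.$1" '$4 == name { print $5 }'
}

# Example (hypothetical invocation): ceph osd tree | osd_status 20
```

For the tree shown above, the example invocation would print destroyed.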

On the OSD node, after replacing the faulty disk, use perccli to create a new virtual disk (VD) that comes up with the same sdX device name. Then zap it:


[root@compute-3 ~]# ceph-volume lvm zap /dev/sdl
--> Zapping: /dev/sdl
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/sbin/wipefs --all /dev/sdl
Running command: /bin/dd if=/dev/zero of=/dev/sdl bs=1M count=10
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB) copied
stderr: , 0.0634846 s, 165 MB/s
--> Zapping successful for:

Find out the existing db/wal partitions used by the old OSD. Since the ceph-disk command is not available any more, I have written a script to show current mappings of OSD data and db/wal partitions.

First, run “ceph-volume simple scan” to generate the OSD JSON files in /etc/ceph/osd/. Then run this script:


[root@compute-3 ~]# cat ceph-disk-list.sh
#!/bin/bash
# List each OSD found by "ceph-volume simple scan" with its db/wal partitions.
JSON_PATH="/etc/ceph/osd"
for f in "$JSON_PATH"/*.json; do
    [ -e "$f" ] || continue    # skip if no JSON files exist
    OSD_ID=$(jq '.whoami' "$f")
    DATA_PATH=$(jq -r '.data.path' "$f")
    DB_PATH=$(jq -r '."block.db".path' "$f")
    WAL_PATH=$(jq -r '."block.wal".path' "$f")
    echo "OSD.$OSD_ID: $DATA_PATH"
    # Resolve any symlink to the real device node
    echo " db: $(readlink -f "$DB_PATH")"
    echo " wal: $(readlink -f "$WAL_PATH")"
    echo "============================="
done

It shows the mapping between the existing Ceph OSDs (created by ceph-disk) and their db/wal partitions:


[root@compute-3 ~]# ./ceph-disk-list.sh
OSD.1: /dev/sdb1
db: /dev/nvme0n1p27
wal: /dev/nvme0n1p28
=============================
OSD.11: /dev/sdg1
db: /dev/nvme0n1p37
wal: /dev/nvme0n1p38
=============================
OSD.13: /dev/sdh1
db: /dev/nvme0n1p35
wal: /dev/nvme0n1p36
=============================
OSD.14: /dev/sdi1
db: /dev/nvme0n1p33
wal: /dev/nvme0n1p34
=============================
OSD.18: /dev/sdj1
db: /dev/nvme0n1p51
wal: /dev/nvme0n1p52
=============================
OSD.22: /dev/sdm1
db: /dev/nvme0n1p29
wal: /dev/nvme0n1p30
=============================
OSD.3: /dev/sdc1
db: /dev/nvme0n1p45
wal: /dev/nvme0n1p46
=============================
OSD.5: /dev/sdd1
db: /dev/nvme0n1p43
wal: /dev/nvme0n1p44
=============================
OSD.7: /dev/sde1
db: /dev/nvme0n1p41
wal: /dev/nvme0n1p42
=============================
OSD.9: /dev/sdf1
db: /dev/nvme0n1p39
wal: /dev/nvme0n1p40
=============================
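
Partitions that do not appear in this mapping are candidates for the new OSD's db and wal. A minimal sketch of that check follows, assuming the in-use partitions and the full partition list have each been written to a file, one device path per line. The function name free_partitions and the file names all.txt and used.txt are hypothetical.

```shell
#!/bin/bash
# free_partitions ALL_FILE USED_FILE
# Prints every line of ALL_FILE that does not occur as a whole line
# in USED_FILE (fixed-string, exact-line comparison).
free_partitions() {
    grep -vxFf "$2" "$1"
}

# One way to build the two files on the OSD node (paths from this example):
#   lsblk -lnp -o NAME,TYPE /dev/nvme0n1 | awk '$2 == "part" { print $1 }' > all.txt
#   ./ceph-disk-list.sh | awk '$1 == "db:" || $1 == "wal:" { print $2 }' > used.txt
#   free_partitions all.txt used.txt
```

A partition missing from the mapping is only a candidate: it may still be used by something the script does not see, so check it against lsblk before handing it to ceph-volume.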

Compare this list with the output of lsblk to find db/wal partitions that are not in use, then create the new OSD with them:


[root@compute-3 ~]# ceph-volume lvm create --osd-id 20 --data /dev/sdl --bluestore --block.db /dev/nvme0n1p49 --block.wal /dev/nvme0n1p50
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e795fd7b-df8d-48d7-99d5-625f41869e7a 20
Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e /dev/sdl
stdout: Physical volume "/dev/sdl" successfully created.
stdout: Volume group "ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e
stdout: Logical volume "osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-20
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-20
Running command: /bin/chown -h ceph:ceph /dev/ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e/osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a
Running command: /bin/chown -R ceph:ceph /dev/dm-2
Running command: /bin/ln -s /dev/ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e/osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a /var/lib/ceph/osd/ceph-20/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-20/activate.monmap
stderr: got monmap epoch 9
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-20/keyring --create-keyring --name osd.20 --add-key AQD38iNdxf89GRAAO6HbRFcgCj6HSuyOsJRGeA==
stdout: creating /var/lib/ceph/osd/ceph-20/keyring
added entity osd.20 auth(key=AQD38iNdxf89GRAAO6HbRFcgCj6HSuyOsJRGeA==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20/
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p50
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p49
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 20 --monmap /var/lib/ceph/osd/ceph-20/activate.monmap --keyfile - --bluestore-block-wal-path /dev/nvme0n1p50 --bluestore-block-db-path /dev/nvme0n1p49 --osd-data /var/lib/ceph/osd/ceph-20/ --osd-uuid e795fd7b-df8d-48d7-99d5-625f41869e7a --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdl
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e/osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a --path /var/lib/ceph/osd/ceph-20 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-e65e64a4-eeec-434f-a93c-82d3e2cfa51e/osd-block-e795fd7b-df8d-48d7-99d5-625f41869e7a /var/lib/ceph/osd/ceph-20/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-20/block
Running command: /bin/chown -R ceph:ceph /dev/dm-2
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20
Running command: /bin/ln -snf /dev/nvme0n1p49 /var/lib/ceph/osd/ceph-20/block.db
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p49
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-20/block.db
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p49
Running command: /bin/ln -snf /dev/nvme0n1p50 /var/lib/ceph/osd/ceph-20/block.wal
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p50
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-20/block.wal
Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p50
Running command: /bin/systemctl enable ceph-volume@lvm-20-e795fd7b-df8d-48d7-99d5-625f41869e7a
stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-20-e795fd7b-df8d-48d7-99d5-625f41869e7a.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@20
Running command: /bin/systemctl start ceph-osd@20
--> ceph-volume lvm activate successful for osd ID: 20
--> ceph-volume lvm create successful for: /dev/sdl

The new OSD will be started automatically, and backfill will start.
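
To follow the backfill without watching the console, the cluster health can be polled until it reports HEALTH_OK. This is a rough sketch, assuming bash; wait_for_health_ok and the CEPH override variable are illustrative names, not Ceph features.

```shell
#!/bin/bash
# CEPH names the ceph CLI to call. The override is an assumption of this
# sketch so the loop can be dry-run against a stub.
CEPH="${CEPH:-ceph}"

# wait_for_health_ok [INTERVAL_SECONDS]
# Prints a timestamped "ceph health" line on every poll and returns
# once the status starts with HEALTH_OK.
wait_for_health_ok() {
    local interval="${1:-60}"
    local status
    while :; do
        status=$("$CEPH" health)
        echo "$(date '+%F %T') $status"
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        sleep "$interval"
    done
}

# Example (hypothetical invocation): wait_for_health_ok 30
```

The loop has no timeout and treats any other status, including an unrelated warning, as "not done yet", so it is meant as a convenience rather than a substitute for reading ceph -s.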

For further information, see the official Ceph documentation on replacing an OSD. If you’d like to learn more, we have Ceph training available, or you can ask our Solutionauts for help.
