Replace an OSD

Replace a FileStore OSD

Let's suppose that osd.27 (device /dev/sdj) is broken.

Let's verify that this is a filestore OSD:

# ceph osd metadata 27 | grep osd_ob
    "osd_objectstore": "filestore",

Let's suppose that this OSD is using /dev/sdb3 for the journal:

[root@ceph-osd-03 ~]# ceph-disk list | grep osd.27
 /dev/sdj1 ceph data, active, cluster ceph, osd.27, journal /dev/sdb3

The following operations should be done to remove it from ceph:

ceph osd crush reweight osd.27 0
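
Setting the CRUSH weight to 0 drains the data off this OSD. One simple way to follow the recovery (just a sketch; the 30-second polling interval is arbitrary):

watch -n 30 ceph health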

Wait until the status is HEALTH_OK. Then:

ceph osd out osd.27
ceph osd crush remove osd.27
systemctl stop ceph-osd@27.service
ceph auth del osd.27
ceph osd rm osd.27
umount /var/lib/ceph/osd/ceph-27
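
As an optional sanity check before touching the hardware, verify that osd.27 no longer appears in the CRUSH tree (the grep should return nothing):

ceph osd tree | grep -w 'osd.27'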

Replace the disk.

Run the prepare command (make sure to specify the right partition of the SSD disk):

ceph osd set noout
 
ceph-disk prepare --zap --cluster ceph --cluster-uuid 8162f291-00b6-4b40-a8b4-1981a8c09b64 --filestore --fs-type xfs /dev/sdj /dev/sdb3
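
Note that the value passed to --cluster-uuid must be this cluster's fsid. If in doubt, it can be retrieved, and the freshly prepared data partition checked, with (an optional sketch):

ceph fsid
ceph-disk list | grep /dev/sdj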

Then activate the OSD (this also enables it to start at boot time):

ceph-disk activate /dev/sdj1
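
Before re-enabling the data balance it may be worth checking that the daemon is running and that the OSD is back in the tree. This assumes the OSD was re-created with id 27, which is normally the case since that id had just been freed:

systemctl status ceph-osd@27
ceph osd tree | grep -w 'osd.27'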

If everything is OK, re-enable data rebalancing:

ceph osd unset noout

Replace a BlueStore OSD

Let's suppose that osd.14 (/dev/vdd) is broken.

Let's verify that this is a Bluestore OSD:

# ceph osd metadata 14 | grep osd_ob
    "osd_objectstore": "bluestore",

Let's find the relevant devices:

[root@c-osd-5 /]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-14/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-14//block": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 107369988096,
        "btime": "2019-01-30 16:33:54.429292",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "7a8cb8ff-562b-47da-a6aa-507136587dcf",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDWw1Fc6g0zARAAy97VirlJ+wC7FmjlM0w3aQ==",
        "ready": "ready",
        "whoami": "14"
    },
    "/var/lib/ceph/osd/ceph-14//block.db": {
        "osd_uuid": "d14443ed-2f7d-4bbc-8cdf-f55c7e00a9b5",
        "size": 53687091200,
        "btime": "2019-01-30 16:33:54.432415",
        "description": "bluefs db"
    }
}
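
If the OSD was deployed with ceph-volume, the same block and block.db mapping can also be read from its output (look for the osd.14 section):

ceph-volume lvm list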

Let's find the volume groups used for the block and block.db:

[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block
lrwxrwxrwx 1 ceph ceph 27 May 13 15:34 /var/lib/ceph/osd/ceph-14//block -> /dev/ceph-block-14/block-14
[root@c-osd-5 /]# ls -l /var/lib/ceph/osd/ceph-14//block.db
lrwxrwxrwx 1 ceph ceph 24 May 13 15:34 /var/lib/ceph/osd/ceph-14//block.db -> /dev/ceph-db-12-15/db-14
[root@c-osd-5 /]# 
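
A quicker way to map these logical volumes to their physical disks, as an alternative to the full vgdisplay output below, is for example:

lvs -o lv_name,vg_name,devices ceph-block-14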

Let's verify that vdd is indeed the physical volume used for this OSD:

[root@c-osd-5 /]# vgdisplay -v ceph-block-14
  --- Volume group ---
  VG Name               ceph-block-14
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  15
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <100.00 GiB
  PE Size               4.00 MiB
  Total PE              25599
  Alloc PE / Size       25599 / <100.00 GiB
  Free  PE / Size       0 / 0   
  VG UUID               lcEfNK-P7gw-ddeH-ijGC-2d6z-WuUo-hqI1H2
 
  --- Logical volume ---
  LV Path                /dev/ceph-block-14/block-14
  LV Name                block-14
  VG Name                ceph-block-14
  LV UUID                hu4Xop-481K-BJyP-b473-PjEW-OQFT-oziYnc
  LV Write Access        read/write
  LV Creation host, time c-osd-5.novalocal, 2019-01-30 11:22:24 +0100
  LV Status              available
  # open                 4
  LV Size                <100.00 GiB
  Current LE             25599
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:14
 
  --- Physical volumes ---
  PV Name               /dev/vdd     
  PV UUID               2ab6Mn-8c5b-rN1H-zclU-uhnF-YJmF-e0ITMt
  PV Status             allocatable
  Total PE / Free PE    25599 / 0
 
[root@c-osd-5 /]# 

The following operations should be done to remove it from ceph:

ceph osd crush reweight osd.14 0

This will trigger data movement off that OSD (ceph status will report many misplaced objects).

Wait until there are no more misplaced objects.
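
One way to follow the progress, as a simple sketch:

watch -n 30 ceph -s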

Then:

ceph osd out osd.14

Let's verify that the OSD can safely be "removed":

[root@ceph-mon-01 ~]# ceph osd safe-to-destroy 14
OSD(s) 14 are safe to destroy without reducing data durability.
[root@ceph-osd-02 ~]# systemctl kill ceph-osd@14
[root@ceph-osd-02 ~]# ceph osd destroy 14 --yes-i-really-mean-it
[root@ceph-osd-02 ~]# umount /var/lib/ceph/osd/ceph-14
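
At this point the OSD keeps its id but should be reported as destroyed; optionally this can be checked with:

ceph osd tree | grep -w 'osd.14'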

Delete the volume group:

[root@c-osd-5 /]# vgremove ceph-block-14
Do you really want to remove volume group "ceph-block-14" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume ceph-block-14/block-14? [y/n]: y
  Logical volume "block-14" successfully removed
  Volume group "ceph-block-14" successfully removed
[root@c-osd-5 /]# 

Replace the disk. Let's suppose that the new one is still called vdd.

Recreate the volume group and the logical volume:

[root@c-osd-5 /]# vgcreate ceph-block-14 /dev/vdd
  Physical volume "/dev/vdd" successfully created.
  Volume group "ceph-block-14" successfully created
[root@c-osd-5 /]# lvcreate -l 100%FREE -n block-14 ceph-block-14
  Logical volume "block-14" created.
[root@c-osd-5 /]# 

Finally, recreate the OSD:

ceph osd set norebalance
ceph osd set nobackfill
 
 
[root@c-osd-5 /]# ceph-volume lvm create --bluestore --data ceph-block-14/block-14 --block.db ceph-db-12-15/db-14 --osd-id 14
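
ceph-volume creates and starts the new OSD; before reweighting it, the daemon can optionally be checked:

systemctl status ceph-osd@14
ceph osd tree | grep -w 'osd.14'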

After a while, when there are no more PGs in peering:

ceph osd crush reweight osd.14 5.45609
 
ceph osd unset nobackfill
ceph osd unset norebalance
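
The cluster will now backfill data onto the new OSD. The progress, and how much data has already landed on osd.14, can be followed for instance with:

ceph -s
ceph osd df tree | grep -w 'osd.14'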