Migrating Proxmox VE on a ZFS Root Filesystem to a Different Set of Disks

The task

We have a Proxmox VE 6.0 server whose pool was originally a ZFS mirror built from four SSDs. For various reasons the data needs to be moved onto a new ZFS mirror made of two other SSDs, freeing the original four SSDs for other uses. Because IPMI access from abroad suffers from very high latency, we want to avoid the IPMI as much as possible and do the whole migration over SSH.

Procedure

Preparing the operating system

  1. Disable every startup program that is not strictly needed;
  2. Disable autostart for all VMs;
  3. If any VM disk lives in a ZFS zvol under rpool, migrate it to a Directory-type storage and tick "delete source";
  4. Make a full system backup.

A note on where the VM disks are stored: zvols have to be migrated one by one, but disks kept on a Directory-type storage get copied along with the rest of the system, which saves a lot of commands. Some kinds of VM disks (EFI Disks, for example) apparently cannot be migrated at all; the server in question happens to have one of those, so the procedure below also covers how to handle such disks. A shell sketch of steps 2 and 3 follows.
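For reference, steps 2 and 3 can also be done from the shell. This is only a rough sketch: the VM ID 104 matches the VM that appears later in this article, but the disk name scsi0 and the Directory storage name local are assumptions that need to be adjusted to the actual setup.

qm set 104 --onboot 0                    # step 2: disable autostart for this VM
qm move_disk 104 scsi0 local --delete 1  # step 3: move the disk to the Directory storage "local" and delete the source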

Identifying the disk IDs

When building ZFS pools it is best to refer to disks by their IDs rather than by the sequentially assigned dev names, to avoid nasty surprises. To keep the text readable, the four original SSDs are labelled A, B, C and D, and the two new SSDs X and Z. Before starting, record the mapping between disk IDs and dev names.

root@pve ~ # ls -alh /dev/disk/by-id/ | grep -v wwn | grep -v eui
total 0
drwxr-xr-x 2 root root 680 Nov 16 15:06 .
drwxr-xr-x 8 root root 160 Nov 15 18:23 ..
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA -> ../../sda
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB -> ../../sdd
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC -> ../../sdc
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD -> ../../sdb
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  13 Nov 15 18:23 nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX -> ../../nvme0n1
lrwxrwxrwx 1 root root  13 Nov 15 18:23 nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ -> ../../nvme1n1

Creating GPT partition tables on the new disks

First, record the partition table of one of the original disks:
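sgdisk's -p (print) option produces output in this format; for disk A that is:

sgdisk -p /dev/sda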

Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Model: Samsung SSD 860 
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 00000000-0000-0000-0000-00000000000
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            2047   1007.0 KiB  EF02  
   2            2048         1050623   512.0 MiB   EF00  
   3         1050624      1953525134   931.0 GiB   BF01

Then lay down a near-identical structure on the two new disks:

root@pve ~ # sgdisk --zap-all /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
root@pve ~ # sgdisk --zap-all /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
root@pve ~ # sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve ~ # sgdisk     -n2:1M:+512M   -t2:EF00 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve ~ # sgdisk     -n3:0:0        -t3:BF01 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve ~ # sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
root@pve ~ # sgdisk     -n2:1M:+512M   -t2:EF00 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
root@pve ~ # sgdisk     -n3:0:0        -t3:BF01 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
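To double-check the result, running sgdisk -p against one of the new disks should now show the same three-partition layout:

sgdisk -p /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX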

Creating the new ZFS pool

Again, first record the state of the existing zpool:

root@pve ~ # zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           1.81T                          -    
rpool  capacity                       2%                             -    
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           00000000000000000000           -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupditto                     0                              default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.77T                          -
rpool  allocated                      41.9G                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  8%                             -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      00000000000000000000           -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            enabled                        local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             disabled                       local
rpool  feature@project_quota          disabled                       local
rpool  feature@device_removal         disabled                       local
rpool  feature@obsolete_counts        disabled                       local
rpool  feature@zpool_checkpoint       disabled                       local
rpool  feature@spacemap_v2            disabled                       local
rpool  feature@allocation_classes     disabled                       local
rpool  feature@resilver_defer         disabled                       local
rpool  feature@bookmark_v2            disabled                       local

Then create a similar pool on the new disks:

zpool create -o ashift=12 -d \
      -o feature@async_destroy=enabled \
      -o feature@empty_bpobj=enabled \
      -o feature@lz4_compress=enabled \
      -o feature@multi_vdev_crash_dump=enabled \
      -o feature@spacemap_histogram=enabled \
      -o feature@enabled_txg=enabled \
      -o feature@hole_birth=enabled \
      -o feature@extensible_dataset=enabled \
      -o feature@embedded_data=enabled \
      -o feature@bookmarks=enabled \
      -o feature@filesystem_limits=enabled \
      -o feature@large_blocks=enabled \
      -o feature@large_dnode=enabled \
      -o feature@sha512=enabled \
      -o feature@skein=enabled \
      -o feature@edonr=enabled \
      -o feature@userobj_accounting=enabled \
      -O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off \
      -O normalization=formD -O relatime=on -O xattr=sa \
      -O mountpoint=/rpool -R /mnt \
      rpool2 mirror /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX-part3 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ-part3

Check which datasets the old pool contains:

root@pve ~ # zfs list
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                     41.9G  1.71T      104K  /rpool
rpool/ROOT                41.8G  1.71T       96K  /rpool/ROOT
rpool/ROOT/pve-1          41.8G  1.71T     41.8G  /
rpool/data                 164K  1.71T       96K  /rpool/data
rpool/data/vm-104-disk-1    68K  1.71T       68K  -
rpool2                     900K   922G       96K  /mnt

The entries starting with vm- are zvols and can be ignored for now; recreate each of the remaining datasets on the new pool:

zfs create -o mountpoint=/rpool/ROOT   rpool2/ROOT
zfs create -o mountpoint=/newroot      rpool2/ROOT/pve-1
zfs create -o mountpoint=/rpool/data   rpool2/data

Notes:

  • The new root filesystem is temporarily mounted somewhere else for now, because mounting it directly on / obviously would not work;
  • The newly created pool is named rpool2, but to keep the migration simple it will still end up mounted at /rpool later on.

Migrating the data

Once all the datasets have been created, copy the data from the old datasets into the new ones verbatim:

zfs mount -a
rsync -avxHAXW --progress /rpool/ROOT /mnt/rpool/ROOT
rsync -avxHAXW --progress /rpool/data /mnt/rpool/data
rsync -avxHAXW --progress / /mnt/newroot

As for the vm- zvols, create a new zvol of the same size for each one and dd all the data across (a loop sketch follows the example below):

root@pve / # zfs get volsize rpool/data/vm-104-disk-1
NAME                      PROPERTY  VALUE    SOURCE
rpool/data/vm-104-disk-1  volsize   1M       local
root@pve / # zfs create -s -V 1mb rpool2/data/vm-104-disk-1
root@pve / # dd if=/dev/zvol/rpool/data/vm-104-disk-1 of=/dev/zvol/rpool2/data/vm-104-disk-1 bs=1M
0+1 records in
0+1 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0661738 s, 15.8 MB/s
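With more than one zvol this gets tedious, so here is a rough loop sketch. It assumes every zvol lives directly under rpool/data and that rpool2/data already exists; treat it as a starting point rather than a finished script.

for vol in $(zfs list -H -o name -t volume -r rpool/data); do
    size=$(zfs get -Hp -o value volsize "$vol")          # exact size in bytes
    new="rpool2/data/${vol##*/}"                         # same name under the new pool
    zfs create -s -V "$size" "$new"                      # sparse zvol of identical size
    udevadm settle                                       # wait for the new /dev/zvol node
    dd if="/dev/zvol/$vol" of="/dev/zvol/$new" bs=1M     # raw copy of the data
done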

Fixing path references

First, the Proxmox VE storage definitions:

sed -ie "s/rpool/rpool2/g" /mnt/newroot/etc/pve/storage.cfg
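For reference, on a stock installation the relevant stanza in storage.cfg looks roughly like the following (the storage name and content types may differ on other systems); after the sed the pool path points at rpool2:

zfspool: local-zfs
        pool rpool2/data
        sparse
        content images,rootdir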

Then GRUB:

root@pve ~ # sed -ie "s/rpool/rpool2/g" /mnt/newroot/etc/default/grub
root@pve ~ # mount --bind /proc /mnt/newroot/proc
root@pve ~ # mount --bind /dev /mnt/newroot/dev
root@pve ~ # mount --bind /sys /mnt/newroot/sys
root@pve ~ # chroot /mnt/newroot/ /bin/bash
root@pve ~ # grub-install --recheck --no-floppy /dev/nvme0n1
root@pve ~ # grub-install --recheck --no-floppy /dev/nvme1n1
root@pve ~ # update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.0.21-2-pve
Found initrd image: /boot/initrd.img-5.0.21-2-pve
Found linux image: /boot/vmlinuz-5.0.21-1-pve
Found initrd image: /boot/initrd.img-5.0.21-1-pve
Found linux image: /boot/vmlinuz-4.15.18-20-pve
Found initrd image: /boot/initrd.img-4.15.18-20-pve
Found linux image: /boot/vmlinuz-4.15.18-12-pve
Found initrd image: /boot/initrd.img-4.15.18-12-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
done
root@pve ~ # exit
root@pve ~ # umount /mnt/newroot/proc
root@pve ~ # umount /mnt/newroot/dev
root@pve ~ # umount /mnt/newroot/sys

When working with GRUB, we still use the traditional dev device names.
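For reference, after the sed the kernel command line in /etc/default/grub should point at the new pool. On a stock ZFS installation that line looks roughly like this:

GRUB_CMDLINE_LINUX="root=ZFS=rpool2/ROOT/pve-1 boot=zfs"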

Booting from the new disks

Reboot, and in the firmware set the machine to boot from one of the new disks (either will do).

Wiping the old disks

root@pve ~ # zfs set canmount=off rpool/data
root@pve ~ # zfs set canmount=off rpool/ROOT/pve-1
root@pve ~ # zfs set canmount=off rpool/ROOT
root@pve ~ # zfs set canmount=off rpool
root@pve ~ # zpool export rpool # this may fail
root@pve ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA
root@pve ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB
root@pve ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC
root@pve ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD

Notes:

  • Unmount the datasets in the reverse of the order in which they are listed;
  • If the export fails, reboot once more after everything is done; the system will then no longer pick up the old disks.

Restoring the environment

  1. Migrate the VM disks back onto zvols (see the sketch after this list);
  2. Start all the VMs;
  3. Restore the VMs' autostart settings;
  4. Restore the startup programs' autostart settings.
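A rough shell sketch of steps 1 to 3, again assuming VM 104 with a scsi0 disk and the default ZFS storage name local-zfs:

qm move_disk 104 scsi0 local-zfs --delete 1  # move the disk back onto a zvol
qm set 104 --onboot 1                        # re-enable autostart
qm start 104                                 # start the VM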

Rescue

The Proxmox VE installation media can be used to temporarily boot the system for specific recovery work, but the GRUB built into the media does not support lz4-compressed ZFS root filesystems, so picking the Rescue Boot menu entry directly in GRUB will not boot successfully. Instead, enter the installer, wait for the license agreement screen to appear, press Ctrl+Alt+F1, then Ctrl+C to quit the installer, and mount the root filesystem from there to do the repair work:

zpool import # scan for importable pools
zpool import -f rpool2
zfs set mountpoint=/mnt rpool2/ROOT/pve-1
zfs mount -a
mount --bind /proc /mnt/proc
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
chroot /mnt/ /bin/bash

After the repair work is done, the pool must be exported cleanly to avoid problems once the system boots again:

umount /mnt/proc
umount /mnt/dev
umount /mnt/sys
zfs set mountpoint=/ rpool2/ROOT/pve-1
zpool export rpool2
sync; sync

GRUB cannot load stage 2

If GRUB picked up the wrong disk information when it was installed, the following can happen: after the old disks are wiped and the machine is rebooted, GRUB fails to load stage 2 and prints:

error: no such device: 0000000000000000.
error: unknown filesystem.
Entering rescue mode...
grub rescue>

This can be fixed by reinstalling GRUB and regenerating its configuration. Enter the rescue environment as described above, then run:

grub-install --recheck --no-floppy /dev/nvme0n1
grub-install --recheck --no-floppy /dev/nvme1n1
update-grub

Exit the rescue environment and reboot.

