Task overview
We have a Proxmox VE 6.0 server whose system pool was originally a ZFS mirror built from four SSDs. For various reasons the data has to be migrated onto a new ZFS mirror built from two other SSDs, so that the original four SSDs can be freed up for other uses. Since the server is overseas and the IPMI console is painfully laggy, we want to touch IPMI as little as possible and complete the whole migration over SSH.
Procedure
Preparing the operating system
- Disable every non-essential program that starts at boot;
- Disable autostart for all VMs (see the sketch after this list);
- If any VM disk lives on a ZFS zvol under rpool, move it to a Directory-type storage with the "delete source" option checked;
- Make a full system backup.
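A minimal sketch of these preparation steps on the CLI, assuming VMID 104 (the VM that appears later in this walkthrough) and a hypothetical service name; adapt both to whatever is actually running on the host:

```
# list the VMs and clear their "start at boot" flag
qm list
qm set 104 --onboot 0                        # repeat for each VMID

# review the services enabled at boot and disable the non-essential ones
systemctl list-unit-files --state=enabled
systemctl disable some-nonessential.service  # hypothetical unit name
```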
A note on moving the VM disk storage: zvols have to be migrated one at a time, whereas disks sitting on a Directory-type storage get copied along with the rest of the system, which saves quite a few commands. Some kinds of VM disks (the EFI Disk, for instance) apparently cannot be moved at all; the server in question happens to have one of those, so the steps below also cover how to deal with such a disk.
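Moving a disk can be done from the web UI (Move disk, with Delete source checked) or with `qm move_disk`. A sketch; the VMID, the disk slot `scsi0` and the Directory storage name `local` are assumptions for illustration:

```
# move the VM disk from the ZFS-backed storage to a Directory storage,
# converting it to qcow2 and deleting the source zvol after a successful copy
qm move_disk 104 scsi0 local --format qcow2 --delete 1
```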
Identifying the disk IDs
When building ZFS pools it is best to refer to disks by their IDs rather than the sequentially assigned dev names, to avoid nasty surprises. For readability, the four original SSDs are labelled A, B, C and D below, and the two new SSDs X and Z. Before starting, record the mapping between disk IDs and dev names:
```
root@pve-testnode ~ # ls -alh /dev/disk/by-id/ | grep -v wwn | grep -v eui
total 0
drwxr-xr-x 2 root root 680 Nov 16 15:06 .
drwxr-xr-x 8 root root 160 Nov 15 18:23 ..
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA -> ../../sda
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB -> ../../sdd
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC -> ../../sdc
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD -> ../../sdb
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Nov 15 18:23 ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  13 Nov 15 18:23 nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX -> ../../nvme0n1
lrwxrwxrwx 1 root root  13 Nov 15 18:23 nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ -> ../../nvme1n1
```
Creating GPT partition tables on the new disks
First, record the partition table of one of the original disks (a listing like the one below can be obtained with e.g. `sgdisk -p /dev/sda`):
```
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Model: Samsung SSD 860
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 00000000-0000-0000-0000-00000000000
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size        Code  Name
   1              34            2047   1007.0 KiB  EF02
   2            2048         1050623   512.0 MiB   EF00
   3         1050624      1953525134   931.0 GiB   BF01
```
Then reproduce a similar layout on the two new disks:
```
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
root@pve-testnode ~ # sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve-testnode ~ # sgdisk -n2:1M:+512M -t2:EF00 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve-testnode ~ # sgdisk -n3:0:0 -t3:BF01 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX
root@pve-testnode ~ # sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
root@pve-testnode ~ # sgdisk -n2:1M:+512M -t2:EF00 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
root@pve-testnode ~ # sgdisk -n3:0:0 -t3:BF01 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ
```
Creating the new ZFS pool
Again, start by recording the state of the existing zpool:
```
root@pve-testnode ~ # zpool get all rpool
NAME   PROPERTY                       VALUE                  SOURCE
rpool  size                           1.81T                  -
rpool  capacity                       2%                     -
rpool  altroot                        -                      default
rpool  health                         ONLINE                 -
rpool  guid                           00000000000000000000   -
rpool  version                        -                      default
rpool  bootfs                         rpool/ROOT/pve-1       local
rpool  delegation                     on                     default
rpool  autoreplace                    off                    default
rpool  cachefile                      -                      default
rpool  failmode                       wait                   default
rpool  listsnapshots                  off                    default
rpool  autoexpand                     off                    default
rpool  dedupditto                     0                      default
rpool  dedupratio                     1.00x                  -
rpool  free                           1.77T                  -
rpool  allocated                      41.9G                  -
rpool  readonly                       off                    -
rpool  ashift                         12                     local
rpool  comment                        -                      default
rpool  expandsize                     -                      -
rpool  freeing                        0                      -
rpool  fragmentation                  8%                     -
rpool  leaked                         0                      -
rpool  multihost                      off                    default
rpool  checkpoint                     -                      -
rpool  load_guid                      00000000000000000000   -
rpool  autotrim                       off                    default
rpool  feature@async_destroy          enabled                local
rpool  feature@empty_bpobj            active                 local
rpool  feature@lz4_compress           active                 local
rpool  feature@multi_vdev_crash_dump  enabled                local
rpool  feature@spacemap_histogram     active                 local
rpool  feature@enabled_txg            active                 local
rpool  feature@hole_birth             active                 local
rpool  feature@extensible_dataset     active                 local
rpool  feature@embedded_data          active                 local
rpool  feature@bookmarks              enabled                local
rpool  feature@filesystem_limits      enabled                local
rpool  feature@large_blocks           enabled                local
rpool  feature@large_dnode            enabled                local
rpool  feature@sha512                 enabled                local
rpool  feature@skein                  enabled                local
rpool  feature@edonr                  enabled                local
rpool  feature@userobj_accounting     active                 local
rpool  feature@encryption             disabled               local
rpool  feature@project_quota          disabled               local
rpool  feature@device_removal         disabled               local
rpool  feature@obsolete_counts        disabled               local
rpool  feature@zpool_checkpoint       disabled               local
rpool  feature@spacemap_v2            disabled               local
rpool  feature@allocation_classes     disabled               local
rpool  feature@resilver_defer         disabled               local
rpool  feature@bookmark_v2            disabled               local
```
Then create a comparable pool on the new disks:
```
zpool create -o ashift=12 -d \
    -o feature@async_destroy=enabled \
    -o feature@empty_bpobj=enabled \
    -o feature@lz4_compress=enabled \
    -o feature@multi_vdev_crash_dump=enabled \
    -o feature@spacemap_histogram=enabled \
    -o feature@enabled_txg=enabled \
    -o feature@hole_birth=enabled \
    -o feature@extensible_dataset=enabled \
    -o feature@embedded_data=enabled \
    -o feature@bookmarks=enabled \
    -o feature@filesystem_limits=enabled \
    -o feature@large_blocks=enabled \
    -o feature@large_dnode=enabled \
    -o feature@sha512=enabled \
    -o feature@skein=enabled \
    -o feature@edonr=enabled \
    -o feature@userobj_accounting=enabled \
    -O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off \
    -O normalization=formD -O relatime=on -O xattr=sa \
    -O mountpoint=/rpool -R /mnt \
    rpool2 mirror /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_XXXXXXXXXXXXXXX-part3 /dev/disk/by-id/nvme-Samsung_SSD_970_PRO_1TB_ZZZZZZZZZZZZZZZ-part3
```
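Optionally, a quick sanity check that the new pool came up as a two-way mirror with the intended ashift (nothing beyond the pool name rpool2 is assumed here):

```
zpool status rpool2       # should show a single mirror vdev made of the two NVMe partitions
zpool get ashift rpool2   # should report 12
```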
Have a look at which datasets exist in the old pool:
```
root@pve-testnode ~ # zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     41.9G  1.71T   104K  /rpool
rpool/ROOT                41.8G  1.71T    96K  /rpool/ROOT
rpool/ROOT/pve-1          41.8G  1.71T  41.8G  /
rpool/data                 164K  1.71T    96K  /rpool/data
rpool/data/vm-104-disk-1    68K  1.71T    68K  -
rpool2                     900K   922G    96K  /mnt
```
The entries starting with vm- are zvols and can be ignored for now; recreate each of the remaining datasets in the same fashion:
```
zfs create -o mountpoint=/rpool/ROOT rpool2/ROOT
zfs create -o mountpoint=/newroot rpool2/ROOT/pve-1

# keep $pool/data mounted at /$pool/data, otherwise container creation will fail
# https://bugzilla.proxmox.com/show_bug.cgi?id=2085
zfs create -o mountpoint=/rpool2/data rpool2/data
```
Note:
- The new root filesystem is temporarily mounted somewhere else (/newroot), because mounting it directly at / obviously cannot work while the old root is still in use;
- The newly created pool is named rpool2, but to keep the migration simple it will still end up mounted at /rpool afterwards.
Migrating the data
With all the datasets in place, copy the contents of the old datasets into the new ones as-is:
```
zfs mount -a
rsync -avxHAXW --progress /rpool/ROOT /mnt/rpool/ROOT
rsync -avxHAXW --progress /rpool/data /mnt/rpool/data
rsync -avxHAXW --progress / /mnt/newroot
```
As for the vm- zvols, create a new zvol of the same size for each one and dd the contents across:
```
root@pve-testnode / # zfs get volsize rpool/data/vm-104-disk-1
NAME                      PROPERTY  VALUE  SOURCE
rpool/data/vm-104-disk-1  volsize   1M     local
root@pve-testnode / # zfs create -s -V 1mb rpool2/data/vm-104-disk-1
root@pve-testnode / # dd if=/dev/zvol/rpool/data/vm-104-disk-1 of=/dev/zvol/rpool2/data/vm-104-disk-1 bs=1M
0+1 records in
0+1 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0661738 s, 15.8 MB/s
```
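With a single zvol this is quick to do by hand; if there are many, the same create-and-dd steps can be scripted. A sketch, assuming every zvol sits directly under rpool/data and that rpool2/data already exists:

```
# recreate each zvol on the new pool with an identical volsize, then copy it raw
zfs list -H -o name -t volume -r rpool/data | while read -r vol; do
    size=$(zfs get -Hp -o value volsize "$vol")   # exact size in bytes
    newvol="rpool2/data/${vol##*/}"
    zfs create -s -V "$size" "$newvol"
    dd if="/dev/zvol/$vol" of="/dev/zvol/$newvol" bs=1M status=progress
done
```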
Fixing path references
First, the Proxmox VE storage definitions:
```
sed -ie "s/rpool/rpool2/g" /mnt/newroot/etc/pve/storage.cfg
```
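The entry being rewritten here is the zfspool storage definition (named local-zfs on a stock ZFS install), whose pool property points at rpool/data and should point at rpool2/data afterwards. A quick check that nothing was missed:

```
# after the substitution this should print nothing:
# every former "rpool/..." reference should now read "rpool2/..."
grep -n 'rpool/' /mnt/newroot/etc/pve/storage.cfg
```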
Then GRUB:
```
root@pve-testnode ~ # sed -ie "s/rpool/rpool2/g" /mnt/newroot/etc/default/grub
root@pve-testnode ~ # mount --bind /proc /mnt/newroot/proc
root@pve-testnode ~ # mount --bind /dev /mnt/newroot/dev
root@pve-testnode ~ # mount --bind /sys /mnt/newroot/sys
root@pve-testnode ~ # chroot /mnt/newroot/ /bin/bash
root@pve-testnode ~ # grub-install --recheck --no-floppy /dev/nvme0n1
root@pve-testnode ~ # grub-install --recheck --no-floppy /dev/nvme1n1
root@pve-testnode ~ # update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.0.21-2-pve
Found initrd image: /boot/initrd.img-5.0.21-2-pve
Found linux image: /boot/vmlinuz-5.0.21-1-pve
Found initrd image: /boot/initrd.img-5.0.21-1-pve
Found linux image: /boot/vmlinuz-4.15.18-20-pve
Found initrd image: /boot/initrd.img-4.15.18-20-pve
Found linux image: /boot/vmlinuz-4.15.18-12-pve
Found initrd image: /boot/initrd.img-4.15.18-12-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
done
root@pve-testnode ~ # exit
root@pve-testnode ~ # umount /mnt/newroot/proc
root@pve-testnode ~ # umount /mnt/newroot/dev
root@pve-testnode ~ # umount /mnt/newroot/sys
```
When dealing with GRUB we still use the traditional dev device names.
Booting from the new disks
Reboot, and in the firmware setup select one of the new disks (either will do) as the boot device.
Wiping the old disks
```
root@pve-testnode ~ # zfs set canmount=off rpool/data
root@pve-testnode ~ # zfs set canmount=off rpool/ROOT/pve-1
root@pve-testnode ~ # zfs set canmount=off rpool/ROOT
root@pve-testnode ~ # zfs set canmount=off rpool
root@pve-testnode ~ # zpool export rpool    # this may fail
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_AAAAAAAAAAAAAAA
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_BBBBBBBBBBBBBBB
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_CCCCCCCCCCCCCCC
root@pve-testnode ~ # sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_860_QVO_1TB_DDDDDDDDDDDDDDD
```
Note:
- Unmount the datasets in the reverse of the order in which they are listed;
- If the first step fails, reboot once more after everything is done; the system will then no longer pick up the old disks.
Restoring the environment
- Move the VM disks back onto zvols (as sketched below);
- Start all VMs;
- Re-enable autostart for the VMs;
- Re-enable autostart for the services that were disabled earlier.
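A sketch of the corresponding commands, mirroring the preparation steps; VMID 104, the disk slot scsi0 and the storage name local-zfs are again illustrative assumptions:

```
# move the disk back onto the ZFS-backed storage and drop the temporary copy
qm move_disk 104 scsi0 local-zfs --delete 1
# start the VM and restore its autostart flag
qm start 104
qm set 104 --onboot 1
```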
Rescue
The Proxmox VE installation disc can be used to boot the machine temporarily for specific recovery work, but the GRUB built into the disc does not support lz4-compressed ZFS root filesystems, so picking the Rescue Boot menu entry directly will not boot successfully. Instead, start the installer, wait for the EULA screen to appear, press Ctrl+Alt+F1 and then Ctrl+C to quit the installer; from there the root filesystem can be mounted and repaired:
```
zpool import               # scan for all importable pools
zpool import -f rpool2
zfs set mountpoint=/mnt rpool2/ROOT/pve-1
zfs mount -a
mount --bind /proc /mnt/proc
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
chroot /mnt/ /bin/bash
```
Once the repair work is done, the pool must be unmounted cleanly to avoid trouble after booting back into the system:
```
umount /mnt/proc
umount /mnt/dev
umount /mnt/sys
zfs set mountpoint=/ rpool2/ROOT/pve-1
zpool export rpool2
sync; sync
```
GRUB cannot load stage 2
If GRUB picked up wrong disk information when it was installed, the following can happen: after the old disks are wiped and the machine is rebooted, GRUB fails to load stage 2 and prints:
```
error: no such device: 0000000000000000.
error: unknown filesystem.
Entering rescue mode...
grub rescue>
```
This is fixed by reinstalling GRUB and regenerating its configuration. Enter the rescue environment as described above, then run:
```
grub-install --recheck --no-floppy /dev/nvme0n1
grub-install --recheck --no-floppy /dev/nvme1n1
update-grub
```
Exit the rescue environment and reboot.