3 posts tagged with "storage"

External CSI Storage Backup and Restore With Velero

May 26, 2025 · 8 min read

Principal Software Engineer

Harvester 1.5 introduces support for the provisioning of virtual machine root volumes and data volumes using external Container Storage Interface (CSI) drivers.

This article demonstrates how to use Velero 1.16.0 to perform backup and restore of virtual machines in Harvester.

It goes through commands and manifests to:

Back up virtual machines in a namespace, their NFS CSI volumes, and associated namespace-scoped configuration
Export the backup artifacts to an AWS S3 bucket
Restore to a different namespace on the same cluster
Restore to a different cluster

Velero is a Kubernetes-native backup and restore tool that enables users to perform scheduled and on-demand backups of virtual machines to external object storage providers such as S3, Azure Blob, or GCS, aligning with enterprise backup and disaster recovery practices.

note

The commands and manifests used in this article are tested with Harvester 1.5.1.

The CSI NFS driver and Velero configuration and versions used are for demonstration purposes only. Adjust them according to your environment and requirements.

important

The examples provided are intended to backup and restore Linux virtual machine workloads. It is not suitable for backing up guest clusters provisioned via the Harvester Rancher integration.

To backup and restore guest clusters like RKE2, please refer to the distro official documentation.

Harvester Installation

Refer to the Harvester documentation for installation requirements and options.

The kubeconfig file of the Harvester cluster can be retrieved following the instructions here.

Install and Configure Velero

Download the Velero CLI.

Set the following shell variables:

BUCKET_NAME=<your-s3-bucket-name>
BUCKET_REGION=<your-s3-bucket-region>
AWS_CREDENTIALS_FILE=<absolute-path-to-your-aws-credentials-file>

Install Velero on the Harvester cluster:

velero install \
  --provider aws \
  --features=EnableCSI \
  --plugins "velero/velero-plugin-for-aws:v1.12.0,quay.io/kubevirt/kubevirt-velero-plugin:v0.7.1" \
  --bucket "${BUCKET_NAME}" \
  --secret-file "${AWS_CREDENTIALS_FILE}" \
  --backup-location-config region="${BUCKET_REGION}" \
  --snapshot-location-config region="${BUCKET_REGION}" \
  --use-node-agent

In this setup, Velero is configured to:
- Run in the velero namespace
- Enable CSI volume snapshot APIs
- Enable the built-in node agent data movement controllers and pods
- Use the velero-plugin-for-aws plugin to manage interactions with the S3 object store
- Use the kubevirt-velero-plugin plugin to backup and restore KubeVirt resources

Confirm that Velero is installed and running:

kubectl -n velero get po

NAME                      READY   STATUS    RESTARTS         AGE
node-agent-875mr          1/1     Running   0                1d
velero-745645565f-5dqgr   1/1     Running   0                1d

Configure the velero CLI to output the backup and restore status of CSI objects:

velero client config set features=EnableCSI

Deploy the NFS CSI and Example Server

Follow the instructions in the NFS CSI documentation to set up the NFS CSI driver, its storage class, and an example NFS server.

The NFS CSI volume snapshotting capability must also be enabled following the instructions here.

Confirm that the NFS CSI and example server are running:

kubectl get po -A -l 'app in (csi-nfs-node,csi-nfs-controller,nfs-server)'

NAMESPACE     NAME                                  READY   STATUS    RESTARTS    AGE
default       nfs-server-b767db8c8-9ltt4            1/1     Running   0           1d
kube-system   csi-nfs-controller-5bf646f7cc-6vfxn   5/5     Running   0           1d
kube-system   csi-nfs-node-9z6pt                    3/3     Running   0           1d

The default NFS CSI storage class is named nfs-csi:

kubectl get sc nfs-csi

NAME      PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-csi   nfs.csi.k8s.io   Delete          Immediate           true                   14d

Confirm that the default NFS CSI volume snapshot class csi-nfs-snapclass is also installed:

kubectl get volumesnapshotclass csi-nfs-snapclass

NAME                DRIVER           DELETIONPOLICY   AGE
csi-nfs-snapclass   nfs.csi.k8s.io   Delete           14d

Preparing the Virtual Machine and Image

Create a custom namespace named demo-src:

kubectl create ns demo-src

Follow the instructions in the Image Management documentation to upload the Ubuntu 24.04 raw image from https://cloud-images.ubuntu.com/minimal/releases/noble/ to Harvester.

The storage class of the image must be set to nfs-csi, per the Third-Party Storage Support documentation.

Confirm the virtual machine image is successfully uploaded to Harvester:

Follow the instructions in the third-party storage documentation to create a virtual machine with NFS root and data volumes, using the image uploaded in the previous step.

For NFS CSI snapshot to work, the NFS data volume must have the volumeMode set to Filesystem:

optional

For testing purposes, once the virtual machine is ready, access it via SSH and add some files to both the root and data volumes.

The data volume needs to be partitioned, with a file system created and mounted before files can be written to it.

Backup the Source Namespace

Use the velero CLI to create a backup of the demo-src namespace using Velero's built-in data mover:

BACKUP_NAME=backup-demo-src-`date "+%s"`

velero backup create "${BACKUP_NAME}" \
  --include-namespaces demo-src \
  --snapshot-move-data

info

For more information on Velero's data mover, see its documentation on CSI data snapshot movement capability.

This creates a backup of the demo-src namespace containing resources like the virtual machine created earlier, its volumes, secrets and other associated configuration.

Depending on the size of the virtual machine and its volumes, the backup may take a while to complete.

The DataUpload custom resources provide insights into the backup progress:

kubectl -n velero get datauploads -l velero.io/backup-name="${BACKUP_NAME}"

Confirm that the backup completed successfully:

velero backup get "${BACKUP_NAME}"

NAME                         STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-demo-src-1747954979   Completed   0        0          2025-05-22 16:04:46 -0700 PDT   29d       default            <none>

After the backup completes, Velero removes the CSI snapshots from the storage side to free up the snapshot data space.

tips

The velero backup describe and velero backup logs commands can be used to assess details of the backup including resources included, skipped, and any warnings or errors encountered during the backup process.

Restore To A Different Namespace

This section describes how to restore the backup from the demo-src namespace to a new namespace named demo-dst.

Save the following restore modifier to a local file named modifier-data-volumes.yaml:

cat <<EOF > modifier-data-volumes.yaml
version: v1
resourceModifierRules:
- conditions:
    groupResource: persistentvolumeclaims
    matches:
    - path: /metadata/annotations/harvesterhci.io~1volumeForVirtualMachine
      value: "\"true\""
  patches:
  - operation: remove
    path: /metadata/annotations/harvesterhci.io~1volumeForVirtualMachine
EOF

This restore modifier removes the harvesterhci.io/volumeForVirtualMachine annotation from the virtual machine data volumes to ensure that the restoration do not conflict with the CDI volume import populator.

Create the restore modifier:

kubectl -n velero create cm modifier-data-volumes --from-file=modifier-data-volumes.yaml

Assign the backup name to a shell variable:

BACKUP_NAME=backup-demo-src-1747954979

Start the restore operation:

velero restore create \
  --from-backup "${BACKUP_NAME}" \
  --namespace-mappings "demo-src:demo-dst" \
  --exclude-resources "virtualmachineimages.harvesterhci.io" \
  --resource-modifier-configmap "modifier-data-volumes" \
  --labels "velero.kubevirt.io/clear-mac-address=true,velero.kubevirt.io/generate-new-firmware-uuid=true"

During the restore:
- The virtual machine MAC address and firmware UUID are reset to avoid potential conflicts with existing virtual machines.
- the virtual machine image manifest is excluded because Velero restores the entire state of the virtual machine from the backup.
- the modifier-data-volumes restore modifier is invoked to modify the virtual machine data volumes metadata to prevent conflicts with the CDI volume import populator.

While the restore operation is still in-progress, the DataDownload custom resources can be used to examine the progress of the operation:

RESTORE_NAME=backup-demo-src-1747954979-20250522164015

kubectl -n velero get datadownload -l velero.io/restore-name="${RESTORE_NAME}"

Confirm that the restore completed successfully:

velero restore get

NAME                                        BACKUP                       STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
backup-demo-src-1747954979-20250522164015   backup-demo-src-1747954979   Completed   2025-05-22 16:40:15 -0700 PDT   2025-05-22 16:40:49 -0700 PDT   0        6          2025-05-22 16:40:15 -0700 PDT   <none>

Verify that the virtual machine and its configuration are restored to the new demo-dst namespace:

note

Velero uses Kopia as its default data mover. This issue describes some of its limitations on advanced file system features such as setuid/gid, hard links, mount points, sockets, xattr, ACLs, etc.

Velero provides the --data-mover option to configure custom data movers to satisfy different use cases. For more information, see the Velero's documentation.

tips

The velero restore describe and velero restore logs commands provide more insights into the restore operation including the resources restored, skipped, and any warnings or errors encountered during the restore process.

Restore To A Different Cluster

This section extends the above scenario to demonstrate the steps to restore the backup to a different Harvester cluster.

On the target cluster, install Velero, and set up the NFS CSI and NFS server following the instructions from the Deploy the NFS CSI and Example Server section.

Once Velero is configured to use the same backup location as the source cluster, it automatically discovers the available backups:

velero backup get

NAME                         STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-demo-src-1747954979   Completed   0        0          2025-05-22 16:04:46 -0700 PDT   29d       default            <none>

Follow the steps in the Restore To A Different Namespace section to restore the backup on the target cluster.

Remove the --namespace-mappings option to set the restored namespace to demo-src on the target cluster.

Confirm that the virtual machine and its configuration are restored to the demo-src namespace:

Select Longhorn Volume Snapshot Class

To perform Velero backup and restore of virtual machines with Longhorn volumes, label the Longhorn volume snapshot class longhorn as follows:

kubectl label volumesnapshotclass longhorn velero.io/csi-volumesnapshot-class

This helps Velero to find the correct Longhorn snapshot class to use during backup and restore.

Limitations

Enhancements related to the limitations described in this section are tracked at https://github.com/harvester/harvester/issues/8367.

By default, Velero only supports resource filtering by resource groups and labels. In order to backup/restore a single instance of virtual machine, custom labels must be applied to the virtual machine, and its virtual machine instance, pod, data volumes, persistent volumes claim, persistent volumes and cloudinit secret resources. It's recommended to backup the entire namespace and perform resource filtering during restore to ensure that backup contains all the dependency resources required by the virtual machine.
The restoration of virtual machine image is not fully supported yet.

Scan and Repair Root Filesystem of VirtualMachine

February 1, 2023 · 4 min read

Vicente Cheng

Senior Software Engineer

In earlier versions of Harvester (v1.0.3 and prior), Longhorn volumes may get corrupted during the replica rebuilding process (reference: Analysis: Potential Data/Filesystem Corruption). In Harvester v1.1.0 and later versions, the Longhorn team has fixed this issue. This article covers manual steps you can take to scan the VM's filesystem and repair it if needed.

Stop The VM And Backup Volume

Before you scan the filesystem, it is recommend you back up the volume first. For an example, refer to the following steps to stop the VM and backup the volume.

Find the target VM.

finding the target VM

Stop the target VM.

Stop the target VM

The target VM is stopped and the related volumes are detached. Now go to the Longhorn UI to backup this volume.

Enable Developer Tools & Features (Preferences -> Enable Developer Tools & Features).

Preferences then enable developer mode Enable the developer mode

Click the ⋮ button and select Edit Config to edit the config page of the VM.

goto edit config page of VM

Go to the Volumes tab and select Check volume details.

link to longhorn volume page

Click the dropdown menu on the right side and select 'Attach' to attach the volume again.

attach this volume again

Select the attached node.

choose the attached node

Check the volume attached under Volume Details and select Take Snapshot on this volume page.

take snapshot on volume page

Confirm that the snapshot is ready.

check the snapshot is ready

Now that you completed the volume backup, you need to scan and repair the root filesystem.

Scanning the root filesystem and repairing

This section will introduce how to scan the filesystem (e.g., XFS, EXT4) using related tools.

Before scanning, you need to know the filesystem's device/partition.

Identify the filesystem's device by checking the major and minor numbers of that device.

Obtain the major and minor numbers from the listed volume information.
In the following example, the volume name is pvc-ea7536c0-301f-479e-b2a2-e40ddc864b58.
```
harvester-node-0:~ # ls /dev/longhorn/pvc-ea7536c0-301f-479e-b2a2-e40ddc864b58 -al
brw-rw---- 1 root root 8, 0 Oct 23 14:43 /dev/longhorn/pvc-ea7536c0-301f-479e-b2a2-e40ddc864b58
```
The output indicates that the major and minor numbers are 8:0.

Obtain the device name from the output of the lsblk command.

harvester-node-0:~ # lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0     3G  1 loop /
sda      8:0    0    40G  0 disk
├─sda1   8:1    0     2M  0 part
├─sda2   8:2    0    20M  0 part
└─sda3   8:3    0    40G  0 part

The output indicates that 8:0 are the major and minor numbers of the device named sda. Therefore, /dev/sda is related to the volume named pvc-ea7536c0-301f-479e-b2a2-e40ddc864b58.

You should now know the filesystem's partition. In the example below, sda3 is the filesystem's partition.
Use the Filesystem toolbox image to scan and repair.

# docker run -it --rm --privileged registry.opensuse.org/isv/rancher/harvester/toolbox/main/fs-toolbox:latest -- bash

Then we try to scan with this target device.

XFS

When scanning an XFS filesystem, use the xfs_repair command and specify the problematic partition of the device.

In the following example, /dev/sda3 is the problematic partition.

# xfs_repair -n /dev/sda3

To repair the corrupted partition, run the following command.

# xfs_repair /dev/sda3

EXT4

When scanning a EXT4 filesystem, use the e2fsck command as follows, where the /dev/sde1 is the problematic partition of the device.

# e2fsck -f /dev/sde1

To repair the corrupted partition, run the following command.

# e2fsck -fp /dev/sde1

After using the 'e2fsck' command, you should also see logs related to scanning and repairing the partition. Scanning and repairing the corrupted partition is successful if there are no errors in these logs.

Detach and Start VM again.

After the corrupted partition is scanned and repaired, detach the volume and try to start the related VM again.

Detach the volume from the Longhorn UI.

detach volume on longhorn UI

Start the related VM again from the Harvester UI.

Start VM again

Your VM should now work normally.

Evicting Replicas From a Disk (the CLI way)

January 12, 2023 · 2 min read

Kiefer Chang

Engineer Manager

Harvester replicates volumes data across disks in a cluster. Before removing a disk, the user needs to evict replicas on the disk to other disks to preserve the volumes' configured availability. For more information about eviction in Longhorn, please check Evicting Replicas on Disabled Disks or Nodes.

Preparation

This document describes how to evict Longhorn disks using the kubectl command. Before that, users must ensure the environment is set up correctly. There are two recommended ways to do this:

Log in to any management node and switch to root (sudo -i).
Download Kubeconfig file and use it locally
- Install kubectl and yq program manually.
- Open Harvester GUI, click support at the bottom left of the page and click Download KubeConfig to download the Kubeconfig file.
- Set the Kubeconfig file's path to KUBECONFIG environment variable. For example, export KUBECONFIG=/path/to/kubeconfig.

Evicting replicas from a disk

List Longhorn nodes (names are identical to Kubernetes nodes):

kubectl get -n longhorn-system nodes.longhorn.io

Sample output:

NAME    READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
node1   True    true              True          24d
node2   True    true              True          24d
node3   True    true              True          24d

List disks on a node. Assume we want to evict replicas of a disk on node1:

kubectl get -n longhorn-system nodes.longhorn.io node1 -o yaml | yq e '.spec.disks'

Sample output:

default-disk-ed7af10f5b8356be:
  allowScheduling: true
  evictionRequested: false
  path: /var/lib/harvester/defaultdisk
  storageReserved: 36900254515
  tags: []

Assume disk default-disk-ed7af10f5b8356be is the target we want to evict replicas out of.
Edit the node:
```
kubectl edit -n longhorn-system nodes.longhorn.io node1 
```
Update these two fields and save:
- spec.disks.<disk_name>.allowScheduling to false
- spec.disks.<disk_name>.evictionRequested to true
Sample editing:
```
default-disk-ed7af10f5b8356be:
  allowScheduling: false
  evictionRequested: true
  path: /var/lib/harvester/defaultdisk
  storageReserved: 36900254515
  tags: []
```

Wait for all replicas on the disk to be evicted.

Get current scheduled replicas on the disk:

kubectl get -n longhorn-system nodes.longhorn.io node1 -o yaml | yq e '.status.diskStatus.default-disk-ed7af10f5b8356be.scheduledReplica'

Sample output:

pvc-86d3d212-d674-4c64-b69b-4a2eb1df2272-r-7b422db7: 5368709120
pvc-b06f0b09-f30c-4936-8a2a-425b993dd6cb-r-bb0fa6b3: 2147483648
pvc-b844bcc6-3b06-4367-a136-3909251cb560-r-08d1ab3c: 53687091200
pvc-ea6e0dff-f446-4a38-916a-b3bea522f51c-r-193ca5c6: 10737418240

Run the command repeatedly, and the output should eventually become an empty map:

{}

This means Longhorn evicts replicas on the disk to other disks.

note

If a replica always stays in a disk, please open the Longhorn GUI and check if there is free space on other disks.

note

important

Harvester Installation​

Install and Configure Velero​

Deploy the NFS CSI and Example Server​

Preparing the Virtual Machine and Image​

optional

Backup the Source Namespace​

info

tips

Restore To A Different Namespace​

note

tips

Restore To A Different Cluster​

Select Longhorn Volume Snapshot Class​

Limitations​

Stop The VM And Backup Volume​

Scanning the root filesystem and repairing​

XFS​

EXT4​

Detach and Start VM again.​

Preparation​

Evicting replicas from a disk​

note

Harvester Installation

Install and Configure Velero

Deploy the NFS CSI and Example Server

Preparing the Virtual Machine and Image

Backup the Source Namespace

Restore To A Different Namespace

Restore To A Different Cluster

Select Longhorn Volume Snapshot Class

Limitations

Stop The VM And Backup Volume

Scanning the root filesystem and repairing

XFS

EXT4

Detach and Start VM again.

Preparation

Evicting replicas from a disk