Node selector constraints can prevent the scheduler from live-migrating a virtual machine to a target node. This often indicates a mismatch between the virtual machine's requirements and the node's capabilities.
A node selector may require a specific CPU feature, but the target node lacks the corresponding label (for example, cpu-feature.node.kubevirt.io/fpu: "true"). This mismatch can occur when the host-model CPU models and features computed by KubeVirt change over time.
You can resolve this issue using four different approaches.
Reboot the virtual machine.
KubeVirt automatically adds node selectors (during a previous migration or initial start) that can restrict scheduling. You can clear these node selectors by rebooting the virtual machine.
Reboot the virtual machine and set up a common CPU model.
You can override KubeVirt's default host-model CPU configuration by setting up a common CPU model for virtual machine migration. The model is applied to the virtual machine as its domain CPU, and to the pod as its node selector configuration.
This is the recommended approach for environments that can tolerate restarting of virtual machines.
Modify the node labels.
If rebooting the virtual machine is not an option, you can manually manipulate the target node's labels to satisfy the scheduling requirements.
Add the node-labeller.kubevirt.io/skip-node="true" annotation to the target node.
This annotation, which persists even after upgrades, prevents KubeVirt's node-labeller from automatically adding or removing CPU-related labels on this node.
The annotation itself does not affect the pod's node selector. It only controls the presence of specific CPU-related labels on the node, which the node selector checks against. For more information, see the References section.
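A minimal example of adding the annotation, assuming kubectl access to the cluster (substitute the target node name):

kubectl annotate node <node-name> node-labeller.kubevirt.io/skip-node="true"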
Identify labels that are missing from the virtual machine's node selector and add them to the target node.
You can add the missing labels using the following command:
kubectl label node <node-name> <key>=<value>
This circumvents the standard scheduling restrictions, allowing the virtual machine to migrate to the target node.
If a new node that lacks the required features is added to the cluster, you must repeat these steps to allow the virtual machine to live-migrate to that node.
Remove the node labels.
If you want to ensure that the virtual machine does not acquire specific node selector constraints after live migration, you can remove the relevant CPU labels from the target node.
Add the node-labeller.kubevirt.io/skip-node="true" annotation to the target node.
This annotation, which persists even after upgrades, prevents KubeVirt's node-labeller from automatically adding or removing CPU-related labels on this node.
This method works only if the virtual machine's pod does not have an existing node selector that contains the labels listed in the References section. Otherwise, you must reboot the virtual machine to clear the constraints.
Check if the pod has a node selector.
kubectl get pod <pod-name> -o yaml | grep nodeSelector -A 5 -B 5
If no node selector exists, remove the relevant CPU labels from the node.
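For example, assuming the cpu-feature.node.kubevirt.io/fpu label is the one you want to drop, it can be removed with a command like the following (the trailing hyphen removes the label):

kubectl label node <node-name> cpu-feature.node.kubevirt.io/fpu-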
Performing this action prevents the pod from acquiring new node selector constraints, thus enabling its future migration to other nodes. However, the successful outcome of that migration is not guaranteed.
When guest virtual machines running on Harvester nodes experience very slow network throughput, disabling Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO) on the host interfaces may resolve the issue.
In the testing environment, guest virtual machines experienced severely degraded download and transfer speeds, dropping as low as 100 bps. This extreme slowdown was particularly evident when apt-get update, curl, and scp were used to transfer data between virtual machines running on different nodes. In contrast, performance remained normal when the virtual machines were hosted on the same node.
The issue was observed in a Harvester cluster hosted on Dell servers using Broadcom NetXtreme-E Series BCM57508 NICs (100 Gbps). mgmt, the built-in cluster network, was used for both management and virtual machine traffic.
Harvester relies on Linux’s bridge-based virtual networking to connect guest virtual machines to physical networks.
The NetXtreme-E BCM57508 NICs were connected to leaf switches configured to transmit jumbo frames. When the default MTU of 1500 is used, these frames should ideally be segmented to approximately 1450 bytes before reaching the Harvester host kernel. However, the packets actually arriving at the kernel were fragmented into unexpectedly small sizes. This forced the kernel to process a significantly higher volume of packets, leading to increased CPU overhead and reduced download throughput.
Packet captures collected using the following command confirmed the unexpectedly small size of the incoming packets.
tcpdump -xx -i <interface-name>
Replace <interface-name> with the name of the physical interface on the host that is connected to the virtual machines.
Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO) are kernel-level software offloading mechanisms designed to optimize network performance. GRO aggregates multiple small incoming packets into larger ones before passing them to the network stack. GSO performs the opposite on transmission, splitting large packets into smaller frames before sending them to the NIC.
While these features are typically used to enhance performance, in this specific scenario, they interfered with the normal TCP segmentation process. This interference led to inefficient packet segmentation and the creation of an excessive number of small fragments, which ultimately degraded overall network performance.
When GRO and GSO were disabled, the Linux network stack automatically reverted to using standard transport-layer segmentation methods, specifically TCP Segmentation Offload (TSO) and Large Receive Offload (LRO). These mechanisms maintained efficient packet aggregation and segmentation at the appropriate layers, ensuring properly sized packets were presented to the kernel, which successfully restored expected network performance.
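For reference, these offloads can be toggled per interface with ethtool. The following is a sketch that assumes the physical interface name; the change does not persist across reboots unless you also make it permanent (for example, through a boot-time configuration):

# Show the current offload settings
ethtool -k <interface-name> | grep -E 'generic-(receive|segmentation)-offload'

# Disable GRO and GSO on the host interface
ethtool -K <interface-name> gro off gso off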
The NetXtreme-E BCM57508 NICs may experience suboptimal interaction with GRO and GSO due to a Broadcom driver bug. Enabling these offload mechanisms led to inefficient packetization, producing many small packets instead of fewer large ones, which ultimately reduced network throughput.
However, in certain scenarios, users may require the VM filesystem to be quiesced during Velero backup creation to prevent data corruption, especially when the VM is experiencing heavy I/O operations.
This article describes how to customize Velero Backup Hooks to implement filesystem freeze during Velero backup processing, ensuring data consistency in the backup content.
KubeVirt's virt-freezer provides a mechanism to freeze and thaw guest filesystems. This capability can be leveraged to ensure filesystem consistency during VM backups. However, certain prerequisites must be met for filesystem freeze/thaw operations to function properly:
Based on Harvester project experience, some guest operating systems require additional configuration:
Linux distributions (e.g., RHEL, SLE Micro): May lack sufficient permissions for filesystem freeze operations by default, requiring custom policies
Windows guests: Require the VSS service to be enabled for filesystem freeze functionality
Important: Filesystem freeze/thaw functionality depends on guest VM configuration, which is outside Harvester's control. Users are responsible for ensuring compatibility before implementing Velero backup hooks with filesystem freeze.
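As an illustration, freeze and thaw hooks can be attached to the virt-launcher pod using Velero's backup hook annotations. The following is a hedged sketch: compute is the standard virt-launcher container name, but the virt-freezer path and flags are assumptions you should verify against your KubeVirt version, and the VM name and namespace are placeholders.

kubectl -n <vm-namespace> annotate pod <virt-launcher-pod-name> \
  pre.hook.backup.velero.io/container=compute \
  pre.hook.backup.velero.io/command='["/usr/bin/virt-freezer", "--freeze", "--name", "<vm-name>", "--namespace", "<vm-namespace>"]' \
  post.hook.backup.velero.io/container=compute \
  post.hook.backup.velero.io/command='["/usr/bin/virt-freezer", "--unfreeze", "--name", "<vm-name>", "--namespace", "<vm-namespace>"]'

Equivalent hooks can also be declared centrally in the Backup resource under spec.hooks.resources instead of annotating individual pods.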
If the guest VM is configured correctly, the Velero backup will complete successfully with HooksAttempted indicating successful hook execution.
Check the backup status using:
velero backup describe [Backup Name] --details
Example output showing successful hook execution:
Name:         demo
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.33.3+rke2r1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=33

Phase:  Completed

Namespaces:
  Included:  demo
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          true
Data Mover:                  velero

....

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots:
    demo/vm-nfs-disk-0-au2ej:
      Data Movement:
        Operation ID: du-be5417aa-498e-4b93-b59f-e6498f95a6df.d7f97dab-3bb1-41e189381
        Data Mover: velero
        Uploader Type: kopia
        Moved data Size (bytes): 5368709120
        Result: succeeded

  Pod Volume Backups: <none included>

HooksAttempted:  2
HooksFailed:     0
The output shows that Velero pre/post backup hooks completed successfully. In this case, the hooks are connected to guest VM filesystem freeze and thaw operations to ensure data consistency.
Implementing filesystem freeze hooks with Velero ensures data consistency during VM backups by quiescing the filesystem before snapshot creation. This approach is particularly valuable for VMs with high I/O activity or critical data that requires point-in-time consistency guarantees.
For Harvester to successfully migrate a virtual machine from one node to another, the source and target nodes must have compatible CPU models and features.
If the CPU model of a virtual machine isn't specified, KubeVirt assigns it the default host-model configuration so that the virtual machine has the CPU model closest to the one used on the host node.
KubeVirt automatically adjusts the node selectors of the associated virt-launcher Pod based on this configuration. If the CPU models and features of the source and target nodes do not match, the live migration may fail.
Let's examine an example.
When a virtual machine is first migrated to another node with the SierraForest CPU model, the following key-value pairs are added to the spec.nodeSelector field in the Pod spec.
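The exact entries depend on the source node's CPU, but the added configuration looks similar to the following sketch (only a few of the feature labels are shown):

spec:
  nodeSelector:
    cpu-model-migration.node.kubevirt.io/SierraForest: "true"
    cpu-feature.node.kubevirt.io/fpu: "true"
    cpu-feature.node.kubevirt.io/vme: "true"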
The above nodeSelector configuration is retained for subsequent migrations, which may fail if the new target node doesn't have the corresponding features or model.
For example, compare the CPU model and feature labels added by KubeVirt to the following two nodes:
# Node A labels:
cpu-model-migration.node.kubevirt.io/SierraForest: "true"
cpu-feature.node.kubevirt.io/fpu: "true"
cpu-feature.node.kubevirt.io/vme: "true"

# Node B labels:
cpu-model-migration.node.kubevirt.io/SierraForest: "true"
cpu-feature.node.kubevirt.io/vme: "true"
This virtual machine will fail to migrate to Node B because the fpu feature label is missing. If the virtual machine doesn't actually require that feature, the restriction is unnecessary. Setting up a common CPU model resolves this issue.
You can define a custom CPU model to ensure that the spec.nodeSelector configuration in the Pod spec is assigned a CPU model that is compatible and common to all nodes in the cluster.
Consider this example.
We have the following node information:
# Node A labels:
cpu-model.node.kubevirt.io/IvyBridge: "true"
cpu-feature.node.kubevirt.io/fpu: "true"
cpu-feature.node.kubevirt.io/vme: "true"

# Node B labels:
cpu-model.node.kubevirt.io/IvyBridge: "true"
cpu-feature.node.kubevirt.io/vme: "true"
If we set up IvyBridge as our CPU model in the virtual machine spec, KubeVirt only adds cpu-model.node.kubevirt.io/IvyBridge under spec.nodeSelector in the Pod spec.
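For reference, the CPU model is set in the domain section of the VirtualMachine spec. A minimal excerpt (the resource name is a placeholder):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: <vm-name>
spec:
  template:
    spec:
      domain:
        cpu:
          model: IvyBridge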
If your virtual machines run only on a specific CPU model, you can set up a cluster-wide CPU model in the kubevirt resource.
You can edit it with kubectl edit kubevirt kubevirt -n harvester-system, then add the CPU model you want in the following spec:
spec:
  configuration:
    cpuModel: IvyBridge
Then, when a new virtual machine starts or an existing virtual machine restarts, the cluster-wide setting is applied. If you configure a CPU model in both locations, the model specified in the virtual machine spec takes precedence over the cluster-wide setting.
Harvester provides the default SSH user rancher, but other users may be required. Users created with useradd are deleted when the Harvester node restarts; therefore, follow the steps below to create persistent SSH users.
If a Harvester node becomes unreachable, Harvester attempts to reschedule its virtual machines to another healthy node. However, this rescheduling doesn't happen immediately: the associated virt-launcher pods may continue to appear ready because of their KubeVirt readiness gate configuration.
To reduce this delay, you can lower the period value of the vm-force-reset-policy setting. This enables Harvester to detect non-ready virtual machines on unreachable nodes sooner.
This setting can be found in the Advanced -> Settings page on the Harvester UI.
Additionally, while the current default is 5 minutes, we are considering reducing the default value [1].
Harvester allows you to add disks as data volumes. However, only disks that have a World Wide Name (WWN) are displayed on the UI. This occurs because the Harvester node-disk-manager uses the ID_WWN value from udev to uniquely identify disks. The value may not exist in certain situations, particularly when the disks are connected to certain hardware RAID controllers. In these situations, you can view the disks only if you access the host using SSH and run a command such as cat /proc/partitions.
To allow extra disks without WWNs to be visible to Harvester, perform either of the following workarounds:
Workaround 1: Create a filesystem on the disk
Use this method only if the provisioner of the extra disk is Longhorn V1, which is filesystem-based. This method will not work correctly with LVM and Longhorn V2, which are both block device-based.
When you create a filesystem on a disk (for example, using the command mkfs.ext4 /dev/sda), a filesystem UUID is assigned to the disk. Harvester uses this value to identify disks without a WWN.
In Harvester versions earlier than v1.6.0, you can use this workaround for only one extra disk because of a bug in duplicate device checking.
Workaround 2: Add a udev rule for generating fake WWNs
note
This method works with all of the supported provisioners.
You can add a udev rule that generates a fake WWN for each extra disk based on the device serial number. Harvester accepts the generated WWNs because the only requirement is a unique ID_WWN value as presented by udev.
A YAML file containing the necessary udev rule must be created in the /oem directory on each host. This process can be automated across the Harvester cluster using a CloudInit Resource.
Create a YAML file named fake-scsi-wwn-generator.yaml with the following contents:
apiVersion: node.harvesterhci.io/v1beta1
kind: CloudInit
metadata:
  name: fake-scsi-wwn-generator
spec:
  matchSelector: {}
  filename: 90_fake_scsi_wwn_generator.yaml
  contents: |
    name: "Add udev rules to generate missing SCSI disk WWNs"
    stages:
      initramfs:
        - files:
            - path: /etc/udev/rules.d/59-fake-scsi-wwn-generator.rules
              permissions: 420
              owner: 0
              group: 0
              content: |
                # For anything that looks like a SCSI disk (/dev/sd*),
                # if it has a serial number, but does _not_ have a WWN,
                # create a fake WWN based on the serial number. We need
                # to set both ID_WWN so Harvester's node-disk-manager
                # can see the WWN, and ID_WWN_WITH_EXTENSION which is
                # what 60-persistent-storage.rules uses to generate a
                # /dev/disk/by-id/wwn-* symlink for the device.
                ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]", \
                ENV{ID_SERIAL}=="?*", \
                ENV{ID_WWN}!="?*", ENV{ID_WWN_WITH_EXTENSION}!="?*", \
                ENV{ID_WWN}="fake.$env{ID_SERIAL}", \
                ENV{ID_WWN_WITH_EXTENSION}="fake.$env{ID_SERIAL}"
Apply the file's contents to the cluster by running the command kubectl apply -f fake-scsi-wwn-generator.yaml.
The file /oem/90_fake_scsi_wwn_generator.yaml is automatically created on all cluster nodes.
Reboot all nodes to apply the new udev rule.
Once the rule is applied, you should be able to view and add extra disks that were previously not visible on the Harvester UI.
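To confirm that a previously hidden disk now exposes a WWN, you can query its udev properties on the host (the device name is an example):

udevadm info --query=property --name=/dev/sdb | grep ID_WWN
# Disks without a hardware WWN should now report ID_WWN=fake.<serial-number>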
Harvester 1.5 introduces support for the provisioning of virtual machine root volumes and data volumes using external Container Storage Interface (CSI) drivers.
This article demonstrates how to use Velero 1.16.0 to perform backup and restore of virtual machines in Harvester.
It goes through commands and manifests to:
Back up virtual machines in a namespace, their NFS CSI volumes, and associated namespace-scoped configuration
Export the backup artifacts to an AWS S3 bucket
Restore to a different namespace on the same cluster
Restore to a different cluster
Velero is a Kubernetes-native backup and restore tool that enables users to perform scheduled and on-demand backups of virtual machines to external object storage providers such as S3, Azure Blob, or GCS, aligning with enterprise backup and disaster recovery practices.
note
The commands and manifests used in this article are tested with Harvester 1.5.1.
The CSI NFS driver and Velero configuration and versions used are for demonstration purposes only. Adjust them according to your environment and requirements.
important
The examples provided are intended for backing up and restoring Linux virtual machine workloads. They are not suitable for backing up guest clusters provisioned through the Harvester Rancher integration.
To back up and restore guest clusters such as RKE2, refer to the distribution's official documentation.
Confirm the virtual machine image is successfully uploaded to Harvester:
Follow the instructions in the third-party storage documentation to create a virtual machine with NFS root and data volumes, using the image uploaded in the previous step.
For NFS CSI snapshot to work, the NFS data volume must have the volumeMode set to Filesystem:
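The exact manifest depends on how the volume was created, but the relevant part of the data volume's PersistentVolumeClaim looks like the following sketch (the name, storage class, and size are placeholders for this example):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <data-volume-name>
  namespace: demo-src
spec:
  storageClassName: nfs-csi        # assumption: the NFS CSI storage class name
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem           # required for NFS CSI snapshots
  resources:
    requests:
      storage: 10Gi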
optional
For testing purposes, once the virtual machine is ready, access it via SSH and add some files to both the root and data volumes.
The data volume needs to be partitioned, with a file system created and mounted before files can be written to it.
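The backup itself can be created with a command along the following lines. This is a sketch: the name format matches the BACKUP_NAME variable used below, and --snapshot-move-data enables the data movement tracked by the DataUpload resources.

BACKUP_NAME=backup-demo-src-$(date +%s)
velero backup create "${BACKUP_NAME}" --include-namespaces demo-src --snapshot-move-data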
This creates a backup of the demo-src namespace containing resources like the virtual machine created earlier, its volumes, secrets and other associated configuration.
Depending on the size of the virtual machine and its volumes, the backup may take a while to complete.
The DataUpload custom resources provide insights into the backup progress:
kubectl -n velero get datauploads -l velero.io/backup-name="${BACKUP_NAME}"
Confirm that the backup completed successfully:
velero backup get "${BACKUP_NAME}"
NAME                         STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-demo-src-1747954979   Completed   0        0          2025-05-22 16:04:46 -0700 PDT   29d       default            <none>
After the backup completes, Velero removes the CSI snapshots from the storage side to free up the snapshot data space.
tips
The velero backup describe and velero backup logs commands can be used to assess details of the backup including resources included, skipped, and any warnings or errors encountered during the backup process.
This restore modifier removes the harvesterhci.io/volumeForVirtualMachine annotation from the virtual machine data volumes to ensure that the restoration does not conflict with the CDI volume import populator.
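A hedged sketch of what modifier-data-volumes.yaml could contain, following Velero's resource modifier schema; the target resource and namespace are assumptions to adapt to your environment:

version: v1
resourceModifierRules:
  - conditions:
      groupResource: persistentvolumeclaims
      namespaces:
        - demo-dst
    patches:
      - operation: remove
        path: "/metadata/annotations/harvesterhci.io~1volumeForVirtualMachine"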
Create the restore modifier:
kubectl -n velero create cm modifier-data-volumes --from-file=modifier-data-volumes.yaml
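The restore can then be started with a command similar to the following sketch. The namespace mapping, resource exclusion, and modifier reference are assumptions that mirror the behavior described below; the MAC address and firmware UUID reset is handled by an additional restore modifier that is not shown here.

velero restore create --from-backup "${BACKUP_NAME}" \
  --namespace-mappings demo-src:demo-dst \
  --exclude-resources virtualmachineimages.harvesterhci.io \
  --resource-modifier-configmap modifier-data-volumes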
The virtual machine MAC address and firmware UUID are reset to avoid potential conflicts with existing virtual machines.
The virtual machine image manifest is excluded because Velero restores the entire state of the virtual machine from the backup.
The modifier-data-volumes restore modifier is invoked to modify the metadata of the virtual machine data volumes, preventing conflicts with the CDI volume import populator.
While the restore operation is still in progress, the DataDownload custom resources can be used to examine its progress:
RESTORE_NAME=backup-demo-src-1747954979-20250522164015
kubectl -n velero get datadownload -l velero.io/restore-name="${RESTORE_NAME}"
Confirm that the restore completed successfully:
velero restore get
NAME                                        BACKUP                       STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
backup-demo-src-1747954979-20250522164015   backup-demo-src-1747954979   Completed   2025-05-22 16:40:15 -0700 PDT   2025-05-22 16:40:49 -0700 PDT   0        6          2025-05-22 16:40:15 -0700 PDT   <none>
Verify that the virtual machine and its configuration are restored to the new demo-dst namespace:
note
Velero uses Kopia as its default data mover. This issue describes some of its limitations on advanced file system features such as setuid/gid, hard links, mount points, sockets, xattr, ACLs, etc.
Velero provides the --data-mover option to configure custom data movers for different use cases. For more information, see the Velero documentation.
tips
The velero restore describe and velero restore logs commands provide more insights into the restore operation including the resources restored, skipped, and any warnings or errors encountered during the restore process.
This section extends the above scenario to demonstrate the steps to restore the backup to a different Harvester cluster.
On the target cluster, install Velero, and set up the NFS CSI and NFS server following the instructions from the Deploy the NFS CSI and Example Server section.
Once Velero is configured to use the same backup location as the source cluster, it automatically discovers the available backups:
velero backup get
NAME                         STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-demo-src-1747954979   Completed   0        0          2025-05-22 16:04:46 -0700 PDT   29d       default            <none>
By default, Velero only supports resource filtering by resource groups and labels. To back up or restore a single virtual machine, custom labels must be applied to the virtual machine and to its virtual machine instance, pod, data volumes, persistent volume claims, persistent volumes, and cloud-init secret resources. It's recommended to back up the entire namespace and perform resource filtering during restore, so that the backup contains all the dependencies required by the virtual machine.
The restoration of virtual machine images is not yet fully supported.
Users wishing to prevent privilege escalation and other security issues can leverage Kubernetes' Pod Security Standards (PSS) on Harvester. PSS are a set of security policies that can be applied to clusters and namespaces to control and restrict how workloads are executed.
Pod Security Standards in Harvester can be used when provisioning VM workloads and also with the new experimental support for running baremetal container workloads.
The baseline policy is aimed at ease of adoption for common containerized workloads while preventing known privilege escalations. This policy is targeted at application operators and developers of non-critical applications.
warning
VMs with device passthrough, such as pcidevices, usbdevices, and vgpudevices, will fail to start with the baseline policy because they need the SYS_RESOURCE capability. This is being tracked in issue #8218. A fix should be available shortly.
Do not apply PSS to the system namespaces, as they need privileged permissions to manage cluster resources. Access to system namespaces must be limited to trusted users.
Cluster-wide PSS can be enabled by passing an Admission Control configuration via kube-apiserver arguments. This can be done via Harvester's CloudInit using the following configuration, which can be saved to a cloudinit-pss.yaml file:
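The full CloudInit wiring is environment-specific and not reproduced here, but the core of such a configuration is a standard PodSecurity AdmissionConfiguration that the kube-apiserver consumes through its --admission-control-config-file argument. A partial sketch (the exemption list is abbreviated and must include all Harvester system namespaces):

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "baseline"
        enforce-version: "latest"
        audit: "baseline"
        audit-version: "latest"
        warn: "baseline"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces:
          - kube-system
          - harvester-system
          # ...add the remaining system namespaces to exempt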
The cluster admin can apply this against the Harvester cluster using kubectl apply -f cloudinit-pss.yaml. The change requires a restart of the control plane nodes to ensure that the Elemental cloud-init directives are applied on boot. Once control plane nodes are rebooted, a default baseline pod security standard will be enforced against all current and subsequently created namespaces. The namespaces listed under exemptions will be skipped. Users are free to tweak the list, to better suit their use cases.
For future integration of Pod Security Admission (PSA) configuration natively in Harvester, please verify the progress of issue #8196.
After a default PSS is applied, end users with permissions to create and edit namespaces may still be able to override the policy by labeling their namespaces to allow privileged workloads, for example, as follows:
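For example, a user with namespace edit permissions could relabel a namespace like this:

kubectl label namespace <namespace> pod-security.kubernetes.io/enforce=privileged --overwrite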
To avoid this, we recommend creating custom RBAC rules that restrict who can create or update namespaces, or deploying a Validating Admission Policy. The following policy blocks namespace create and update requests that contain the pod-security.kubernetes.io/enforce label, thereby preventing namespace admins from changing the settings for their namespaces.
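A minimal sketch of such a policy and its binding, assuming the ValidatingAdmissionPolicy API (admissionregistration.k8s.io/v1) available in recent Kubernetes releases:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-pss-enforce-label
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["namespaces"]
  validations:
    - expression: "!has(object.metadata.labels) || !('pod-security.kubernetes.io/enforce' in object.metadata.labels)"
      message: "Setting the pod-security.kubernetes.io/enforce label on namespaces is not allowed."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deny-pss-enforce-label-binding
spec:
  policyName: deny-pss-enforce-label
  validationActions: ["Deny"]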
If more tailored policies are needed, users can rely on security policy engines such as Kubewarden's PSA Label Enforcer policy, or a similar solution, to ensure that namespaces have the required PSS configuration for deployment in the cluster.
CVE-2025-1974 (vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H) has a score of 9.8 (Critical).
The vulnerability affects specific versions of the RKE2 ingress-nginx controller (v1.11.4 and earlier, and v1.12.0). All Harvester versions that use these controller versions (including v1.4.2 and earlier) are therefore affected.
This CVE is fixed in Harvester v1.4.3, v1.5.0, and later versions.
A security issue was discovered in Kubernetes where under certain conditions, an unauthenticated attacker with access to the pod network can achieve arbitrary code execution in the context of the ingress-nginx controller. This can lead to disclosure of secrets accessible to the controller. (Note that in the default installation, the controller can access all secrets cluster-wide.)
You can confirm the version of the RKE2 ingress-nginx pods by running this command on your Harvester cluster:
kubectl -n kube-system get po -l "app.kubernetes.io/name=rke2-ingress-nginx" -o jsonpath='{.items[].spec.containers[].image}'
If the command returns one of the affected versions, disable the rke2-ingress-nginx-admission validating webhook configuration by performing the following steps:
On one of your control plane nodes, use kubectl to confirm the existence of the HelmChartConfig resource named rke2-ingress-nginx:
$ kubectl -n kube-system get helmchartconfig rke2-ingress-nginx
NAME                 AGE
rke2-ingress-nginx   14d1h
Use kubectl -n kube-system edit helmchartconfig rke2-ingress-nginx to add the following configurations to the resource:
The following is an example of what the updated .spec.valuesContent configuration along with the default Harvester ingress-nginx configuration should look like:
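A hedged sketch of the resulting resource: disabling the chart's admission webhooks is controlled by controller.admissionWebhooks.enabled, and the placeholder comment stands for Harvester's existing default values, which must be retained:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      admissionWebhooks:
        enabled: false
      # ...retain Harvester's default ingress-nginx values here...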
Exit the kubectl edit command execution to save the configuration.
Harvester automatically applies the change once the content is saved.
important
The configuration disables the RKE2 ingress-nginx admission webhooks while preserving Harvester's default ingress-nginx configuration.
If the HelmChartConfig resource contains other custom ingress-nginx configuration, you must retain them when editing the resource.
Verify that RKE2 deleted the rke2-ingress-nginx-admission validating webhook configuration.
$ kubectl get validatingwebhookconfiguration rke2-ingress-nginx-admission
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "rke2-ingress-nginx-admission" not found
Verify that the ingress-nginx pods are restarted successfully.
$ kubectl -n kube-system get po -l app.kubernetes.io/instance=rke2-ingress-nginx
NAME                                  READY   STATUS    RESTARTS   AGE
rke2-ingress-nginx-controller-g8l49   1/1     Running   0          5s
Once your Harvester cluster receives the RKE2 ingress-nginx patch, you can re-install the rke2-ingress-nginx-admission validating webhook configuration by removing the HelmChartConfig patch.
important
These steps only cover the RKE2 ingress-nginx controller that is managed by Harvester. You must also update other running ingress-nginx controllers. See the References section for more information.