
3 posts tagged with "VM"

Jian Wang · 11 min read

In Harvester, VM Live Migration is well supported by the UI. Please refer to Harvester VM Live Migration for more details.

In most cases, the VM Live Migration process finishes smoothly. However, the migration may sometimes get stuck and not end as expected.

This article dives into the VM Live Migration process in more detail. There are three main parts:

  • General Process of VM Live Migration
  • VM Live Migration Strategies
  • VM Live Migration Configurations

Related issues:

note

A large part of the following content is copied from the KubeVirt document https://kubevirt.io/user-guide/operations/live_migration/; some content and formatting has been adjusted to fit this document.

General Process of VM Live Migration

Starting a Migration from Harvester UI

  1. Go to the Virtual Machines page.
  2. Find the virtual machine that you want to migrate and select > Migrate.
  3. Choose the node to which you want to migrate the virtual machine and select Apply.

After you select Apply, a VirtualMachineInstanceMigration CRD object is created, and the related controller/operator starts the migration process.

Migration CRD Object

You can also create the CRD VirtualMachineInstanceMigration object manually via kubectl or other tools.

The example below starts a migration process for a virtual machine instance (VMI) new-vm.

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job
spec:
  vmiName: new-vm
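
If you save the manifest above to a file (for example, migration-job.yaml, a name chosen here only for illustration), a quick way to start and then inspect the migration with kubectl is shown below; this is a minimal sketch assuming the VMI new-vm runs in the current namespace:

$ kubectl apply -f migration-job.yaml
$ # inspect status.phase as the migration progresses
$ kubectl get virtualmachineinstancemigrations migration-job -o yaml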

Under the hood, the open source projects KubeVirt, Libvirt, QEMU, and others perform most of the VM Live Migration work. See the References section for details.

Migration Status Reporting

When a virtual machine instance (VMI) is started, KubeVirt also calculates whether the machine is live migratable. The result is stored in VMI.status.conditions. The calculation can be based on multiple parameters of the VMI; however, at the moment it is largely based on the Access Mode of the VMI volumes. Live migration is only permitted when the volume access mode is set to ReadWriteMany. Requests to migrate a non-live-migratable VMI are rejected.

The reported Migration Method is also calculated during VMI start. BlockMigration indicates that some of the VMI disks require copying from the source to the destination. LiveMigration means that only the instance memory will be copied.

Status:
  Conditions:
    Status: True
    Type: LiveMigratable
  Migration Method: BlockMigration
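
Before requesting a migration, you can check the condition directly from the command line; a minimal sketch assuming a VMI named new-vm in the current namespace, which prints True when the VMI can be live migrated:

$ kubectl get vmi new-vm -o jsonpath='{.status.conditions[?(@.type=="LiveMigratable")].status}'
True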

Migration Status

The migration progress status is reported in VMI.status. Most importantly, it indicates whether the migration has been completed or failed.

Below is an example of a successful migration.

Migration State:
  Completed: true
  End Timestamp: 2019-03-29T03:37:52Z
  Migration Config:
    Completion Timeout Per GiB: 800
    Progress Timeout: 150
  Migration UID: c64d4898-51d3-11e9-b370-525500d15501
  Source Node: node02
  Start Timestamp: 2019-03-29T04:02:47Z
  Target Direct Migration Node Ports:
    35001: 0
    41068: 49152
    38284: 49153
  Target Node: node01
  Target Node Address: 10.128.0.46
  Target Node Domain Detected: true
  Target Pod: virt-launcher-testvmimcbjgw6zrzcmp8wpddvztvzm7x2k6cjbdgktwv8tkq
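
The same information can be pulled from the command line while a migration is running; a minimal sketch, again assuming the VMI is named new-vm:

$ # full migration state, including source/target nodes and timestamps
$ kubectl get vmi new-vm -o jsonpath='{.status.migrationState}'
$ # just the completion flag
$ kubectl get vmi new-vm -o jsonpath='{.status.migrationState.completed}'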

VM Live Migration Strategies

VM Live Migration is a process during which a running Virtual Machine Instance moves to another compute node while the guest workload continues to run and remain accessible.

Understanding Different VM Live Migration Strategies

VM Live Migration is a complex process. During a migration, the source VM needs to transfer its whole state (mainly RAM) to the target VM. If there are enough resources available, such as network bandwidth and CPU power, migrations should converge nicely. If this is not the case, however, the migration might get stuck and be unable to make progress.

The main factor that affects migrations from the guest perspective is its dirty rate, that is, the rate at which the VM dirties memory. A guest with a high dirty rate creates a race during migration: on the one hand, memory is continuously transferred to the target, and on the other, the same memory keeps being dirtied by the guest. In such scenarios, consider using one of the more advanced migration strategies. Refer to Understanding different migration strategies for more details.

There are 3 VM Live Migration strategies/policies:

VM Live Migration Strategy: Pre-copy

Pre-copy is the default strategy. It should be used for most cases.

It works as follows:

  1. The target VM is created, but the guest keeps running on the source VM.
  2. The source starts sending chunks of VM state (mostly memory) to the target. This continues until all of the state has been transferred to the target.
  3. The guest starts executing on the target VM.
  4. The source VM is removed.

Pre-copy is the safest and fastest strategy for most cases. Furthermore, it can be easily cancelled, can utilize multithreading, and more. If there is no real reason to use another strategy, this is definitely the strategy to go with.

However, in some cases migrations might not converge easily: by the time a chunk of source VM state is received by the target VM, it has already been mutated by the source VM (the VM the guest executes on). There are many reasons why a migration might fail to converge, such as a high dirty rate or scarce resources like network bandwidth and CPU. In such scenarios, consider the alternative strategies below.

VM Live Migration Strategy: Post-copy

Post-copy migrations work as follows:

  1. The target VM is created.
  2. The guest starts running on the target VM.
  3. The source starts sending chunks of VM state (mostly memory) to the target.
  4. When the guest, running on the target VM, accesses memory:
     1. If the memory exists on the target VM, the guest can access it.
     2. Otherwise, the target VM asks for a chunk of memory from the source VM.
  5. Once all of the memory state is updated at the target VM, the source VM is removed.

The main idea here is that the guest starts to run immediately on the target VM. This approach has advantages and disadvantages:

Advantages:

  • The same memory chunk is never transferred twice. This is possible because with post-copy it doesn't matter that a page has been dirtied, since the guest is already running on the target VM.
  • This means that a high dirty-rate has much less effect.
  • Consumes less network bandwidth.

Disadvantages:

  • When using post-copy, the VM state has no single source of truth. When the guest (running on the target VM) writes to memory, this memory is one part of the guest's state, but other parts of it may still be updated only at the source VM. This situation is generally dangerous, since, for example, if the source or target VM crashes, the state cannot be recovered.
  • Slow warmup: when the guest starts executing, no memory is present at the target VM, so the guest has to fetch a large amount of memory from the source VM in a short period of time.
  • Slower than pre-copy in most cases.
  • Harder to cancel a migration.

VM Live Migration Strategy: Auto-converge

Auto-converge is a technique to help pre-copy migrations converge faster without changing the core algorithm of how the migration works.

Since a high dirty rate is usually the most significant factor preventing migrations from converging, auto-converge simply throttles the guest's CPU. If the migration converges quickly enough, the guest's CPU is not throttled, or only negligibly. But if the migration is not converging fast enough, the CPU is throttled more and more as time goes on.

This technique dramatically increases the probability of the migration converging eventually.

Observe the VM Live Migration Progress and Result

Migration Timeouts

Depending on the migration type, the live migration process copies virtual machine memory pages and disk blocks to the destination. During this process, non-locked pages and blocks are copied and become free for the instance to use again. To achieve a successful migration, it is assumed that the instance writes to the free pages and blocks (pollutes the pages) at a lower rate than they are copied.

Completion Time

In some cases, the virtual machine can write to memory pages or disk blocks at a higher rate than they can be copied, which prevents the migration process from completing in a reasonable amount of time. In this case, the live migration will be aborted if it runs for too long. The timeout is calculated based on the size of the VMI: its memory and the ephemeral disks that need to be copied. The configurable parameter completionTimeoutPerGiB, which defaults to 800s, is the time to wait per GiB of data for the migration to complete before aborting it. For example, a VMI with 8 GiB of memory will time out after 6400 seconds.
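
As a quick sanity check of that formula, the expected timeout can be computed in the shell; a small sketch with the VMI memory size (in GiB) and the completionTimeoutPerGiB value plugged in by hand:

$ # 8 GiB x 800 s/GiB = 6400 s before the migration is aborted
$ echo $((8 * 800))
6400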

Progress Timeout

A VM Live Migration will also be aborted when it notices that copying memory doesn't make any progress. The time to wait for live migration to make progress in transferring data is configurable by the progressTimeout parameter, which defaults to 150 seconds.

VM Live Migration Configurations

Changing Cluster Wide Migration Limits

KubeVirt puts some limits in place so that migrations don't overwhelm the cluster. By default, only 5 migrations can run in parallel, with an additional limit of a maximum of 2 outbound migrations per node. Finally, every migration is limited to a bandwidth of 64 MiB/s.

You can change these values in the kubevirt CR:

apiVersion: kubevirt.io/v1
kind: Kubevirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    migrations:
      parallelMigrationsPerCluster: 5
      parallelOutboundMigrationsPerNode: 2
      bandwidthPerMigration: 64Mi
      completionTimeoutPerGiB: 800
      progressTimeout: 150
      disableTLS: false
      nodeDrainTaintKey: "kubevirt.io/drain"
      allowAutoConverge: false       # related to: Auto-converge
      allowPostCopy: false           # related to: Post-copy
      unsafeMigrationOverride: false
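
Instead of editing the full CR, you can also patch only the migration settings; a hedged sketch that assumes the KubeVirt CR is named kubevirt in the kubevirt namespace as in the example above (in a Harvester cluster the CR may live in a different namespace, so adjust -n accordingly):

$ kubectl -n kubevirt patch kubevirt kubevirt --type merge \
    -p '{"spec":{"configuration":{"migrations":{"parallelMigrationsPerCluster":3,"bandwidthPerMigration":"128Mi"}}}}'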

Remember that most of these configurations can be overridden and fine-tuned for a specific group of VMs. For more information, please refer to the Migration Policies section below.

Migration Policies

Migration policies provide a new way of applying migration configurations to virtual machines. The policies can refine the KubeVirt CR's MigrationConfiguration, which sets the cluster-wide migration configuration. This way, the cluster-wide settings serve as defaults that the migration policy can refine (i.e., change, remove, or add).

Remember that migration policies are currently at version v1alpha1. This means that the API is not fully stable yet and may change in the future.

Migration Configurations

Currently, the MigrationPolicy spec only includes the following configurations from Kubevirt CR's MigrationConfiguration. (In the future, more configurations that aren't part of Kubevirt CR will be added):

apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
spec:
  allowAutoConverge: true
  bandwidthPerMigration: 217Ki
  completionTimeoutPerGiB: 23
  allowPostCopy: false

All the above fields are optional. When omitted, the configuration will be applied as defined in KubevirtCR's MigrationConfiguration. This way, KubevirtCR will serve as a configurable set of defaults for both VMs that are not bound to any MigrationPolicy and VMs that are bound to a MigrationPolicy that does not define all fields of the configurations.

Matching Policies to VMs

Next in the spec are the selectors that define the group of VMs the policy applies to. The options are as follows.

This policy applies to the VMs in namespaces that have all the required labels:

apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
spec:
  selectors:
    namespaceSelector:
      hpc-workloads: true # Matches a key and a value

The policy below applies to the VMs that have all the required labels:

apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
spec:
  selectors:
    virtualMachineInstanceSelector:
      workload-type: db # Matches a key and a value
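
Putting the pieces together, a complete policy combines the migration configuration with a selector; a minimal sketch (the policy name and label value are only illustrative) applied inline with a heredoc:

$ cat <<EOF | kubectl apply -f -
apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
metadata:
  name: db-workloads-policy
spec:
  allowAutoConverge: true
  completionTimeoutPerGiB: 23
  selectors:
    virtualMachineInstanceSelector:
      workload-type: db
EOF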

References

Documents

Libvirt Guest Migration

Libvirt has a chapter describing the principles of VM/guest live migration.

https://libvirt.org/migration.html

Kubevirt Live Migration

https://kubevirt.io/user-guide/operations/live_migration/

Source Code

The VM Live Migration configuration options are passed down to each layer accordingly.

Kubevirt

https://github.com/kubevirt/kubevirt/blob/d425593ae392111dab80403ef0cde82625e37653/pkg/virt-launcher/virtwrap/live-migration-source.go#L103

...
import "libvirt.org/go/libvirt"

...

func generateMigrationFlags(isBlockMigration, migratePaused bool, options *cmdclient.MigrationOptions) libvirt.DomainMigrateFlags {
    ...
    if options.AllowAutoConverge {
        migrateFlags |= libvirt.MIGRATE_AUTO_CONVERGE
    }
    if options.AllowPostCopy {
        migrateFlags |= libvirt.MIGRATE_POSTCOPY
    }
    ...
}

Go Package Libvirt

https://pkg.go.dev/libvirt.org/go/libvirt

const (
    ...
    MIGRATE_AUTO_CONVERGE = DomainMigrateFlags(C.VIR_MIGRATE_AUTO_CONVERGE)
    MIGRATE_RDMA_PIN_ALL  = DomainMigrateFlags(C.VIR_MIGRATE_RDMA_PIN_ALL)
    MIGRATE_POSTCOPY      = DomainMigrateFlags(C.VIR_MIGRATE_POSTCOPY)
    ...
)

Libvirt

https://github.com/libvirt/libvirt/blob/bfe53e9145cd5996a791c5caff0686572b850f82/include/libvirt/libvirt-domain.h#L1030

/* Enable algorithms that ensure a live migration will eventually converge.
 * This usually means the domain will be slowed down to make sure it does
 * not change its memory faster than a hypervisor can transfer the changed
 * memory to the destination host. VIR_MIGRATE_PARAM_AUTO_CONVERGE_*
 * parameters can be used to tune the algorithm.
 *
 * Since: 1.2.3
 */
VIR_MIGRATE_AUTO_CONVERGE = (1 << 13),

...

/* Setting the VIR_MIGRATE_POSTCOPY flag tells libvirt to enable post-copy
 * migration. However, the migration will start normally and
 * virDomainMigrateStartPostCopy needs to be called to switch it into the
 * post-copy mode. See virDomainMigrateStartPostCopy for more details.
 *
 * Since: 1.3.3
 */
VIR_MIGRATE_POSTCOPY = (1 << 15),

Date Huang · 4 min read

What is the default behavior of a VM with multiple NICs

In some scenarios, you'll set up two or more NICs in your VM to serve different networking purposes. If all networks are configured with DHCP by default, you might run into random connectivity issues. While rebooting the VM might seem to fix the problem, the connection will still be lost randomly after some time.

How to identify connectivity issues

In a Linux VM, you can use commands from the iproute2 package to identify the default route.

In your VM, execute the following command:

ip route show default
tip

If you get an access denied error, run the command using sudo.

The output of this command will only show the default route with the gateway and VM IP of the primary network interface (eth0 in the example below).

default via <Gateway IP> dev eth0 proto dhcp src <VM IP> metric 100

Here is the full example:

$ ip route show default
default via 192.168.0.254 dev eth0 proto dhcp src 192.168.0.100 metric 100

However, if the issue covered in this KB occurs, you'll only be able to connect to the VM via the VNC or serial console.

Once connected, you can run the same command as before:

$ ip route show default

However, this time you'll get a default route with an incorrect gateway IP. For example:

default via <Incorrect Gateway IP> dev eth0 proto dhcp src <VM's IP> metric 100

Why do connectivity issues occur randomly

In a standard setup, cloud-based VMs typically use DHCP to configure their NICs. DHCP sets an IP address and a gateway for each NIC, and a default route to the gateway IP is also added, so you can use the VM's IP to connect to it.

However, Linux distributions start multiple DHCP clients at the same time and do not have a priority system. This means that if you have two or more NICs configured with DHCP, the clients race to configure the default route, and depending on the DHCP script of the running Linux distribution, there is no guarantee which default route ends up configured.

Because the default route might change during every DHCP renewal or after every OS reboot, this creates random network connectivity issues.

How to avoid the random connectivity issues

You can easily avoid these connectivity issues by having only one NIC attached to the VM and having only one IP and one gateway configured.

However, for VMs in more complex infrastructures, it is often not possible to use just one NIC. For example, if your infrastructure has a storage network and a service network. For security reasons, the storage network will be isolated from the service network and have a separate subnet. In this case, you must have two NICs to connect to both the service and storage networks.

You can choose a solution below that meets your requirements and security policy.

Disable DHCP on secondary NIC

As mentioned above, the problem is caused by a race condition between two DHCP clients. One solution is to disable DHCP for all NICs and configure them with static IPs only. Alternatively, you can configure only the secondary NIC with a static IP and keep the primary NIC on DHCP.

  1. To configure the primary NIC with a static IP (eth0 in this example), you can edit the file /etc/sysconfig/network/ifcfg-eth0 with the following values:
BOOTPROTO='static'
IPADDR='192.168.0.100'
NETMASK='255.255.255.0'

Alternatively, if you want to keep the primary NIC (eth0 in this example) on DHCP, use the following values instead:

BOOTPROTO='dhcp'
DHCLIENT_SET_DEFAULT_ROUTE='yes'
  2. Configure the default route by editing the file /etc/sysconfig/network/ifroute-eth0 (if you configured the primary NIC using DHCP, skip this step):
# Destination  Dummy/Gateway  Netmask  Interface
default 192.168.0.254 - eth0
warning

Do not add another default route for your secondary NIC.

  3. Finally, configure a static IP for the secondary NIC by editing the file /etc/sysconfig/network/ifcfg-eth1:
BOOTPROTO='static'
IPADDR='10.0.0.100'
NETMASK='255.255.255.0'
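
The edited ifcfg and ifroute files only take effect after the network configuration is reloaded. A hedged sketch, assuming a SUSE-style guest where /etc/sysconfig/network is managed by wicked (other distributions use different tooling):

$ sudo systemctl restart network    # or: sudo wicked ifreload all
$ ip route show default             # there should now be exactly one default route, via eth0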

Cloud-Init config

network:
  version: 1
  config:
    - type: physical
      name: eth0
      subnets:
        - type: dhcp
    - type: physical
      name: eth1
      subnets:
        - type: static
          address: 10.0.0.100/24

Disable secondary NIC default route from DHCP

If your secondary NIC needs to get its IP from DHCP, you'll need to disable the default route configuration for the secondary NIC.

  1. Confirm that the primary NIC configures its default route in the file /etc/sysconfig/network/ifcfg-eth0:
BOOTPROTO='dhcp'
DHCLIENT_SET_DEFAULT_ROUTE='yes'
  2. Disable the secondary NIC default route configuration by editing the file /etc/sysconfig/network/ifcfg-eth1:
BOOTPROTO='dhcp'
DHCLIENT_SET_DEFAULT_ROUTE='no'
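
After the next DHCP renewal (or a reboot), you can verify that only the primary NIC installs a default route; a small verification sketch reusing the example addresses from earlier:

$ ip route show default
default via 192.168.0.254 dev eth0 proto dhcp src 192.168.0.100 metric 100
$ ip route show default dev eth1    # expected to print nothing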

Cloud-Init config

This solution is not available with cloud-init, as cloud-init does not provide an option to control the DHCP default route.

PoAn Yang · 16 min read

How does Harvester schedule a VM?

Harvester doesn't directly schedule a VM in Kubernetes; it relies on KubeVirt and the VirtualMachine custom resource. When the request to create a new VM is sent, a VirtualMachineInstance object is created, which in turn creates the corresponding Pod.

The whole VM creation process leverages kube-scheduler, which allows Harvester to use nodeSelector, affinity, and resource requests/limits to influence where a VM will be deployed.

How does kube-scheduler decide where to deploy a VM?

First, kube-scheduler finds Nodes available to run a pod. After that, kube-scheduler scores each available Node by a list of plugins like ImageLocality, InterPodAffinity, NodeAffinity, etc.

Finally, kube-scheduler sums the scores from the plugin results for each Node and selects the Node with the highest score to deploy the Pod.

For example, let's say we have a three-node Harvester cluster with 6 CPU cores and 16 GB of RAM each, and we want to deploy a VM with 1 CPU and 1 GB of RAM (without resource overcommit).

kube-scheduler will summarize the scores, as displayed in Table 1 below, and will select the node with the highest score, harvester-node-2 in this case, to deploy the VM.
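
To see roughly the same numbers kube-scheduler works with, you can inspect each node's allocatable capacity and currently requested resources; a minimal sketch using the node names from this example:

$ kubectl get node harvester-node-0 -o jsonpath='{.status.allocatable.cpu} {.status.allocatable.memory}'
$ kubectl describe node harvester-node-0 | grep -A 8 "Allocated resources"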

kube-scheduler logs
virt-launcher-vm-without-overcommit-75q9b -> harvester-node-0: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 0,
virt-launcher-vm-without-overcommit-75q9b -> harvester-node-1: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 45,
virt-launcher-vm-without-overcommit-75q9b -> harvester-node-2: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5350 memory:5941231616] ,score 46,

virt-launcher-vm-without-overcommit-75q9b -> harvester-node-0: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 4,
virt-launcher-vm-without-overcommit-75q9b -> harvester-node-1: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 34,
virt-launcher-vm-without-overcommit-75q9b -> harvester-node-2: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5350 memory:5941231616] ,score 37,

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="ImageLocality" node="harvester-node-0" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="ImageLocality" node="harvester-node-1" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="ImageLocality" node="harvester-node-2" score=54

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="InterPodAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="InterPodAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="InterPodAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesLeastAllocated" node="harvester-node-0" score=4
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesLeastAllocated" node="harvester-node-1" score=34
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesLeastAllocated" node="harvester-node-2" score=37

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodePreferAvoidPods" node="harvester-node-0" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodePreferAvoidPods" node="harvester-node-2" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodePreferAvoidPods" node="harvester-node-1" score=1000000

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="PodTopologySpread" node="harvester-node-0" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="PodTopologySpread" node="harvester-node-1" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="PodTopologySpread" node="harvester-node-2" score=200

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="TaintToleration" node="harvester-node-0" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="TaintToleration" node="harvester-node-1" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="TaintToleration" node="harvester-node-2" score=100

"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesBalancedAllocation" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesBalancedAllocation" node="harvester-node-1" score=45
"Plugin scored node for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" plugin="NodeResourcesBalancedAllocation" node="harvester-node-2" score=46

"Calculated node's final score for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" node="harvester-node-0" score=1000358
"Calculated node's final score for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" node="harvester-node-1" score=1000433
"Calculated node's final score for pod" pod="default/virt-launcher-vm-without-overcommit-75q9b" node="harvester-node-2" score=1000437

AssumePodVolumes for pod "default/virt-launcher-vm-without-overcommit-75q9b", node "harvester-node-2"
AssumePodVolumes for pod "default/virt-launcher-vm-without-overcommit-75q9b", node "harvester-node-2": all PVCs bound and nothing to do
"Attempting to bind pod to node" pod="default/virt-launcher-vm-without-overcommit-75q9b" node="harvester-node-2"

Table 1 - kube-scheduler scores example

| Plugin                          | harvester-node-0 | harvester-node-1 | harvester-node-2 |
| ------------------------------- | ---------------- | ---------------- | ---------------- |
| ImageLocality                   | 54               | 54               | 54               |
| InterPodAffinity                | 0                | 0                | 0                |
| NodeResourcesLeastAllocated     | 4                | 34               | 37               |
| NodeAffinity                    | 0                | 0                | 0                |
| NodePreferAvoidPods             | 1000000          | 1000000          | 1000000          |
| PodTopologySpread               | 200              | 200              | 200              |
| TaintToleration                 | 100              | 100              | 100              |
| NodeResourcesBalancedAllocation | 0                | 45               | 46               |
| Total                           | 1000358          | 1000433          | 1000437          |

Why are VMs distributed unevenly with overcommit?

With resource overcommit, Harvester modifies the resource requests. By default, the overcommit configuration is {"cpu": 1600, "memory": 150, "storage": 200}. This means that if we request a VM with 1 CPU and 1 GB RAM, its resources.requests.cpu will become 62m.

note

The unit suffix m stands for "thousandth of a core."

To explain it, let's take the case of CPU overcommit. The default value of 1 CPU is equal to 1000m CPU, and with the default overcommit configuration of "cpu": 1600, the CPU resource will be 16x smaller. Here is the calculation: 1000m * 100 / 1600 = 62m.
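
The same arithmetic can be reproduced in the shell; a small sketch using the default ratio, where integer division matches the rounding seen in the actual request:

$ # CPU: 1000m requested with "cpu": 1600  ->  1000 * 100 / 1600 = 62m
$ echo $((1000 * 100 / 1600))
62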

Now, we can see how overcommitting influences kube-scheduler scores.

In this example, we use a three-node Harvester cluster with 6 CPU cores and 16 GB of RAM each. We will deploy two VMs with 1 CPU and 1 GB of RAM, and compare the scores for the "with-overcommit" and "without-overcommit" cases.

The results in Table 2 and Table 3 below can be explained as follows:

In the "with-overcommit" case, both VMs are deployed on harvester-node-2, however in the "without-overcommit" case, the VM1 is deployed on harvester-node-2, and VM2 is deployed on harvester-node-1.

If we look at the detailed scores, we'll see the Total Score for harvester-node-2 change from 1000459 to 1000461 in the "with-overcommit" case, and from 1000437 to 1000382 in the "without-overcommit" case. This is because resource overcommit influences request-cpu and request-memory.

In the "with-overcommit" case, the request-cpu changes from 4412m to 4474m. The difference between the two numbers is 62m, which is what we calculated above. However, in the "without-overcommit" case, we send real requests to kube-scheduler, so the request-cpu changes from 5350m to 6350m.

Finally, since all plugins except NodeResourcesBalancedAllocation and NodeResourcesLeastAllocated give the same score to every node, the differences between nodes come down to these two scores.

From the results, we can see that the overcommit feature influences the final score of each node, so VMs are distributed unevenly. Although the harvester-node-2 score for VM 2 is higher than for VM 1, the score does not always increase. In Table 4, we keep deploying VMs with 1 CPU and 1 GB RAM, and we can see the score of harvester-node-2 start decreasing from the 11th VM. The behavior of kube-scheduler depends on your cluster resources and the workloads you have deployed.

kube-scheduler logs for vm1-with-overcommit
virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-0: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9022 memory:14807289856] ,score 0,
virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-1: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4622 memory:5992960000] ,score 58,
virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-2: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4412 memory:5581918208] ,score 59,

virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-0: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9022 memory:14807289856] ,score 5,
virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-1: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4622 memory:5992960000] ,score 43,
virt-launcher-vm1-with-overcommit-ljlmq -> harvester-node-2: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4412 memory:5581918208] ,score 46,

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="InterPodAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="InterPodAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="InterPodAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-0" score=5
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-1" score=43
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-2" score=46

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodePreferAvoidPods" node="harvester-node-0" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodePreferAvoidPods" node="harvester-node-1" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodePreferAvoidPods" node="harvester-node-2" score=1000000

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="PodTopologySpread" node="harvester-node-0" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="PodTopologySpread" node="harvester-node-1" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="PodTopologySpread" node="harvester-node-2" score=200

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="TaintToleration" node="harvester-node-0" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="TaintToleration" node="harvester-node-1" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="TaintToleration" node="harvester-node-2" score=100

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-1" score=58
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-2" score=59

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="ImageLocality" node="harvester-node-0" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="ImageLocality" node="harvester-node-1" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" plugin="ImageLocality" node="harvester-node-2" score=54

"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" node="harvester-node-0" score=1000359
"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" node="harvester-node-1" score=1000455
"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" node="harvester-node-2" score=1000459

AssumePodVolumes for pod "default/virt-launcher-vm1-with-overcommit-ljlmq", node "harvester-node-2"
AssumePodVolumes for pod "default/virt-launcher-vm1-with-overcommit-ljlmq", node "harvester-node-2": all PVCs bound and nothing to do
"Attempting to bind pod to node" pod="default/virt-launcher-vm1-with-overcommit-ljlmq" node="harvester-node-2"
kube-scheduler logs for vm2-with-overcommit
virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-0: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9022 memory:14807289856] ,score 0,
virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-1: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4622 memory:5992960000] ,score 58,
virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-2: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4474 memory:6476701696] ,score 64,

virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-0: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9022 memory:14807289856] ,score 5,
virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-1: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4622 memory:5992960000] ,score 43,
virt-launcher-vm2-with-overcommit-pwrx4 -> harvester-node-2: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:4474 memory:6476701696] ,score 43,

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodePreferAvoidPods" node="harvester-node-0" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodePreferAvoidPods" node="harvester-node-1" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodePreferAvoidPods" node="harvester-node-2" score=1000000

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="PodTopologySpread" node="harvester-node-0" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="PodTopologySpread" node="harvester-node-1" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="PodTopologySpread" node="harvester-node-2" score=200

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="TaintToleration" node="harvester-node-0" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="TaintToleration" node="harvester-node-1" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="TaintToleration" node="harvester-node-2" score=100

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesBalancedAllocation" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesBalancedAllocation" node="harvester-node-1" score=58
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesBalancedAllocation" node="harvester-node-2" score=64

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="ImageLocality" node="harvester-node-0" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="ImageLocality" node="harvester-node-1" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="ImageLocality" node="harvester-node-2" score=54

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="InterPodAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="InterPodAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="InterPodAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesLeastAllocated" node="harvester-node-0" score=5
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesLeastAllocated" node="harvester-node-1" score=43
"Plugin scored node for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" plugin="NodeResourcesLeastAllocated" node="harvester-node-2" score=43

"Calculated node's final score for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" node="harvester-node-0" score=1000359
"Calculated node's final score for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" node="harvester-node-1" score=1000455
"Calculated node's final score for pod" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" node="harvester-node-2" score=1000461

AssumePodVolumes for pod "default/virt-launcher-vm2-with-overcommit-pwrx4", node "harvester-node-2"
AssumePodVolumes for pod "default/virt-launcher-vm2-with-overcommit-pwrx4", node "harvester-node-2": all PVCs bound and nothing to do
"Attempting to bind pod to node" pod="default/virt-launcher-vm2-with-overcommit-pwrx4" node="harvester-node-2"
kube-scheduler logs for vm1-without-overcommit
virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-0: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 0,
virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-1: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 45,
virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-2: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5350 memory:5941231616] ,score 46,

virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-0: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 4,
virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-1: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 34,
virt-launcher-vm1-with-overcommit-6xqmq -> harvester-node-2: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5350 memory:5941231616] ,score 37,

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="InterPodAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="InterPodAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="InterPodAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-0" score=4
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-1" score=34
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesLeastAllocated" node="harvester-node-2" score=37

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodePreferAvoidPods" node="harvester-node-0" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodePreferAvoidPods" node="harvester-node-1" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodePreferAvoidPods" node="harvester-node-2" score=1000000

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="PodTopologySpread" node="harvester-node-0" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="PodTopologySpread" node="harvester-node-1" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="PodTopologySpread" node="harvester-node-2" score=200

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="TaintToleration" node="harvester-node-0" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="TaintToleration" node="harvester-node-1" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="TaintToleration" node="harvester-node-2" score=100

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-1" score=45
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="NodeResourcesBalancedAllocation" node="harvester-node-2" score=46

"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="ImageLocality" node="harvester-node-0" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="ImageLocality" node="harvester-node-1" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" plugin="ImageLocality" node="harvester-node-2" score=54

"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" node="harvester-node-0" score=1000358
"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" node="harvester-node-1" score=1000433
"Calculated node's final score for pod" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" node="harvester-node-2" score=1000437

AssumePodVolumes for pod "default/virt-launcher-vm1-with-overcommit-6xqmq", node "harvester-node-2"
AssumePodVolumes for pod "default/virt-launcher-vm1-with-overcommit-6xqmq", node "harvester-node-2": all PVCs bound and nothing to do
"Attempting to bind pod to node" pod="default/virt-launcher-vm1-with-overcommit-6xqmq" node="harvester-node-2"
kube-scheduler logs for vm2-without-overcommit
virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-0: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 0,
virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-1: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 45,
virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-2: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:6350 memory:7195328512] ,score 0,

virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-0: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:9960 memory:15166603264] ,score 4,
virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-1: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:5560 memory:6352273408] ,score 34,
virt-launcher-vm2-without-overcommit-mf5vk -> harvester-node-2: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:6000 memory:16776437760], map of requested resources map[cpu:6350 memory:7195328512] ,score 28,

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="PodTopologySpread" node="harvester-node-0" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="PodTopologySpread" node="harvester-node-1" score=200
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="PodTopologySpread" node="harvester-node-2" score=200

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="TaintToleration" node="harvester-node-0" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="TaintToleration" node="harvester-node-1" score=100
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="TaintToleration" node="harvester-node-2" score=100

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesBalancedAllocation" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesBalancedAllocation" node="harvester-node-1" score=45
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesBalancedAllocation" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="ImageLocality" node="harvester-node-0" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="ImageLocality" node="harvester-node-1" score=54
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="ImageLocality" node="harvester-node-2" score=54

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="InterPodAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="InterPodAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="InterPodAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesLeastAllocated" node="harvester-node-0" score=4
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesLeastAllocated" node="harvester-node-1" score=34
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeResourcesLeastAllocated" node="harvester-node-2" score=28

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeAffinity" node="harvester-node-0" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeAffinity" node="harvester-node-1" score=0
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodeAffinity" node="harvester-node-2" score=0

"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodePreferAvoidPods" node="harvester-node-0" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodePreferAvoidPods" node="harvester-node-1" score=1000000
"Plugin scored node for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" plugin="NodePreferAvoidPods" node="harvester-node-2" score=1000000

"Calculated node's final score for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" node="harvester-node-0" score=1000358
"Calculated node's final score for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" node="harvester-node-1" score=1000433
"Calculated node's final score for pod" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" node="harvester-node-2" score=1000382

AssumePodVolumes for pod "default/virt-launcher-vm2-without-overcommit-mf5vk", node "harvester-node-1"
AssumePodVolumes for pod "default/virt-launcher-vm2-without-overcommit-mf5vk", node "harvester-node-1": all PVCs bound and nothing to do
"Attempting to bind pod to node" pod="default/virt-launcher-vm2-without-overcommit-mf5vk" node="harvester-node-1"

Table 2 - With Overcommit

| VM 1 / VM 2                           | harvester-node-0           | harvester-node-1        | harvester-node-2        |
| ------------------------------------- | -------------------------- | ----------------------- | ----------------------- |
| request-cpu (m)                       | 9022 / 9022                | 4622 / 4622             | 4412 / 4474             |
| request-memory                        | 14807289856 / 14807289856  | 5992960000 / 5992960000 | 5581918208 / 6476701696 |
| NodeResourcesBalancedAllocation Score | 0 / 0                      | 58 / 58                 | 59 / 64                 |
| NodeResourcesLeastAllocated Score     | 5 / 5                      | 43 / 43                 | 46 / 43                 |
| Other Scores                          | 1000354 / 1000354          | 1000354 / 1000354       | 1000354 / 1000354       |
| Total Score                           | 1000359 / 1000359          | 1000455 / 1000455       | 1000459 / 1000461       |

Table 3 - Without Overcommit

| VM 1 / VM 2                           | harvester-node-0           | harvester-node-1        | harvester-node-2        |
| ------------------------------------- | -------------------------- | ----------------------- | ----------------------- |
| request-cpu (m)                       | 9960 / 9960                | 5560 / 5560             | 5350 / 6350             |
| request-memory                        | 15166603264 / 15166603264  | 6352273408 / 6352273408 | 5941231616 / 7195328512 |
| NodeResourcesBalancedAllocation Score | 0 / 0                      | 45 / 45                 | 46 / 0                  |
| NodeResourcesLeastAllocated Score     | 4 / 4                      | 34 / 34                 | 37 / 28                 |
| Other Scores                          | 1000354 / 1000354          | 1000354 / 1000354       | 1000354 / 1000354       |
| Total Score                           | 1000358 / 1000358          | 1000433 / 1000433       | 1000437 / 1000382       |

Table 4

| Score | harvester-node-0 | harvester-node-1 | harvester-node-2 |
| ----- | ---------------- | ---------------- | ---------------- |
| VM 1  | 1000359          | 1000455          | 1000459          |
| VM 2  | 1000359          | 1000455          | 1000461          |
| VM 3  | 1000359          | 1000455          | 1000462          |
| VM 4  | 1000359          | 1000455          | 1000462          |
| VM 5  | 1000359          | 1000455          | 1000463          |
| VM 6  | 1000359          | 1000455          | 1000465          |
| VM 7  | 1000359          | 1000455          | 1000466          |
| VM 8  | 1000359          | 1000455          | 1000467          |
| VM 9  | 1000359          | 1000455          | 1000469          |
| VM 10 | 1000359          | 1000455          | 1000469          |
| VM 11 | 1000359          | 1000455          | 1000465          |
| VM 12 | 1000359          | 1000455          | 1000457          |

How to avoid uneven distribution of VMs?

There are many kube-scheduler plugins whose scores we can influence. For example, we can use podAntiAffinity to avoid VMs with the same labels being deployed on the same node.

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: harvesterhci.io/creator
                operator: Exists
          topologyKey: kubernetes.io/hostname
        weight: 100
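
After adding such a rule to the VM spec, you can check how the resulting virt-launcher pods are spread across nodes; a minimal sketch filtering on the harvesterhci.io/creator label used above:

$ kubectl get pods -A -o wide -l harvesterhci.io/creator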

How to see scores in kube-scheduler?

kube-scheduler is deployed as a static pod in Harvester. The manifest is located at /var/lib/rancher/rke2/agent/pod-manifests/kube-scheduler.yaml on each management node. We can add - --v=10 to the kube-scheduler container command to show score logs.

kind: Pod
metadata:
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
    - command:
        - kube-scheduler
        # ...
        - --v=10
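
Once kubelet restarts the scheduler with the higher verbosity, the score lines shown earlier can be pulled from its logs; a hedged sketch, since the exact pod name carries the node name as a suffix:

$ kubectl -n kube-system get pods -l component=kube-scheduler
$ kubectl -n kube-system logs <kube-scheduler-pod-name> | grep "Plugin scored node for pod"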