Description
Which component are you using?:
/area cluster-autoscaler
What version of the component are you using?:
Component version: 1.34
What k8s version are you using (kubectl version)?: 1.34
What environment is this in?:
Self-managed Kubernetes on Azure, using ARM64 VM SKUs (e.g.
Standard_Dps_v5, Standard_Eps_v5 — Ampere Altra ARM64 VMs) with VMSS node groups.
What did you expect to happen?:
Cluster Autoscaler should label node templates with kubernetes.io/arch=arm64 for ARM64
VM SKUs, so that the scheduler simulation correctly evaluates NodeAffinity predicates
for pods requesting kubernetes.io/arch: arm64, and triggers scale-up on the appropriate
node group.
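The expected labelling can be sketched roughly as below. This is an illustration only, not the autoscaler's actual code: `archFromVMSize` and `buildArchLabels` are hypothetical helpers, and the rule that a `p` among the feature letters of the size name marks an ARM64 SKU is an assumption taken from Azure's VM size naming convention. A real fix might instead read the architecture from the SKU's reported capabilities.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	labelArchStable = "kubernetes.io/arch"      // stable architecture label
	labelArchBeta   = "beta.kubernetes.io/arch" // legacy beta architecture label
	defaultArch     = "amd64"                   // the value the issue reports as hardcoded
)

// archFromVMSize is a hypothetical helper. It guesses the CPU architecture
// from an Azure VM size name such as "Standard_D32ps_v5", assuming that a
// "p" among the feature letters after the vCPU count marks an ARM64 SKU.
func archFromVMSize(size string) string {
	parts := strings.Split(size, "_")
	if len(parts) < 2 {
		return defaultArch
	}
	name := parts[1]
	// Feature letters follow the last digit of the vCPU count.
	i := strings.LastIndexAny(name, "0123456789")
	if i >= 0 && strings.Contains(strings.ToLower(name[i+1:]), "p") {
		return "arm64"
	}
	return defaultArch
}

// buildArchLabels returns both architecture labels for the given
// architecture instead of always using the default.
func buildArchLabels(arch string) map[string]string {
	return map[string]string{
		labelArchBeta:   arch,
		labelArchStable: arch,
	}
}

func main() {
	for _, size := range []string{"Standard_D32ps_v5", "Standard_D4s_v5"} {
		fmt.Println(size, buildArchLabels(archFromVMSize(size)))
	}
}
```

Unknown or unparsable size names fall back to `amd64`, so in this sketch existing node groups would keep the labels they get today.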
What happened instead?:
buildGenericLabels() in azure_template.go hardcodes kubernetes.io/arch=amd64 and
beta.kubernetes.io/arch=amd64 on all node templates regardless of the actual VM
architecture:
autoscaler/cluster-autoscaler/cloudprovider/azure/azure_template.go, lines 403 to 404 in 3015658:
    result[kubeletapis.LabelArch] = cloudprovider.DefaultArch
    result[apiv1.LabelArchStable] = cloudprovider.DefaultArch
As a result, pods with nodeAffinity requiring kubernetes.io/arch: arm64 receive
NotTriggerScaleUp — CA's scheduler simulation fails the NodeAffinity predicate because
the template for an ARM64 node group incorrectly reports amd64.
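The failure can be illustrated with a simplified model of the `In` match. The `requirement` type and `matchesIn` function below are stand-ins written for this example, not the real Kubernetes API types or the scheduler's implementation.

```go
package main

import "fmt"

// requirement is a simplified stand-in for a node selector requirement
// with operator In; it is not the real Kubernetes API type.
type requirement struct {
	Key    string
	Values []string
}

// matchesIn reports whether labels[req.Key] exists and equals one of
// req.Values.
func matchesIn(labels map[string]string, req requirement) bool {
	got, ok := labels[req.Key]
	if !ok {
		return false
	}
	for _, v := range req.Values {
		if v == got {
			return true
		}
	}
	return false
}

func main() {
	req := requirement{Key: "kubernetes.io/arch", Values: []string{"arm64"}}

	// Template labels as currently generated for an ARM64 node group.
	hardcoded := map[string]string{"kubernetes.io/arch": "amd64"}
	// Template labels as they should be for an ARM64 node group.
	expected := map[string]string{"kubernetes.io/arch": "arm64"}

	fmt.Println(matchesIn(hardcoded, req)) // false: no scale-up is triggered
	fmt.Println(matchesIn(expected, req))  // true: the node group would qualify
}
```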
How to reproduce it (as minimally and precisely as possible):
- Create an Azure VMSS node group using an ARM64 SKU (e.g. Standard_D32ps_v5).
- Deploy a pod with the following node affinity:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values: ["arm64"]
- Ensure the VMSS is scaled to 0 instances. CA will not scale up the ARM64 node group.
- Check the CA logs: the node template for the ARM64 VMSS will show kubernetes.io/arch:amd64.
Anything else we need to know?: