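Kubernetes Resource Management and Scheduling Strategies
Introduction
Kubernetes has become the de facto standard platform for container orchestration. Understanding how it manages resources and schedules Pods is essential for running stable, efficient containerized applications. This article walks through the resource model, the scheduling mechanisms, and the quota and autoscaling features built on top of them.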
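1. The Kubernetes Resource Model
1.1 Resource Types
Kubernetes distinguishes two kinds of compute resources:
- Compressible resources: CPU. A container that exceeds its CPU limit is throttled, not killed.
- Incompressible resources: memory. Memory cannot be throttled, so a container that exceeds its memory limit is terminated (OOM-killed).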
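1.2 Resource Requests and Limits
The scheduler uses `requests` to decide which node has room for a Pod, while `limits` cap what a container may consume at runtime.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: demo
      image: nginx:alpine
      resources:
        requests:
          cpu: "100m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
```

1.3 QoS Classes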
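Kubernetes assigns every Pod one of three QoS classes based on its resource configuration. The class determines which Pods are evicted first when a node runs short of resources:
| QoS class | Condition | Behavior |
|---|---|---|
| Guaranteed | Every container sets CPU and memory limits, and requests == limits | Resources are fully reserved; evicted last |
| Burstable | At least one container sets a request or limit, but the Pod does not qualify as Guaranteed | Can burst above its requests; evicted after BestEffort Pods |
| BestEffort | No container sets any requests or limits | Uses whatever is left over; evicted first |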
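The classification rules in the table can be sketched in a few lines of Go. This is a simplified illustration, not the kubelet's implementation: the `Resources` and `Container` types are hypothetical, a value of 0 stands for "not set", and init containers are ignored.

```go
package main

import "fmt"

// Resources holds CPU (millicores) and memory (bytes); 0 means "not set".
// Hypothetical type for this sketch only.
type Resources struct {
	CPUMilli int64
	MemBytes int64
}

// Container is a simplified stand-in for a container's resource section.
type Container struct {
	Requests Resources
	Limits   Resources
}

// qosClass approximates how a Pod's QoS class follows from its containers.
func qosClass(containers []Container) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		// Kubernetes defaults a missing request to the limit, so mirror that here.
		req := c.Requests
		if req.CPUMilli == 0 {
			req.CPUMilli = c.Limits.CPUMilli
		}
		if req.MemBytes == 0 {
			req.MemBytes = c.Limits.MemBytes
		}
		if req.CPUMilli != 0 || req.MemBytes != 0 ||
			c.Limits.CPUMilli != 0 || c.Limits.MemBytes != 0 {
			anySet = true
		}
		// Guaranteed needs both limits set and equal to the requests.
		if c.Limits.CPUMilli == 0 || c.Limits.MemBytes == 0 ||
			req.CPUMilli != c.Limits.CPUMilli || req.MemBytes != c.Limits.MemBytes {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	const Mi = 1 << 20
	// Same values as the resource-demo Pod: requests are lower than limits.
	demo := []Container{{
		Requests: Resources{CPUMilli: 100, MemBytes: 256 * Mi},
		Limits:   Resources{CPUMilli: 500, MemBytes: 512 * Mi},
	}}
	fmt.Println(qosClass(demo)) // prints "Burstable"
}
```

Raising the requests of `resource-demo` to match its limits would move it into the Guaranteed class.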
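2. Node Selection and Scheduling
2.1 Node Selector
`nodeSelector` is the simplest constraint: the Pod is scheduled only onto nodes that carry every listed label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-selector-demo
spec:
  nodeSelector:
    disktype: ssd
    zone: us-east-1a
  containers:
    - name: demo
      image: nginx:alpine
```

2.2 Node Affinity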
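Node affinity is more expressive than `nodeSelector`. A `required...` rule is a hard constraint, while a `preferred...` rule only influences scoring. In the example below the Pod must run in one of two zones and prefers nodes labeled `disktype=ssd`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
                  - us-east-1b
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  containers:
    - name: demo
      image: nginx:alpine
```

2.3 Pod Affinity and Anti-Affinity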
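Pod affinity and anti-affinity place a Pod relative to other Pods instead of node labels. The example below requires the Pod to share a node with a Pod labeled `app=backend` and prefers to avoid nodes that already run a Pod labeled `app=database`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - backend
          topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - database
            topologyKey: kubernetes.io/hostname
  containers:
    - name: demo
      image: nginx:alpine
```

3. Scheduling Strategies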
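3.1 The Default Scheduler
The default scheduler (kube-scheduler) places each Pod in three steps:
- Filtering: discard the nodes that cannot run the Pod (insufficient resources, unmatched affinity, untolerated taints, and so on).
- Scoring: rank the remaining feasible nodes with the enabled scoring plugins.
- Selection: pick the highest-scoring node and bind the Pod to it. If several nodes tie, one of them is chosen at random.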
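The three steps can be sketched as a small Go program. This is an illustrative model only: the `Node` type, the single CPU dimension, and the scoring rule are assumptions made for the sketch, and ties go to the first node seen, where the real scheduler picks randomly.

```go
package main

import "fmt"

// Node is a hypothetical, simplified node that tracks CPU only (millicores).
type Node struct {
	Name        string
	FreeCPU     int64
	CapacityCPU int64 // assumed to be greater than zero
}

// filter keeps the nodes that can fit the requested CPU.
func filter(nodes []Node, requestCPU int64) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeCPU >= requestCPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// score rates a node from 0 to 100 by the share of CPU left free after placement.
func score(n Node, requestCPU int64) int64 {
	return (n.FreeCPU - requestCPU) * 100 / n.CapacityCPU
}

// schedule runs filter, score, and select; ok is false when no node is feasible.
func schedule(nodes []Node, requestCPU int64) (name string, ok bool) {
	best := int64(-1)
	for _, n := range filter(nodes, requestCPU) {
		if s := score(n, requestCPU); s > best {
			best, name, ok = s, n.Name, true
		}
	}
	return name, ok
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 300, CapacityCPU: 4000},  // filtered out: too little free CPU
		{Name: "node-b", FreeCPU: 2500, CapacityCPU: 4000}, // score 50
		{Name: "node-c", FreeCPU: 1000, CapacityCPU: 2000}, // score 25
	}
	name, ok := schedule(nodes, 500)
	fmt.Println(name, ok) // prints "node-b true"
}
```

When no node passes the filter, `schedule` reports `ok == false`, which corresponds to a Pod staying in the Pending state.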
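3.2 Scheduler Configuration
The standalone `NodeResourcesLeastAllocated` and `NodeResourcesMostAllocated` plugins no longer exist in current versions of the configuration API; their behavior is selected through the `scoringStrategy` of the `NodeResourcesFit` plugin. The two strategies are opposites, so a profile picks one of them. The configuration below favors bin packing (`MostAllocated`) and disables balanced-allocation scoring:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: NodeResourcesBalancedAllocation
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

3.3 Custom Scheduler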
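Custom scheduling logic is usually added as a plugin on the scheduling framework and compiled into a scheduler binary, rather than by reimplementing the scheduling loop. The skeleton below registers a score plugin. The plugin interfaces have changed between Kubernetes releases, so the signatures must be adjusted to the version you build against.

```go
package main

import (
	"context"
	"os"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Name is the plugin name referenced in KubeSchedulerConfiguration.
const Name = "CustomScorer"

// CustomScorer is a score plugin; it holds no state in this skeleton.
type CustomScorer struct{}

var _ framework.ScorePlugin = &CustomScorer{}

func (p *CustomScorer) Name() string { return Name }

// Score returns a value between framework.MinNodeScore and framework.MaxNodeScore.
func (p *CustomScorer) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
	// Custom scoring logic goes here.
	return framework.MinNodeScore, nil
}

// ScoreExtensions returns nil because no score normalization is needed.
func (p *CustomScorer) ScoreExtensions() framework.ScoreExtensions { return nil }

// New is the plugin factory. Older releases omit the context parameter.
func New(_ context.Context, _ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
	return &CustomScorer{}, nil
}

func main() {
	// Build a kube-scheduler command that includes the custom plugin.
	command := app.NewSchedulerCommand(app.WithPlugin(Name, New))
	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
```

4. Resource Quotas and Limit Ranges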
4.1 ResourceQuota
A ResourceQuota caps the total resources and object counts that a namespace may consume.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    pods: "10"
    services: "5"
    configmaps: "10"
```

4.2 LimitRange
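A LimitRange sets per-container and per-Pod defaults and bounds within a namespace. Containers that omit requests or limits receive the `defaultRequest` and `default` values.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "256Mi"
      max:
        cpu: "2"
        memory: "2Gi"
      min:
        cpu: "50m"
        memory: "128Mi"
    - type: Pod
      max:
        cpu: "4"
        memory: "4Gi"
      min:
        cpu: "100m"
        memory: "256Mi"
```

5. Horizontal Pod Autoscaler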
5.1 HPA Configuration
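As background for the manifest in this section, the way the HPA controller turns a target into a replica count can be sketched with the ratio from the Kubernetes documentation: desired = ceil(current × currentMetric / targetMetric). The function below is a minimal sketch under simplifying assumptions: the name `desiredReplicas` is hypothetical, the tolerance is fixed at the default 0.1, and stabilization windows, unready Pods, and missing metrics are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// clamp limits v to the range [lo, hi].
func clamp(v, lo, hi int32) int32 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// desiredReplicas applies desired = ceil(current * currentMetric / targetMetric)
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int32, currentMetric, targetMetric float64, minReplicas, maxReplicas int32) int32 {
	ratio := currentMetric / targetMetric
	// Inside the tolerance band (0.1 by default) the replica count is left alone.
	if math.Abs(ratio-1.0) <= 0.1 {
		return clamp(current, minReplicas, maxReplicas)
	}
	desired := int32(math.Ceil(float64(current) * ratio))
	return clamp(desired, minReplicas, maxReplicas)
}

func main() {
	// 4 replicas at 90% average CPU utilization against a 70% target.
	fmt.Println(desiredReplicas(4, 90, 70, 2, 10)) // prints 6
	// 4 replicas at 35% utilization: scale down.
	fmt.Println(desiredReplicas(4, 35, 70, 2, 10)) // prints 2
	// 8 replicas at 200% utilization: capped by maxReplicas.
	fmt.Println(desiredReplicas(8, 200, 70, 2, 10)) // prints 10
}
```

When several metrics are configured, the controller computes a proposal for each one and uses the largest.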
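The following HPA keeps `demo-deployment` between 2 and 10 replicas, targeting 70% average CPU utilization and 80% average memory utilization. Utilization is measured relative to the containers' requests, so the target Deployment must set them.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

5.2 Custom Metrics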
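Scaling on a custom metric requires a metrics adapter that exposes the metric through the custom metrics API. The example below targets an average of 100 requests per second per Pod.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-custom-metrics
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests-per-second
        target:
          type: AverageValue
          # Quantity syntax: "100" is 100 per Pod, whereas "100m" would mean 0.1.
          averageValue: "100"
```

6. Vertical Pod Autoscaler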
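6.1 VPA Configuration
The Vertical Pod Autoscaler is an add-on that must be installed separately. It adjusts container requests based on observed usage, within the bounds set in `resourcePolicy`.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-demo
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: demo-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "50m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"
```

7. Scheduler Plugins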
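7.1 Enabling Plugins
Plugins are enabled per extension point in a scheduler profile. The plugins below are part of the default set, so listing them is mainly useful for overriding score weights. Note that the inter-Pod affinity plugin is named `InterPodAffinity`, and that resource-based scoring is provided by `NodeResourcesFit`, whose default strategy is `LeastAllocated`.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      filter:
        enabled:
          - name: NodeResourcesFit
          - name: NodeAffinity
          - name: InterPodAffinity
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 1
          - name: ImageLocality
            weight: 2
```

7.2 Common Plugins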
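| Plugin | Extension point | Purpose |
|---|---|---|
| NodeResourcesFit | Filter/Score | Checks that the node has enough resources for the Pod; scores nodes by the configured strategy |
| NodeAffinity | Filter/Score | Implements node selectors and node affinity |
| InterPodAffinity | Filter/Score | Implements Pod affinity and anti-affinity |
| NodeResourcesFit (LeastAllocated strategy) | Score | Prefers nodes with the lowest resource allocation, spreading load |
| NodeResourcesFit (MostAllocated strategy) | Score | Prefers nodes with the highest resource allocation, packing Pods tightly |
| ImageLocality | Score | Prefers nodes that already hold the Pod's container images |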
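The difference between the LeastAllocated and MostAllocated strategies comes down to one line of arithmetic per resource. The sketch below shows the idea with integer math and a weighted average across resources; the `Usage` type and function names are hypothetical, and the real plugin handles more edge cases.

```go
package main

import "fmt"

const maxNodeScore = 100

// Usage describes one resource on a node, counting the Pod being placed.
type Usage struct {
	Requested int64 // total requested on the node, including the new Pod
	Capacity  int64 // allocatable amount on the node
	Weight    int64 // relative weight of this resource
}

// weighted averages the per-resource scores by weight.
func weighted(resources []Usage, per func(Usage) int64) int64 {
	var sum, weights int64
	for _, u := range resources {
		var s int64
		// A resource with no capacity or an overcommitted one scores 0.
		if u.Capacity > 0 && u.Requested <= u.Capacity {
			s = per(u)
		}
		sum += s * u.Weight
		weights += u.Weight
	}
	if weights == 0 {
		return 0
	}
	return sum / weights
}

// leastAllocated scores higher when more of the node stays free.
func leastAllocated(resources []Usage) int64 {
	return weighted(resources, func(u Usage) int64 {
		return (u.Capacity - u.Requested) * maxNodeScore / u.Capacity
	})
}

// mostAllocated scores higher when more of the node is already used.
func mostAllocated(resources []Usage) int64 {
	return weighted(resources, func(u Usage) int64 {
		return u.Requested * maxNodeScore / u.Capacity
	})
}

func main() {
	node := []Usage{
		{Requested: 1000, Capacity: 4000, Weight: 1}, // CPU in millicores: 25% allocated
		{Requested: 4, Capacity: 8, Weight: 1},       // memory in GiB: 50% allocated
	}
	fmt.Println(leastAllocated(node)) // prints 62
	fmt.Println(mostAllocated(node))  // prints 37
}
```

The same node scores high under one strategy and low under the other, which is why a profile should pick a single strategy.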
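8. Taints and Tolerations
8.1 Node Taints
A taint marks a node so that Pods are kept away from it unless they explicitly tolerate the taint.

```bash
# Add a taint
kubectl taint nodes node-1 key=value:NoSchedule
# View the node's taints
kubectl get nodes node-1 -o jsonpath='{.spec.taints}'
# Remove every taint with this key
kubectl taint nodes node-1 key-
```

8.2 Pod Tolerations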
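A toleration lets a Pod be scheduled onto, or remain on, a node with a matching taint. `tolerationSeconds` is only valid together with the `NoExecute` effect; it sets how long the Pod stays bound after the taint appears.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
    - key: "key"
      operator: "Equal"
      value: "value"
      effect: "NoSchedule"
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 6000
  containers:
    - name: demo
      image: nginx:alpine
```

8.3 Taint Effects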
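| Effect | Description |
|---|---|
| NoSchedule | New Pods are not scheduled onto the node unless they tolerate the taint; Pods already running are unaffected |
| PreferNoSchedule | The scheduler tries to avoid the node, but may still use it |
| NoExecute | New Pods are not scheduled onto the node, and running Pods that do not tolerate the taint are evicted |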
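How a toleration is matched against a taint can be sketched as follows. The `Taint` and `Toleration` structs are simplified stand-ins for the API types, `canSchedule` is a hypothetical helper, and the sketch covers only the scheduling decision, not eviction timing.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes API types.
type Taint struct{ Key, Value, Effect string }
type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether a single toleration matches a taint.
func tolerates(tol Toleration, t Taint) bool {
	// An empty effect on the toleration matches every effect.
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	// An empty key (used with Exists) matches every key.
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal": // Equal is the default operator
		return tol.Value == t.Value
	default:
		return false
	}
}

// canSchedule reports whether every hard taint on the node is tolerated.
// PreferNoSchedule taints only lower the node's score, so they are skipped.
func canSchedule(taints []Taint, tolerations []Toleration) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tolerations {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	taints := []Taint{{Key: "key", Value: "value", Effect: "NoSchedule"}}
	tols := []Toleration{{Key: "key", Operator: "Equal", Value: "value", Effect: "NoSchedule"}}
	fmt.Println(canSchedule(taints, tols)) // prints true
	fmt.Println(canSchedule(taints, nil))  // prints false
}
```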
9. Case Study: Resource Optimization
9.1 Optimized Configuration
The Deployment below combines the techniques covered so far: explicit requests and limits, a preferred anti-affinity rule that spreads replicas across nodes, and liveness and readiness probes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: optimized-app
  template:
    metadata:
      labels:
        app: optimized-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: optimized-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 3
```

9.2 Monitoring and Tuning
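```bash
# Show node resource usage (requires metrics-server)
kubectl top nodes
# Show Pod resource usage
kubectl top pods
# List events, including scheduling failures
kubectl get events
# View the scheduler logs
kubectl logs -n kube-system kube-scheduler-<node-name>
```

10. Summary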
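Resource management and scheduling are central to running efficient, stable containerized applications on Kubernetes. Setting sensible requests and limits, steering placement with node selectors and affinity rules, and scaling automatically with HPA and VPA together let a cluster make efficient use of its resources.
In practice, scheduling policies need to be adjusted continually to match the workload's characteristics and the state of the cluster. Ongoing monitoring and tuning are a core part of operating Kubernetes.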