Orleans on Kubernetes: Deployment Configuration and Source-Level Mechanics
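Based on the source code and the official documentation, this article walks through how to host Orleans on Kubernetes correctly. It covers:
- Configuration points and constraints
- Key source-code behavior, with excerpts
- A sample application and Kubernetes YAML
- Sequence diagrams for startup and runtime
- Common issues and troubleshooting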
Reference documentation:
- Kubernetes hosting (official documentation): learn.microsoft.com - Orleans Kubernetes hosting
1. Core Concepts and Constraints
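- Use the Microsoft.Orleans.Hosting.Kubernetes package to improve the hosting experience on Kubernetes. Calling UseKubernetesHosting() does the following:
  - Sets SiloOptions.SiloName to the pod name.
  - Sets EndpointOptions.AdvertisedIPAddress to the pod IP (or resolves it from the pod name when no pod IP is provided).
  - Binds EndpointOptions.SiloListeningEndpoint and GatewayListeningEndpoint to the Any address; the ports default to 11111 / 30000.
  - Sets ClusterOptions.ServiceId and ClusterOptions.ClusterId from the pod labels, which reach the application through environment variables.
  - At startup: compares the pods that still exist in K8s with the Orleans membership table and marks silos without a matching pod as Dead.
  - At runtime: only a small number of silos in the cluster (2 by default) act as "watchers" that monitor K8s events, which reduces load on the API server.
- Note: Kubernetes hosting does not replace Orleans cluster membership. A clustering provider (Azure Storage, ADO.NET, Consul, and so on) must still be configured separately.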
- Required labels and environment variables:
  - Pod labels: orleans/serviceId, orleans/clusterId
  - Environment variables: POD_NAME, POD_NAMESPACE, POD_IP, ORLEANS_SERVICE_ID, ORLEANS_CLUSTER_ID
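
The watcher count and the optional pod deletion mentioned above are driven by KubernetesHostingOptions, which can be adjusted through the UseKubernetesHosting overload that accepts an options builder. The following is a minimal sketch rather than a recommended configuration: the property names (MaxAgents, DeleteDefunctSiloPods, MaxKubernetesApiRetryAttempts) are the ones read by the agent code quoted in section 2, while the values are illustrative assumptions, and the clustering provider is deliberately left as a placeholder comment.

```csharp
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

var host = Host.CreateDefaultBuilder(args)
    .UseOrleans(silo =>
    {
        // Overload taking Action<OptionsBuilder<KubernetesHostingOptions>>.
        silo.UseKubernetesHosting(optionsBuilder => optionsBuilder.Configure(options =>
        {
            // Number of silos that watch the Kubernetes API (illustrative value).
            options.MaxAgents = 2;

            // When true, watcher silos delete pods whose silo was declared Dead.
            options.DeleteDefunctSiloPods = false;

            // Retries for Kubernetes API errors during startup (illustrative value).
            options.MaxKubernetesApiRetryAttempts = 5;
        }));

        // Assumption: a clustering provider is configured here as well;
        // Kubernetes hosting does not provide cluster membership by itself.
    })
    .Build();

await host.RunAsync();
```

Enabling DeleteDefunctSiloPods requires the delete verb on pods in the RBAC role, so it is worth deciding on this setting together with the role definition.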
2. Key Source Locations and Behavior
- Hosting extension registration and default configuration (registers ConfigureKubernetesHostingOptions and KubernetesClusterAgent):

    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Options;
    using Orleans.Configuration;
    using Orleans.Hosting.Kubernetes;
    using Orleans.Runtime;
    using System;

    namespace Orleans.Hosting
    {
        /// <summary>
        /// Extensions for hosting a silo in Kubernetes.
        /// </summary>
        public static class KubernetesHostingExtensions
        {
            /// <summary>
            /// Adds Kubernetes hosting support.
            /// </summary>
            public static ISiloBuilder UseKubernetesHosting(this ISiloBuilder siloBuilder)
            {
                return siloBuilder.ConfigureServices(services => services.UseKubernetesHosting(configureOptions: null));
            }

            /// <summary>
            /// Adds Kubernetes hosting support.
            /// </summary>
            public static ISiloBuilder UseKubernetesHosting(this ISiloBuilder siloBuilder, Action<OptionsBuilder<KubernetesHostingOptions>> configureOptions)
            {
                return siloBuilder.ConfigureServices(services => services.UseKubernetesHosting(configureOptions));
            }

            /// <summary>
            /// Adds Kubernetes hosting support.
            /// </summary>
            public static IServiceCollection UseKubernetesHosting(this IServiceCollection services) => services.UseKubernetesHosting(configureOptions: null);

            /// <summary>
            /// Adds Kubernetes hosting support.
            /// </summary>
            public static IServiceCollection UseKubernetesHosting(this IServiceCollection services, Action<OptionsBuilder<KubernetesHostingOptions>> configureOptions)
            {
                configureOptions?.Invoke(services.AddOptions<KubernetesHostingOptions>());

                // Configure defaults based on the current environment.
                services.AddSingleton<IConfigureOptions<ClusterOptions>, ConfigureKubernetesHostingOptions>();
                services.AddSingleton<IConfigureOptions<SiloOptions>, ConfigureKubernetesHostingOptions>();
                services.AddSingleton<IPostConfigureOptions<EndpointOptions>, ConfigureKubernetesHostingOptions>();
                services.AddSingleton<IConfigureOptions<KubernetesHostingOptions>, ConfigureKubernetesHostingOptions>();
                services.AddSingleton<IValidateOptions<KubernetesHostingOptions>, KubernetesHostingOptionsValidator>();

                services.AddSingleton<ILifecycleParticipant<ISiloLifecycle>, KubernetesClusterAgent>();

                return services;
            }
        }
    }

- Environment variable/label mapping and endpoint configuration (maps POD_* to SiloOptions/EndpointOptions and ORLEANS_* to ClusterOptions):

    #nullable enable
    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Options;
    using Orleans.Configuration;
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Net.Sockets;

    namespace Orleans.Hosting.Kubernetes
    {
        internal class ConfigureKubernetesHostingOptions :
            IConfigureOptions<ClusterOptions>,
            IConfigureOptions<SiloOptions>,
            IPostConfigureOptions<EndpointOptions>,
            IConfigureOptions<KubernetesHostingOptions>
        {
            private readonly IServiceProvider _serviceProvider;

            public ConfigureKubernetesHostingOptions(IServiceProvider serviceProvider)
            {
                _serviceProvider = serviceProvider;
            }

            public void Configure(KubernetesHostingOptions options)
            {
                options.Namespace ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodNamespaceEnvironmentVariable) ?? ReadNamespaceFromServiceAccount();
                options.PodName ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodNameEnvironmentVariable) ?? Environment.MachineName;
                options.PodIP ??= Environment.GetEnvironmentVariable(KubernetesHostingOptions.PodIPEnvironmentVariable);
            }

            public void Configure(ClusterOptions options)
            {
                var serviceIdEnvVar = Environment.GetEnvironmentVariable(KubernetesHostingOptions.ServiceIdEnvironmentVariable);
                if (!string.IsNullOrWhiteSpace(serviceIdEnvVar))
                {
                    options.ServiceId = serviceIdEnvVar;
                }

                var clusterIdEnvVar = Environment.GetEnvironmentVariable(KubernetesHostingOptions.ClusterIdEnvironmentVariable);
                if (!string.IsNullOrWhiteSpace(clusterIdEnvVar))
                {
                    options.ClusterId = clusterIdEnvVar;
                }
            }

            public void Configure(SiloOptions options)
            {
                var hostingOptions = _serviceProvider.GetRequiredService<IOptions<KubernetesHostingOptions>>().Value;
                if (!string.IsNullOrWhiteSpace(hostingOptions.PodName))
                {
                    options.SiloName = hostingOptions.PodName;
                }
            }

            public void PostConfigure(string? name, EndpointOptions options)
            {
                // Use PostConfigure to give the developer an opportunity to set SiloPort and GatewayPort using regular
                // Configure methods without needing to worry about ordering with respect to the UseKubernetesHosting call.
                if (options.AdvertisedIPAddress is null)
                {
                    var hostingOptions = _serviceProvider.GetRequiredService<IOptions<KubernetesHostingOptions>>().Value;
                    IPAddress? podIp = null;
                    if (hostingOptions.PodIP is not null)
                    {
                        podIp = IPAddress.Parse(hostingOptions.PodIP);
                    }
                    else
                    {
                        var hostAddresses = Dns.GetHostAddresses(hostingOptions.PodName);
                        if (hostAddresses != null)
                        {
                            podIp = IPAddressSelector.PickIPAddress(hostAddresses);
                        }
                    }

                    if (podIp is not null)
                    {
                        options.AdvertisedIPAddress = podIp;
                    }
                }

                if (options.SiloListeningEndpoint is null)
                {
                    options.SiloListeningEndpoint = new IPEndPoint(IPAddress.Any, options.SiloPort);
                }

                if (options.GatewayListeningEndpoint is null && options.GatewayPort > 0)
                {
                    options.GatewayListeningEndpoint = new IPEndPoint(IPAddress.Any, options.GatewayPort);
                }
            }

            private string? ReadNamespaceFromServiceAccount()
            {
                // Read the namespace from the pod's service account.
                // ... (the excerpt ends here)

- Constants: environment variable and label names (make sure the YAML and the application agree on them):

    using k8s;
    using System;

    namespace Orleans.Hosting.Kubernetes
    {
        /// <summary>
        /// Options for hosting in Kubernetes.
        /// </summary>
        public sealed class KubernetesHostingOptions
        {
            private readonly Lazy<KubernetesClientConfiguration> _clientConfiguration;

            /// <summary>
            /// The environment variable for specifying the Kubernetes namespace which all silos in this cluster belong to.
            /// </summary>
            public const string PodNamespaceEnvironmentVariable = "POD_NAMESPACE";

            /// <summary>
            /// The environment variable for specifying the name of the Kubernetes pod which this silo is executing in.
            /// </summary>
            public const string PodNameEnvironmentVariable = "POD_NAME";

            /// <summary>
            /// The environment variable for specifying the IP address of this pod.
            /// </summary>
            public const string PodIPEnvironmentVariable = "POD_IP";

            /// <summary>
            /// The environment variable for specifying <see cref="Orleans.Configuration.ClusterOptions.ClusterId"/>.
            /// </summary>
            public const string ClusterIdEnvironmentVariable = "ORLEANS_CLUSTER_ID";

            /// <summary>
            /// The environment variable for specifying <see cref="Orleans.Configuration.ClusterOptions.ServiceId"/>.
            /// </summary>
            public const string ServiceIdEnvironmentVariable = "ORLEANS_SERVICE_ID";

            /// <summary>
            /// The name of the <see cref="Orleans.Configuration.ClusterOptions.ServiceId"/> label on the pod.
            /// </summary>
            public const string ServiceIdLabel = "orleans/serviceId";

            /// <summary>
            /// The name of the <see cref="Orleans.Configuration.ClusterOptions.ClusterId"/> label on the pod.
            /// </summary>
            public const string ClusterIdLabel = "orleans/clusterId";

            public KubernetesHostingOptions()
            {
                _clientConfiguration = new Lazy<KubernetesClientConfiguration>(() => this.GetClientConfiguration());
                // ... (the excerpt ends here)

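- Agent: "reconcile" at startup, then "watch / mark / delete" at runtime
  - At startup: writes ServiceId/ClusterId back to the labels of its own pod, lists the pods carrying the same labels, compares them with the Orleans membership, and marks Active silos without a matching pod as Dead.
  - At runtime: N active silos are chosen as watchers (2 by default). They listen for pod deletion events and mark the corresponding silo as Dead. Optionally (controlled by configuration), they also delete the pods that belong to defunct silos.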

    private async Task OnStart(CancellationToken cancellation)
    {
        var attempts = 0;
        while (!cancellation.IsCancellationRequested)
        {
            try
            {
                await AddClusterOptionsToPodLabels(cancellation);

                // Find the currently known cluster members first, before interrogating Kubernetes
                await _clusterMembershipService.Refresh();
                var snapshot = _clusterMembershipService.CurrentSnapshot.Members;

                // Find the pods which correspond to this cluster
                var pods = await _client.ListNamespacedPodAsync(
                    namespaceParameter: _podNamespace,
                    labelSelector: _podLabelSelector,
                    cancellationToken: cancellation);
                var clusterPods = new HashSet<string> { _podName };
                foreach (var pod in pods.Items)
                {
                    clusterPods.Add(pod.Metadata.Name);
                }

                var known = new HashSet<string>();
                var knownMap = new Dictionary<string, ClusterMember>();
                known.Add(_podName);
                foreach (var member in snapshot.Values)
                {
                    if (member.Status == SiloStatus.Dead)
                    {
                        continue;
                    }

                    known.Add(member.Name);
                    knownMap[member.Name] = member;
                }

                var unknownPods = new List<string>(clusterPods.Except(known));
                unknownPods.Sort();
                foreach (var pod in unknownPods)
                {
                    _logger.LogWarning("Pod {PodName} does not correspond to any known silos", pod);

                    // Delete the pod once it has been active long enough?
                }

                var unmatched = new List<string>(known.Except(clusterPods));
                unmatched.Sort();
                foreach (var pod in unmatched)
                {
                    var siloAddress = knownMap[pod];
                    if (siloAddress.Status is not SiloStatus.Active)
                    {
                        continue;
                    }

                    _logger.LogWarning("Silo {SiloAddress} does not correspond to any known pod. Marking it as dead.", siloAddress);
                    await _clusterMembershipService.TryKill(siloAddress.SiloAddress);
                }

                break;
            }
            catch (HttpOperationException exception) when (exception.Response.StatusCode is System.Net.HttpStatusCode.Forbidden)
            {
                _logger.LogError(exception, $"Unable to monitor pods due to insufficient permissions. Ensure that this pod has an appropriate Kubernetes role binding. Here is an example role binding:\n{ExampleRoleBinding}");
            }
            catch (Exception exception)
            {
                _logger.LogError(exception, "Error while initializing Kubernetes cluster agent");
                if (++attempts > _options.CurrentValue.MaxKubernetesApiRetryAttempts)
                {
                    throw;
                }

                await Task.Delay(1000, cancellation);
            }
        }

        // Start monitoring loop
        ThreadPool.UnsafeQueueUserWorkItem(_ => _runTask = Task.WhenAll(Task.Run(MonitorOrleansClustering), Task.Run(MonitorKubernetesPods)), null);
    }

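
The startup reconciliation in OnStart boils down to two set differences between pod names and silo names. The sketch below isolates that logic so it can be run without a cluster. It is a simplification under stated assumptions: the Reconcile helper, the sample pod names, and the use of plain strings for silo status are all hypothetical and are not part of the Orleans API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: pods returned by the label selector, and the
// membership table as a map of silo name to status.
var pods = new[] { "silo-0", "silo-1", "silo-3" };
var silos = new Dictionary<string, string>
{
    ["silo-0"] = "Active",
    ["silo-1"] = "Active",
    ["silo-2"] = "Active", // its pod no longer exists
    ["silo-4"] = "Dead",   // already dead, so it is ignored
};

var (orphanPods, deadSilos) = Reconcile("silo-0", pods, silos);
Console.WriteLine($"Pods without a silo: {string.Join(", ", orphanPods)}");
Console.WriteLine($"Silos to mark Dead: {string.Join(", ", deadSilos)}");

// Mirrors the two set differences in OnStart (hypothetical helper).
static (List<string> UnknownPods, List<string> SilosToKill) Reconcile(
    string localPod,
    IEnumerable<string> podNames,
    IReadOnlyDictionary<string, string> siloStatusByName)
{
    // The local pod is always treated as present and known.
    var clusterPods = new HashSet<string>(podNames, StringComparer.Ordinal) { localPod };
    var known = new HashSet<string>(StringComparer.Ordinal) { localPod };
    foreach (var (siloName, status) in siloStatusByName)
    {
        if (status != "Dead")
        {
            known.Add(siloName);
        }
    }

    // Pods that exist in Kubernetes but have no silo: only logged by the agent.
    var unknownPods = clusterPods.Except(known)
        .OrderBy(p => p, StringComparer.Ordinal)
        .ToList();

    // Active silos whose pod is gone: the agent declares these Dead.
    var unmatched = known.Except(clusterPods)
        .Where(s => siloStatusByName.TryGetValue(s, out var st) && st == "Active")
        .OrderBy(s => s, StringComparer.Ordinal)
        .ToList();

    return (unknownPods, unmatched);
}
```

With the sample data, silo-3 is reported as a pod without a silo and silo-2 is the only silo selected to be marked Dead.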

    private async Task MonitorOrleansClustering()
    {
        var previous = _clusterMembershipService.CurrentSnapshot;
        while (!_shutdownToken.IsCancellationRequested)
        {
            try
            {
                await foreach (var update in _clusterMembershipService.MembershipUpdates.WithCancellation(_shutdownToken.Token))
                {
                    // Determine which silos should be monitoring Kubernetes
                    var chosenSilos = _clusterMembershipService.CurrentSnapshot.Members.Values
                        .Where(s => s.Status == SiloStatus.Active)
                        .OrderBy(s => s.SiloAddress)
                        .Take(_options.CurrentValue.MaxAgents)
                        .ToList();

                    if (!_enableMonitoring && chosenSilos.Any(s => s.SiloAddress.Equals(_localSiloDetails.SiloAddress)))
                    {
                        _enableMonitoring = true;
                        _pauseMonitoringSemaphore.Release(1);
                    }
                    else if (_enableMonitoring)
                    {
                        _enableMonitoring = false;
                    }

                    if (_enableMonitoring && _options.CurrentValue.DeleteDefunctSiloPods)
                    {
                        var delta = update.CreateUpdate(previous);
                        foreach (var change in delta.Changes)
                        {
                            if (change.SiloAddress.Equals(_localSiloDetails.SiloAddress))
                            {
                                // Ignore all changes for this silo
                                continue;
                            }

                            if (change.Status == SiloStatus.Dead)
                            {
                                try
                                {
                                    if (_logger.IsEnabled(LogLevel.Information))
                                    {
                                        _logger.LogInformation("Silo {SiloAddress} is dead, proceeding to delete the corresponding pod, {PodName}, in namespace {PodNamespace}", change.SiloAddress, change.Name, _podNamespace);
                                    }

                                    await _client.DeleteNamespacedPodAsync(change.Name, _podNamespace);
                                }
                                catch (Exception exception)
                                {
                                    _logger.LogError(exception, "Error deleting pod {PodName} in namespace {PodNamespace} corresponding to defunct silo {SiloAddress}", change.Name, _podNamespace, change.SiloAddress);
                                }
                            }
                        }
                    }

                    previous = update;
                }
            }
            catch (Exception exception) when (!(_shutdownToken.IsCancellationRequested && (exception is TaskCanceledException || exception is OperationCanceledException)))
            {
                if (_logger.IsEnabled(LogLevel.Debug))
                // ... (the excerpt ends here)


    await foreach (var (eventType, pod) in pods.WatchAsync<V1PodList, V1Pod>(_shutdownToken.Token))
    {
        if (!_enableMonitoring || _shutdownToken.IsCancellationRequested)
        {
            break;
        }

        if (string.Equals(pod.Metadata.Name, _podName, StringComparison.Ordinal))
        {
            // Never declare ourselves dead this way.
            continue;
        }

        if (eventType == WatchEventType.Modified)
        {
            // TODO: Remember silo addresses for pods that are restarting/terminating
        }

        if (eventType == WatchEventType.Deleted)
        {
            if (this.TryMatchSilo(pod, out var member) && member.Status != SiloStatus.Dead)
            {
                if (_logger.IsEnabled(LogLevel.Information))
                {
                    _logger.LogInformation("Declaring server {Silo} dead since its corresponding pod, {Pod}, has been deleted", member.SiloAddress, pod.Metadata.Name);
                }

                await _clusterMembershipService.TryKill(member.SiloAddress);
            }
        }
    }

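
The watcher election in MonitorOrleansClustering needs no coordination: every silo sorts the active members the same way and takes the first MaxAgents entries, so all silos arrive at the same answer independently. The sketch below illustrates the idea with plain strings. It is a simplification: the IsWatcher helper and the sample addresses are hypothetical, and the ordinal string ordering stands in for the SiloAddress comparison used by the real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical membership snapshot: (address, status) pairs.
var membership = new (string Address, string Status)[]
{
    ("10.0.0.5:11111", "Active"),
    ("10.0.0.3:11111", "Dead"),
    ("10.0.0.9:11111", "Active"),
    ("10.0.0.7:11111", "Active"),
};

// Each active silo evaluates the same rule against its own address.
foreach (var (address, _) in membership.Where(m => m.Status == "Active"))
{
    Console.WriteLine($"{address} watches Kubernetes: {IsWatcher(address, membership, maxAgents: 2)}");
}

// True when the local silo is among the first maxAgents active silos
// in a deterministic order (hypothetical helper).
static bool IsWatcher(
    string localAddress,
    IEnumerable<(string Address, string Status)> members,
    int maxAgents) =>
    members.Where(x => x.Status == "Active")
           .OrderBy(x => x.Address, StringComparer.Ordinal)
           .Take(maxAgents)
           .Any(x => x.Address == localAddress);
```

In this sample, the silos at 10.0.0.5 and 10.0.0.7 become watchers and 10.0.0.9 does not; if one of the watchers leaves the Active set, the next membership update promotes 10.0.0.9.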
3. Minimal Application Example (C#)
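
    using System;
    using Microsoft.Extensions.Hosting;
    using Orleans.Configuration;
    using Orleans.Hosting;

    var builder = Host.CreateDefaultBuilder(args)
        .UseOrleans(silo =>
        {
            // Enable Kubernetes hosting (the core call).
            silo.UseKubernetesHosting();

            // A clustering provider is mandatory (example: Azure Storage).
            // On Orleans versions where ConnectionString is obsolete or removed,
            // options.ConfigureTableServiceClient(connectionString) is the replacement.
            silo.UseAzureStorageClustering(options =>
            {
                options.ConnectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
            });

            // Ports (optional; the defaults are 11111 / 30000).
            silo.Configure<EndpointOptions>(opt =>
            {
                opt.SiloPort = 11111;
                opt.GatewayPort = 30000;
            });
        });

    await builder.RunConsoleAsync();
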
4. Kubernetes YAML Examples and Explanation
- Deployment (labels, environment variables, ports, probes, graceful termination):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: orleans-dictionary-app
      labels:
        app: orleans-dictionary-app
        orleans/serviceId: dictionary-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: orleans-dictionary-app
      template:
        metadata:
          labels:
            app: orleans-dictionary-app
            orleans/serviceId: dictionary-app
            orleans/clusterId: dictionary-app
        spec:
          serviceAccountName: default
          automountServiceAccountToken: true
          containers:
            - name: silo
              image: my-registry.azurecr.io/my-orleans-app:latest
              imagePullPolicy: Always
              ports:
                - name: silo
                  containerPort: 11111
                - name: gateway
                  containerPort: 30000
              env:
                - name: ORLEANS_SERVICE_ID
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['orleans/serviceId']
                - name: ORLEANS_CLUSTER_ID
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['orleans/clusterId']
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
                - name: STORAGE_CONNECTION_STRING
                  valueFrom:
                    secretKeyRef:
                      name: az-storage-acct
                      key: key
                - name: DOTNET_SHUTDOWNTIMEOUTSECONDS
                  value: "120"
              # Probe advice: lightweight local checks (complementary to Orleans membership probing)
              livenessProbe:
                tcpSocket:
                  port: silo
                initialDelaySeconds: 10
                periodSeconds: 10
                failureThreshold: 3
              readinessProbe:
                tcpSocket:
                  port: silo
                initialDelaySeconds: 5
                periodSeconds: 5
                failureThreshold: 6
              resources:
                requests:
                  cpu: "200m"
                  memory: "512Mi"
                limits:
                  cpu: "2"
                  memory: "2Gi"
          terminationGracePeriodSeconds: 180
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0
          maxSurge: 1
      minReadySeconds: 60

- RBAC (allows the agent to get/list/watch/delete/patch pods):

    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: orleans-hosting
    rules:
      - apiGroups: [ "" ]
        resources: ["pods"]
        verbs: ["get", "watch", "list", "delete", "patch"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: orleans-hosting-binding
    subjects:
      - kind: ServiceAccount
        name: default
        apiGroup: ''
    roleRef:
      kind: Role
      name: orleans-hosting
      apiGroup: ''

- Service (the silo port is reachable inside the cluster; the gateway port is exposed to clients):

    apiVersion: v1
    kind: Service
    metadata:
      name: orleans-silo
    spec:
      selector:
        app: orleans-dictionary-app
      ports:
        - name: silo
          port: 11111
          targetPort: 11111
      clusterIP: None
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: orleans-gateway
    spec:
      type: LoadBalancer
      selector:
        app: orleans-dictionary-app
      ports:
        - name: gateway
          port: 30000
          targetPort: 30000

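Key points:
- The orleans/serviceId and orleans/clusterId labels must match what the application uses (the values are injected into ClusterOptions through environment variables).
- The POD_NAME/POD_NAMESPACE/POD_IP environment variables are used to set SiloName, AdvertisedIPAddress, and related options.
- Probes should be local TCP checks (no cross-pod functional validation); they complement Orleans's own membership failure detection.
- RBAC permissions are required so that the agent does not receive 403 responses from the K8s API at startup or at runtime.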
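
A mismatch between the manifest and the application usually shows up only as a confusing runtime symptom, so a small preflight check at process start can help. The sketch below reports which of the variables injected by the Deployment above are absent. The FindMissing helper is hypothetical, and treating all five variables as expected is a choice made for this document's manifests: per the source quoted in section 2, the library itself falls back to other sources for the pod name, namespace, and IP.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Variables injected by the Deployment manifest in this document.
string[] expected =
{
    "POD_NAME", "POD_NAMESPACE", "POD_IP",
    "ORLEANS_SERVICE_ID", "ORLEANS_CLUSTER_ID",
};

var missing = FindMissing(expected, Environment.GetEnvironmentVariable);
if (missing.Count > 0)
{
    Console.Error.WriteLine($"Missing environment variables: {string.Join(", ", missing)}");
}
else
{
    Console.WriteLine("All expected environment variables are set.");
}

// Returns the names whose value is absent or blank (hypothetical helper).
// The lookup is passed in so the check can be exercised without touching
// the real process environment.
static List<string> FindMissing(IEnumerable<string> names, Func<string, string?> lookup) =>
    names.Where(name => string.IsNullOrWhiteSpace(lookup(name))).ToList();
```

Running this before the host is built keeps the failure close to its cause; whether to abort startup or only log a warning is left to the application.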
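5. Sequence Diagrams
- Startup: reconcile labels and membership, and mark silos without a matching pod as Dead
- Runtime: watchers are selected to monitor K8s; a pod deletion causes the silo to be marked Dead; defunct pods are optionally deleted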
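6. Common Issues and Troubleshooting
- Error: KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
  - Check inside the pod whether the variables exist (findstr applies when kubectl runs from a Windows shell; use grep on Linux or macOS):
    kubectl exec -it <pod> -- printenv | findstr KUBERNETES_SERVICE_
  - Make sure automountServiceAccountToken: true is set and that the pod runs under a ServiceAccount with the required permissions (see the RBAC manifest above).
  - Reference: learn.microsoft.com - Orleans Kubernetes hosting
- The silo name must match the pod name (injected through POD_NAME). The ports default to 11111/30000; to use different ports, configure EndpointOptions in the application.
- Without a clustering provider the silo cannot join the cluster: configure one provider (Azure/ADO.NET/Consul/…) alongside UseKubernetesHosting().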
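7. Minimal Rollout Steps
- Enable UseKubernetesHosting() in the application and configure a clustering provider.
- Build the image and push it to an image registry.
- Create the cluster Secret (for example az-storage-acct) that holds the clustering connection string.
- Apply the sample Deployment, RBAC, and Service manifests from this article.
- Verify that:
  - the pod carries all required labels and environment variables;
  - the logs show AdvertisedIPAddress equal to POD_IP;
  - with multiple replicas, the silos discover each other, and deleting a pod marks the corresponding silo as Dead;
  - the probes pass and rolling upgrades proceed without interruption.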
References:
- learn.microsoft.com - Orleans Kubernetes hosting