🚀 Terraform & Helm: Infrastructure as Code for Microservices
📚 Table of Contents
- 1. Introduction 🚀
- 2. Environment and Dependencies 🧰
- 3. Architecture Overview 🏗️
- 4. Defining Cloud Resources with Terraform 🛠️
  - 4.1 Provider and Backend
  - 4.2 Common Variables and Tags
  - 4.3 Resource Group Module
  - 4.4 VNet Module
  - 4.5 Key Vault Module
  - 4.6 AKS Module
  - 4.7 RDS Module
  - 4.8 Calling Modules from the Root Module
- 5. Packaging and Validating the Helm Chart 📦
  - 5.1 Chart.yaml & Chart.lock
  - 5.2 values.schema.json
  - 5.3 Subchart Example (gateway)
  - 5.4 Lint & Test
- 6. CI/CD Pipelines 🔄
  - 6.1 Infra Workflow
  - 6.2 Deploy Workflow
- 7. Observability and Alerting 🔔
- 8. Appendix 📂
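1. Introduction 🚀
✨ TL;DR
- 🛠️ Precise declarations: Terraform manages production resources such as the Resource Group, VNet (multiple subnets), AKS (across availability zones, with Azure Monitor), Key Vault, and PostgreSQL, with unified tags and a pinned provider version.
- 📦 Umbrella chart: a Helm umbrella chart covers multiple microservices (Gateway, Identity, …) and includes probes, PodDisruptionBudget, NetworkPolicy, HPA, Secret references, values.schema.json, and Chart.lock.
- 🔄 End-to-end CI/CD: GitHub Actions pipelines integrate OIDC login, Terraform fmt/validate/lint, Checkov, Infracost, Azure login, ACR authentication, Helm lint/test/package/push, automatic rollback, concurrency control, and environment approvals.
- 🌟 Enterprise-grade elements: performance (Azure CNI + HPA + PDB), high availability (multi-AZ + monitoring), and secure reproducibility (Key Vault + Terraform backend + sensitive values + OIDC + cost scanning).
📚 Background and Motivation
In a multi-microservice architecture, "it works on my machine" is often hard to reproduce 🧩. Combining Terraform, Helm charts, and GitOps/CI-CD turns infrastructure provisioning and application deployment into one automated, consistent, auditable, and reversible process, which improves both delivery speed and reliability 💪.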
2. Environment and Dependencies 🧰
terraform version          # >= 1.4
az version                 # latest Azure CLI
kubectl version --client
helm version               # >= 3.8
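The minimum versions above can also be checked in a script. The following is a minimal sketch and not part of the original setup: `version_ge` is a hypothetical helper, it assumes GNU `sort -V` is available, and the version strings are hard-coded for illustration rather than parsed from the tools' output.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Illustrative values only; in practice these would come from the installed tools.
terraform_version="1.5.7"
helm_version="3.12.0"

version_ge "$terraform_version" "1.4" && echo "terraform OK"
version_ge "$helm_version" "3.8" && echo "helm OK"
```

The comparison sorts the two versions and checks that the required minimum comes first, which avoids the pitfalls of comparing `3.12` and `3.8` as plain strings.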
Repository layout
.
├─ infra/
│  ├─ terraform/
│  │  ├─ backend.tf
│  │  ├─ required_providers.tf
│  │  ├─ variables.tf
│  │  ├─ main.tf
│  │  ├─ outputs.tf
│  │  └─ modules/
│  │     ├─ resource_group/
│  │     ├─ vnet/
│  │     ├─ keyvault/
│  │     ├─ aks/
│  │     └─ rds/
│  └─ helm-charts/
│     └─ abp-vnext/
│        ├─ Chart.yaml
│        ├─ Chart.lock
│        ├─ values.schema.json
│        ├─ values.yaml
│        ├─ values-dev.yaml
│        ├─ values-prod.yaml
│        ├─ charts/      # subcharts such as Gateway and Identity
│        └─ templates/   # shared umbrella resources (Ingress optional)
└─ src/
   └─ MyAbpSolution.sln
3. Architecture Overview 🏗️
4. Defining Cloud Resources with Terraform 🛠️
4.1 Provider and Backend
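# required_providers.tf
terraform {
  required_version = ">= 1.4"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateacct"
    container_name       = "tfstate"
    # Backend blocks cannot interpolate values such as terraform.workspace.
    # The azurerm backend already keeps a separate state blob per workspace under this key.
    key = "terraform.tfstate"
  }
}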
4.2 Common Variables and Tags
# variables.tf
variable "location" { type = string }
variable "environment" { type = string }
variable "owner" { type = string }
variable "cost_center" { type = string }
variable "db_admin" { type = string }

variable "db_password" {
  type      = string
  sensitive = true # keep the password out of plan/apply output
}

# Tags applied to every resource.
locals {
  common_tags = {
    environment = var.environment
    owner       = var.owner
    cost_center = var.cost_center
  }
}
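None of these variables has a default, so Terraform needs a value for each one. Terraform reads variables from environment variables named `TF_VAR_<name>`, which keeps `db_password` out of committed files. A minimal sketch of a pre-flight check follows; `require_tf_vars` is a hypothetical helper, not something the original project defines.

```shell
# require_tf_vars NAME...: fail if any TF_VAR_<NAME> is unset or empty.
# Hypothetical helper; each NAME must be a plain identifier such as db_password.
require_tf_vars() {
  _missing=0
  for _name in "$@"; do
    # Read the value of TF_VAR_<NAME> without failing when it is unset.
    eval "_value=\${TF_VAR_${_name}:-}"
    if [ -z "$_value" ]; then
      echo "missing TF_VAR_${_name}" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Typical use before planning (commented out so the sketch runs without Terraform):
#   require_tf_vars location environment owner cost_center db_admin db_password \
#     && terraform plan
```

The helper reports every missing variable in one pass instead of stopping at the first, so a misconfigured CI job can be fixed in a single round trip.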
4.3 Resource Group Module
# modules/resource_group/main.tf
resource "azurerm_resource_group" "this" {
  name     = var.name
  location = var.location
  tags     = var.tags
}

output "rg_name" { value = azurerm_resource_group.this.name }
# modules/resource_group/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "tags" { type = map(string) }
4.4 VNet Module
# modules/vnet/main.tf
resource "azurerm_virtual_network" "this" {
  name                = var.name
  address_space       = var.address_space
  location            = var.location
  resource_group_name = var.rg_name
  tags                = var.tags
}

# One subnet per entry in var.subnets (subnet name => CIDR prefix).
resource "azurerm_subnet" "this" {
  for_each             = var.subnets
  name                 = each.key
  resource_group_name  = var.rg_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [each.value]
}

# Map of subnet name => subnet ID, consumed by the AKS and database modules.
output "subnet_ids_map" {
  value = { for s in azurerm_subnet.this : s.name => s.id }
}
# modules/vnet/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "address_space" { type = list(string) }
variable "subnets" { type = map(string) }
variable "tags" { type = map(string) }
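4.5 Key Vault Module
# modules/keyvault/main.tf
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "this" {
  name                     = var.name
  location                 = var.location
  resource_group_name      = var.rg_name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "standard"
  purge_protection_enabled = true
  # soft_delete_enabled was removed in azurerm 3.x; soft delete is always on.
  tags = var.tags

  # Let the identity that runs Terraform manage secrets in this vault.
  access_policy {
    tenant_id          = data.azurerm_client_config.current.tenant_id
    object_id          = data.azurerm_client_config.current.object_id
    secret_permissions = ["Get", "List", "Set", "Delete", "Recover", "Purge"]
  }
}

resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  value        = var.admin_password
  key_vault_id = azurerm_key_vault.this.id
}

# Exposed so other modules can look up secrets in this vault.
output "id" { value = azurerm_key_vault.this.id }
# modules/keyvault/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }

variable "admin_password" {
  type      = string
  sensitive = true
}

variable "tags" { type = map(string) }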
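4.6 AKS Module
# modules/aks/main.tf
resource "azurerm_kubernetes_cluster" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.rg_name
  dns_prefix          = var.dns_prefix
  tags                = var.tags

  default_node_pool {
    name                = "agentpool"
    vm_size             = var.node_size
    vnet_subnet_id      = var.aks_subnet_id      # the subnet belongs on the node pool, not in network_profile
    zones               = var.availability_zones # azurerm 3.x name for availability_zones
    enable_auto_scaling = var.enable_auto_scaler
    min_count           = var.min_count
    max_count           = var.max_count
    os_disk_size_gb     = 50
  }

  # Note: the default AKS service CIDR (10.0.0.0/16) overlaps the 10.0.0.0/16 VNet
  # used in 4.8; set service_cidr and dns_service_ip to a non-overlapping range.
  network_profile {
    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }

  identity {
    type = "SystemAssigned"
  }

  # Azure Monitor (Container Insights). azurerm 3.x replaced addon_profile with a
  # top-level oms_agent block, which needs a Log Analytics workspace ID.
  dynamic "oms_agent" {
    for_each = var.log_analytics_workspace_id == null ? [] : [var.log_analytics_workspace_id]
    content {
      log_analytics_workspace_id = oms_agent.value
    }
  }
}

output "kube_config" {
  value     = azurerm_kubernetes_cluster.this.kube_admin_config_raw
  sensitive = true
}
# modules/aks/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "dns_prefix" { type = string }
variable "aks_subnet_id" { type = string }
variable "node_size" { type = string }
variable "tags" { type = map(string) }

variable "availability_zones" {
  type    = list(string)
  default = ["1", "2", "3"]
}

variable "enable_auto_scaler" {
  type    = bool
  default = true
}

variable "min_count" {
  type    = number
  default = 2
}

variable "max_count" {
  type    = number
  default = 5
}

# Added for illustration: Container Insights stays off until a workspace ID is passed in.
variable "log_analytics_workspace_id" {
  type    = string
  default = null
}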
4.7 RDS Module (Azure Database for PostgreSQL)
# modules/rds/main.tf
# Read the admin password back from Key Vault instead of passing it through variables.
data "azurerm_key_vault_secret" "db_pwd" {
  name         = "db-password"
  key_vault_id = var.keyvault_id
}

# Note: delegated_subnet_id requires the subnet to be delegated to
# Microsoft.DBforPostgreSQL/flexibleServers and a private_dns_zone_id to be set;
# both are omitted here for brevity.
resource "azurerm_postgresql_flexible_server" "this" {
  name                   = var.name
  location               = var.location
  resource_group_name    = var.rg_name
  version                = var.pg_version
  sku_name               = var.sku_name
  storage_mb             = var.storage_mb
  delegated_subnet_id    = var.subnet_id
  administrator_login    = var.admin_user
  administrator_password = data.azurerm_key_vault_secret.db_pwd.value
  tags                   = var.tags
}

output "connection_string" {
  value = format(
    "Host=%s;Port=5432;Username=%s;Password=%s;Database=%s",
    azurerm_postgresql_flexible_server.this.fqdn,
    var.admin_user,
    data.azurerm_key_vault_secret.db_pwd.value,
    var.db_name
  )
  sensitive = true
}
# modules/rds/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "pg_version" { type = string }
variable "sku_name" { type = string }
variable "storage_mb" { type = number }
variable "admin_user" { type = string }
variable "subnet_id" { type = string }
variable "db_name" { type = string }
variable "keyvault_id" { type = string }
variable "tags" { type = map(string) }
4.8 Calling Modules from the Root Module
# main.tf
provider "azurerm" {
  features {}
}

module "rg" {
  source   = "./modules/resource_group"
  name     = "${var.environment}-rg"
  location = var.location
  tags     = local.common_tags
}

module "vnet" {
  source        = "./modules/vnet"
  name          = "${var.environment}-vnet"
  location      = var.location
  rg_name       = module.rg.rg_name
  address_space = ["10.0.0.0/16"]
  subnets       = { aks = "10.0.1.0/24", db = "10.0.2.0/24" }
  tags          = local.common_tags
}

module "keyvault" {
  source         = "./modules/keyvault"
  name           = "${var.environment}-kv"
  location       = var.location
  rg_name        = module.rg.rg_name
  admin_password = var.db_password
  tags           = local.common_tags
}

module "aks" {
  source             = "./modules/aks"
  name               = "${var.environment}-aks"
  location           = var.location
  rg_name            = module.rg.rg_name
  dns_prefix         = var.environment
  aks_subnet_id      = module.vnet.subnet_ids_map["aks"]
  node_size          = "Standard_DS2_v2"
  availability_zones = ["1", "2", "3"]
  enable_auto_scaler = true
  min_count          = 2
  max_count          = 5
  tags               = local.common_tags
}

module "rds" {
  source      = "./modules/rds"
  name        = "${var.environment}-pg"
  location    = var.location
  rg_name     = module.rg.rg_name
  pg_version  = "13"
  sku_name    = "GP_Standard_D2s_v3" # Flexible Server SKU; "GP_Gen5_2" is a Single Server SKU
  storage_mb  = 32768                # smallest size Flexible Server accepts
  admin_user  = var.db_admin
  subnet_id   = module.vnet.subnet_ids_map["db"]
  keyvault_id = module.keyvault.id # expects an "id" output from modules/keyvault
  db_name     = "abpdb"
  tags        = local.common_tags
}
# outputs.tf
output "kubeconfig" {
  value     = module.aks.kube_config
  sensitive = true
}

output "db_conn_string" {
  value     = module.rds.connection_string
  sensitive = true
}
5. Packaging and Validating the Helm Chart 📦
5.1 Chart.yaml & Chart.lock
# Chart.yaml
apiVersion: v2
name: abp-vnext
version: 0.4.0
appVersion: "1.0.0"
description: "ABP VNext multi-service Kubernetes umbrella chart"
dependencies:
  - name: gateway
    version: "0.2.0"
    repository: file://charts/gateway
  - name: identity
    version: "0.2.0"
    repository: file://charts/identity
helm dependency update infra/helm-charts/abp-vnext   # resolve dependencies and refresh Chart.lock
helm dependency build infra/helm-charts/abp-vnext    # rebuild charts/ from Chart.lock
5.2 values.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replicaCount": { "type": "integer" },
    "image": {
      "type": "object",
      "properties": {
        "repository": { "type": "string" },
        "tag": { "type": "string" }
      },
      "required": ["repository", "tag"]
    },
    "service": {
      "type": "object",
      "properties": {
        "type": { "type": "string" },
        "port": { "type": "integer" }
      },
      "required": ["type", "port"]
    }
  },
  "required": ["replicaCount", "image", "service"]
}
5.3 Subchart Example (gateway)
# charts/gateway/values.yaml
replicaCount: 2
image:
  repository: myacr.azurecr.io/abp-gateway
  tag: "1.0.0"
service:
  type: ClusterIP
  port: 80
resources:
  limits:
    cpu: "500m"
    memory: "512Mi"
  requests:
    cpu: "250m"
    memory: "256Mi"
# charts/gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "gateway.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "gateway.name" . }}
  template:
    metadata:
      labels:
        app: {{ include "gateway.name" . }}
    spec:
      containers:
        - name: gateway
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 80
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 15
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
The remaining templates (Service, Ingress, PDB, NetworkPolicy, and HPA) follow the conventions described earlier.
5.4 Lint & Test
helm lint infra/helm-charts/abp-vnext --strict                  # static checks, including values.schema.json
helm template abp-vnext infra/helm-charts/abp-vnext             # render manifests locally without a cluster
helm package infra/helm-charts/abp-vnext -d charts-packages     # build the versioned .tgz archive
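`helm package` names the archive after the `version` field in Chart.yaml (here `abp-vnext-0.4.0.tgz`), so scripts that need the version can read it from the chart instead of hard-coding it. A minimal sketch, assuming the top-level `version:` key starts at column one as in the Chart.yaml above; `chart_version` is a hypothetical helper, and the demo writes a throwaway Chart.yaml to a temporary file.

```shell
# chart_version FILE: print the top-level `version` from a Chart.yaml.
# Hypothetical helper; indented `version:` keys (dependencies) are ignored
# because the pattern is anchored to the start of the line.
chart_version() {
  sed -n 's/^version:[[:space:]]*//p' "$1" | head -n 1 | tr -d '"'
}

# Demo against a temporary chart file that mirrors the one in 5.1.
tmp_chart="$(mktemp)"
cat > "$tmp_chart" <<'EOF'
apiVersion: v2
name: abp-vnext
version: 0.4.0
appVersion: "1.0.0"
dependencies:
  - name: gateway
    version: "0.2.0"
EOF

chart_version "$tmp_chart"   # prints: 0.4.0
rm -f "$tmp_chart"
```

A text match like this is only a convenience for simple charts; a YAML-aware tool is the safer choice when the file layout is not under your control.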
6. CI/CD Pipelines 🔄
6.1 Infra Workflow
# .github/workflows/infra.yml
name: Infra – Terraform
on:
  push:
    paths: ["infra/terraform/**"]
concurrency:
  group: infra-${{ github.ref }}
  cancel-in-progress: true
permissions:
  id-token: write
  contents: read
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform
    env:
      # OIDC federation for the provider and backend; ARM_USE_MSI only works on Azure-hosted compute.
      ARM_USE_OIDC: true
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
    # Job outputs are visible only to other jobs in this workflow file.
    outputs:
      kubeconfig: ${{ steps.apply.outputs.kubeconfig }}
      db_conn_string: ${{ steps.apply.outputs.db_conn_string }}
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_wrapper: false # keep `terraform output -raw` free of wrapper noise
      - name: Terraform Fmt Check
        run: terraform fmt -check
      - name: Terraform Init
        run: terraform init -input=false
      - name: Select Workspace
        # Select the workspace before planning so plan and apply use the same state.
        run: terraform workspace select "${{ github.ref_name }}" || terraform workspace new "${{ github.ref_name }}"
      - name: Terraform Validate
        # validate needs an initialized working directory, so it runs after init.
        run: terraform validate
      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infra/terraform
      - name: Infracost Estimate
        uses: infracost/actions@v2
        with:
          path: infra/terraform
        env:
          INFRACOST_TOKEN: ${{ secrets.INFRACOST_TOKEN }}
      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -out=tfplan
      - name: Terraform Apply
        id: apply
        run: |
          terraform apply -auto-approve tfplan
          # -raw prints no trailing newline, so add one before the EOF delimiter.
          {
            echo "kubeconfig<<EOF"
            terraform output -raw kubeconfig
            echo
            echo "EOF"
            echo "db_conn_string<<EOF"
            terraform output -raw db_conn_string
            echo
            echo "EOF"
          } >> "$GITHUB_OUTPUT"
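The workflow above uses the branch name as the Terraform workspace name. Branch names such as `feature/add-redis` contain `/`, which is not valid in a Terraform workspace name, so the ref usually needs normalizing first. A minimal sketch; `workspace_for_ref` is a hypothetical helper, and the allowed character set below is a conservative assumption rather than Terraform's exact rule.

```shell
# workspace_for_ref REF: map a Git ref name to a conservative workspace name by
# replacing every character outside [A-Za-z0-9_-] with '-'. Hypothetical helper.
workspace_for_ref() {
  printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_-]/-/g'
}

workspace_for_ref "main"               # prints: main
workspace_for_ref "feature/add-redis"  # prints: feature-add-redis
```

In a workflow step this could be used as `ws="$(workspace_for_ref "$GITHUB_REF_NAME")"` followed by `terraform workspace select "$ws" || terraform workspace new "$ws"`. Note that distinct branches can map to the same workspace (`feature/x` and `feature-x`), so a branch naming convention is still needed.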
6.2 Deploy Workflow
# .github/workflows/deploy.yml
name: Deploy – Helm
on:
  push:
    paths:
      - "src/**"
      - "infra/helm-charts/**"
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: true
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Required reviewers (for example alice and bob) are configured on the environment
    # in the repository settings; they cannot be declared in the workflow file.
    environment:
      name: production
      url: https://abp.example.com
    steps:
      - uses: actions/checkout@v3
      - name: Azure Login via OIDC
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: AKS Set Context
        uses: azure/aks-set-context@v3 # reuses the azure/login session
        with:
          resource-group: production-rg
          cluster-name: production-aks
      - name: Docker Login to ACR
        uses: docker/login-action@v2
        with:
          registry: myacr.azurecr.io
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - name: Build & Push Image
        run: |
          docker build -t myacr.azurecr.io/abp-vnext:${{ github.sha }} src/
          docker push myacr.azurecr.io/abp-vnext:${{ github.sha }}
      - name: Set DEPLOY_ENV
        run: |
          if [ "${GITHUB_REF}" = "refs/heads/main" ]; then
            echo "DEPLOY_ENV=prod" >> "$GITHUB_ENV"
          else
            echo "DEPLOY_ENV=dev" >> "$GITHUB_ENV"
          fi
      - name: Helm Lint
        run: helm lint infra/helm-charts/abp-vnext --strict
      - name: Helm Package & Push
        run: |
          helm package infra/helm-charts/abp-vnext -d charts-packages
          helm push charts-packages/abp-vnext-*.tgz oci://myhelmrepo
      - name: Helm Upgrade with Rollback
        env:
          # Assumption: the connection string is stored as a secret named DB_CONN_STRING.
          # Job outputs from infra.yml are not visible to a separate workflow file.
          DB_CONN: ${{ secrets.DB_CONN_STRING }}
        run: |
          # OCI registries are referenced directly; `helm repo add` does not accept oci:// URLs.
          # --atomic waits for the rollout and rolls back to the previous release on failure.
          helm upgrade --install abp-vnext oci://myhelmrepo/abp-vnext \
            --version 0.4.0 \
            -f "infra/helm-charts/abp-vnext/values-${DEPLOY_ENV}.yaml" \
            --set image.tag="${GITHUB_SHA}" \
            --set-string env.DB_CONN="${DB_CONN}" \
            --atomic --timeout 5m
      - name: Helm Test
        # helm test runs against an installed release, so it follows the upgrade.
        run: helm test abp-vnext
      - name: Verify Rollout
        run: kubectl rollout status deployment/gateway
      - name: Notify Slack on Success
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "✅ Deployment succeeded: ABP VNext microservices updated to ${{ github.sha }}"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
      - name: Notify Slack on Failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "❌ Deployment failed: check the pipeline logs!"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
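7. Observability and Alerting 🔔
- Azure Monitor: Container Insights enabled on the AKS cluster
- Prometheus/Grafana: optional deployment that collects cluster and application metrics
- EFK/ELK: log collection through a DaemonSet or sidecar containers
- Alertmanager: threshold-based alerts pushed to Slack/Teams
8. Appendix 📂
References:
- Terraform Azure Provider documentation
- Helm official documentation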