StarRocks Cluster Installation and Deployment Guide (CentOS)
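📋 Pre-deployment Preparation
1. System Requirements
Hardware requirements
FE nodes: at least 4 cores and 8 GB of memory
BE nodes: at least 8 cores and 16 GB of memory (16 cores and 32 GB recommended)
Disk: SSD recommended, at least 100 GB of free space
Network: 10 Gigabit Ethernet, inter-node latency < 1 ms
Software requirements
Operating system: CentOS 7.x / Ubuntu 16.04+
Java: JDK 8 or 11
CPU: must support the AVX2 instruction set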
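The minimums above can be checked mechanically before any software is installed. The following is a minimal sketch, not an official StarRocks tool: the function names `meets_minimum` and `has_avx2` and their argument order are made up for illustration, and the thresholds are simply the figures listed above.

```shell
# meets_minimum ROLE CORES MEM_GB   (hypothetical helper)
# Succeeds (exit 0) when a node meets the minimum hardware listed above:
#   fe: 4 cores / 8 GB, be: 8 cores / 16 GB.
meets_minimum() {
    case "$1" in
        fe) min_cores=4; min_mem=8 ;;
        be) min_cores=8; min_mem=16 ;;
        *)  echo "unknown role: $1" >&2; return 2 ;;
    esac
    [ "$2" -ge "$min_cores" ] && [ "$3" -ge "$min_mem" ]
}

# has_avx2 [CPUINFO_FILE]   (hypothetical helper)
# Succeeds when the given cpuinfo file (default /proc/cpuinfo) lists the avx2 flag.
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}"
}
```

For example, `meets_minimum be 8 16 && echo "BE minimum met"`. Pass the nominal memory size: the total reported by the kernel is usually slightly below it, so a 16 GB machine can show up as 15 GB when rounded down.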
2. Environment Check Script
Create the environment check script check_environment.sh:
bash
#!/bin/bash
echo "=== StarRocks environment check ==="

# Operating system
echo -e "\n📋 Operating system:"
cat /etc/os-release

# Check whether the CPU supports AVX2
echo -e "\n🔍 CPU AVX2 support:"
if grep -q avx2 /proc/cpuinfo; then
    echo "✅ CPU supports AVX2"
else
    echo "❌ CPU does not support AVX2; StarRocks requires AVX2"
fi

# Memory
echo -e "\n💾 Memory:"
free -h

# Disk
echo -e "\n💿 Disk:"
df -h

# Java
echo -e "\n☕ Java version:"
java -version 2>/dev/null || echo "Java is not installed"

# Firewall
echo -e "\n🔥 Firewall status:"
systemctl status firewalld 2>/dev/null | grep Active || echo "firewalld is not running"

# Time synchronization
echo -e "\n⏰ Time synchronization status:"
timedatectl status 2>/dev/null | grep "System clock" || ntpstat

# Network
echo -e "\n🌐 Network connectivity:"
ping -c 2 www.starrocks.io > /dev/null && echo "✅ Internet connectivity OK" || echo "❌ Internet connection failed"

echo -e "\n✅ Environment check complete"
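The script above prints the Java version but does not compare it with the JDK 8/11 requirement. As a rough sketch of how that comparison could be done, the helpers below parse the version string; the names `java_version_from_output`, `java_major` and `java_supported` are hypothetical, and the parsing assumes the usual `version "x.y.z"` format printed by `java -version`.

```shell
# java_version_from_output: reads `java -version` output on stdin and prints
# the quoted version string, e.g. 1.8.0_392 or 11.0.21.
java_version_from_output() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# java_major VERSION: prints the major version.
# Legacy strings start with "1." (1.8.0_392 -> 8); newer ones do not (11.0.21 -> 11).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

# java_supported VERSION: succeeds only for the JDK 8 and JDK 11 lines named in this guide.
java_supported() {
    case "$(java_major "$1")" in
        8|11) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A possible use inside the check script: `java_supported "$(java -version 2>&1 | java_version_from_output)" || echo "❌ JDK 8 or 11 required"`. Note that `java -version` writes to stderr, hence the `2>&1`.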
🚀 Example Cluster Plan
Assume a 3-node cluster:
Node 1: FE Leader + BE
Node 2: FE Follower + BE
Node 3: FE Observer + BE
| Node | IP address | Roles | Spec |
|---|---|---|---|
| sr-node1 | 192.168.1.101 | FE (Leader), BE | 8 cores / 16 GB |
| sr-node2 | 192.168.1.102 | FE (Follower), BE | 8 cores / 16 GB |
| sr-node3 | 192.168.1.103 | FE (Observer), BE | 8 cores / 16 GB |
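The rest of this guide addresses nodes by IP only. If you also want the hostnames from the table to resolve on every node, one option is to derive `/etc/hosts` entries from the plan. This is only a sketch: the `CLUSTER_PLAN` variable and the `hosts_entries` function are illustrative names, and the pairs are copied from the table above.

```shell
# Cluster plan from the table above, one "hostname ip" pair per line.
CLUSTER_PLAN='sr-node1 192.168.1.101
sr-node2 192.168.1.102
sr-node3 192.168.1.103'

# hosts_entries: reads "hostname ip" pairs on stdin and prints
# /etc/hosts-style lines ("ip hostname"). Lines without exactly two fields are skipped.
hosts_entries() {
    awk 'NF == 2 { printf "%s %s\n", $2, $1 }'
}

# Print the entries for review; nothing is written to /etc/hosts here.
printf '%s\n' "$CLUSTER_PLAN" | hosts_entries
```

After reviewing the output, it could be appended on each node with `| sudo tee -a /etc/hosts`.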
📥 Installation and Deployment Steps
1. Download the Installation Package
bash
# Run on every node
cd /opt
wget https://releases.starrocks.io/starrocks/StarRocks-3.1.0.tar.gz
tar -xzf StarRocks-3.1.0.tar.gz
mv StarRocks-3.1.0 starrocks
cd starrocks
2. Configure Environment Variables
bash
# Write /etc/profile.d/starrocks.sh
sudo tee /etc/profile.d/starrocks.sh << 'EOF'
export STARROCKS_HOME=/opt/starrocks
export PATH=$STARROCKS_HOME/fe/bin:$STARROCKS_HOME/be/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk  # adjust to the actual path
EOF
source /etc/profile.d/starrocks.sh
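3. FE Node Configuration
Node 1 (FE Leader) configuration:
bash
# Create the FE configuration and metadata directories
mkdir -p /opt/starrocks/fe/conf
mkdir -p /opt/starrocks/fe/meta

# Create fe.conf
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
# Metadata directory
meta_dir = /opt/starrocks/fe/meta

# Network settings
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010

# JVM settings. The GC flags below are JDK 8 flags; JDK 9+ rejects several of them
# (for example -XX:+PrintGCDateStamps), so check the JDK-specific options in the
# fe.conf shipped with the package when running on JDK 11.
JAVA_OPTS = "-Xmx8192m -Xms8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"

# Cluster settings
cluster_id = 1

# Other settings
sys_log_level = INFO
EOF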
Node 2 (FE Follower) configuration:
bash
# Create fe.conf (like node 1, but cluster_id does not need to be set)
mkdir -p /opt/starrocks/fe/meta
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
meta_dir = /opt/starrocks/fe/meta
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
JAVA_OPTS = "-Xmx8192m -Xms8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"
sys_log_level = INFO
EOF
Node 3 (FE Observer) configuration:
bash
# Create fe.conf (same as node 2, except for a smaller 4 GB JVM heap)
mkdir -p /opt/starrocks/fe/meta
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
meta_dir = /opt/starrocks/fe/meta
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
JAVA_OPTS = "-Xmx4096m -Xms4096m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"
sys_log_level = INFO
EOF
4. BE Node Configuration
Configure the BE on every node:
bash
# Create the BE configuration and storage directories
mkdir -p /opt/starrocks/be/conf
mkdir -p /opt/starrocks/be/storage

# Create be.conf
sudo tee /opt/starrocks/be/conf/be.conf << 'EOF'
# Storage directory
storage_root_path = /opt/starrocks/be/storage

# Network settings
priority_networks = 192.168.1.0/24
be_port = 9060
be_http_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060

# Resource limits
mem_limit = 80%
storage_page_cache_limit = 40%

# Other settings
sys_log_level = INFO
EOF
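Because every node in the example plan runs both an FE and a BE, the FE heap (`-Xmx`) and the BE `mem_limit` draw on the same 16 GB of memory. The arithmetic below is a rough sanity check, not a sizing rule from StarRocks; `memory_budget` is a hypothetical helper, and it ignores JVM off-heap usage and the operating system's own needs.

```shell
# memory_budget TOTAL_MB FE_HEAP_MB BE_MEM_LIMIT_PERCENT   (hypothetical helper)
# Prints FE heap + BE memory limit in MB and fails (exit 1) when that sum
# exceeds the node's physical memory. Integer arithmetic, rounded down.
memory_budget() {
    be_mb=$(( $1 * $3 / 100 ))
    planned_mb=$(( $2 + be_mb ))
    echo "$planned_mb"
    [ "$planned_mb" -le "$1" ]
}

# Node 1 from the plan: 16 GB RAM, -Xmx8192m, mem_limit = 80%
memory_budget 16384 8192 80 || echo "over-committed: lower -Xmx or mem_limit"
```

With the values from this guide the sum is 8192 + 13107 = 21299 MB, which is more than the 16384 MB available, so on co-located nodes either the heap or `mem_limit` would need to come down. For instance, `memory_budget 16384 4096 60` yields 13926 MB and passes.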
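5. Start the Cluster
Start the FEs (in order):
bash
# On node 1: start the FE Leader
cd /opt/starrocks/fe
./bin/start_fe.sh --daemon

# Check that the FE on node 1 is up
curl http://192.168.1.101:8030/api/health

# Register node 2 as a Follower on the Leader, then start the FE on node 2.
# The first start must pass --helper so the new FE joins the existing cluster
# instead of bootstrapping its own.
mysql -h 192.168.1.101 -P 9030 -uroot -e "ALTER SYSTEM ADD FOLLOWER '192.168.1.102:9010'"
cd /opt/starrocks/fe
./bin/start_fe.sh --helper 192.168.1.101:9010 --daemon

# Register node 3 as an Observer on the Leader, then start the FE on node 3
mysql -h 192.168.1.101 -P 9030 -uroot -e "ALTER SYSTEM ADD OBSERVER '192.168.1.103:9010'"
cd /opt/starrocks/fe
./bin/start_fe.sh --helper 192.168.1.101:9010 --daemon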
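The FE does not accept connections the instant `start_fe.sh --daemon` returns, so running the `ALTER SYSTEM` statements immediately afterwards can fail. A small retry loop is one way to wait for it; this is a generic sketch, and `wait_until` is a hypothetical helper rather than anything shipped with StarRocks.

```shell
# wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Re-runs COMMAND until it succeeds; returns 1 after MAX_ATTEMPTS failed tries.
# COMMAND's own output is discarded.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (not executed here): wait up to about a minute for the Leader's HTTP port.
#   wait_until 30 2 curl -sf http://192.168.1.101:8030/api/health && echo "FE is up"
```

The same helper can wrap `mysql -h 192.168.1.101 -P 9030 -uroot -e "SELECT 1"` to wait for the query port instead of the HTTP port.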
Start all BE nodes:
bash
# On every node: start the BE
cd /opt/starrocks/be
./bin/start_be.sh --daemon

# Register the BEs through the FE Leader (the port is heartbeat_service_port)
mysql -h 192.168.1.101 -P 9030 -uroot -e "
ALTER SYSTEM ADD BACKEND '192.168.1.101:9050';
ALTER SYSTEM ADD BACKEND '192.168.1.102:9050';
ALTER SYSTEM ADD BACKEND '192.168.1.103:9050';
"
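For clusters with more than a handful of BEs, the `ALTER SYSTEM ADD BACKEND` statements can be generated from a list of addresses rather than typed by hand. A minimal sketch follows; `add_backend_sql` is an illustrative name, and the port argument is assumed to be the `heartbeat_service_port` from be.conf.

```shell
# add_backend_sql HEARTBEAT_PORT IP...   (hypothetical helper)
# Prints one ALTER SYSTEM ADD BACKEND statement per IP address.
add_backend_sql() {
    port=$1
    shift
    for ip in "$@"; do
        printf "ALTER SYSTEM ADD BACKEND '%s:%s';\n" "$ip" "$port"
    done
}

# Prints the three statements used in this guide; nothing is sent to the cluster.
add_backend_sql 9050 192.168.1.101 192.168.1.102 192.168.1.103
```

After checking the output, it can be piped into the client, for example `add_backend_sql 9050 192.168.1.101 | mysql -h 192.168.1.101 -P 9030 -uroot`.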
6. Verify Cluster Status
Create the verification script check_cluster_status.sh:
bash
#!/bin/bash
echo "=== StarRocks cluster status check ==="

# FE status
echo -e "\n📊 FE node status:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SHOW PROC '/frontends'" 2>/dev/null

# BE status
echo -e "\n💾 BE node status:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SHOW PROC '/backends'" 2>/dev/null

# Cluster version
echo -e "\n🔍 Cluster version:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SELECT CURRENT_VERSION()" 2>/dev/null

# Simple query test
echo -e "\n✅ Simple query test:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SELECT 1 as test_result" 2>/dev/null

echo -e "\n🎉 Cluster status check complete"
Run the verification:
bash
chmod +x check_cluster_status.sh
./check_cluster_status.sh
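The verification script prints the node tables for a human to read. For an automated check, the `SHOW PROC '/backends'` output can be parsed instead. The sketch below assumes the output is tab-separated with a header row (as `mysql -B` produces) and that the liveness column is named `Alive` with the values `true`/`false`; verify both against your StarRocks version. `count_dead_backends` is a hypothetical helper.

```shell
# count_dead_backends   (hypothetical helper)
# Reads tab-separated SHOW PROC '/backends' output, header included, on stdin.
# Prints the number of rows whose Alive column is not "true".
# Exits with status 2 (and prints nothing) if no Alive column is found.
count_dead_backends() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Alive") col = i; next }
        col && $col != "true" { dead++ }
        END { if (!col) exit 2; print dead + 0 }
    '
}

# Example (not executed here):
#   mysql -B -h 192.168.1.101 -P 9030 -uroot -e "SHOW PROC '/backends'" | count_dead_backends
```

Looking the column up by name rather than by position keeps the check working if the column order differs between versions.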
🔧 System Service Configuration
Create systemd services
FE service:
bash
sudo tee /etc/systemd/system/starrocks-fe.service << 'EOF'
[Unit]
Description=StarRocks FE
After=network.target

[Service]
Type=forking
User=root
Group=root
# systemd does not read /etc/profile.d; adjust JAVA_HOME to the actual path
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk
ExecStart=/opt/starrocks/fe/bin/start_fe.sh --daemon
ExecStop=/opt/starrocks/fe/bin/stop_fe.sh
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
BE service:
bash
sudo tee /etc/systemd/system/starrocks-be.service << 'EOF'
[Unit]
Description=StarRocks BE
After=network.target

[Service]
Type=forking
User=root
Group=root
# systemd does not read /etc/profile.d; adjust JAVA_HOME to the actual path
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk
ExecStart=/opt/starrocks/be/bin/start_be.sh --daemon
ExecStop=/opt/starrocks/be/bin/stop_be.sh
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
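Enable the services:
bash
# If the FE/BE were started by hand in step 5, stop them first
# (stop_fe.sh / stop_be.sh) so that systemd manages the processes
sudo systemctl daemon-reload
sudo systemctl enable starrocks-fe starrocks-be
sudo systemctl start starrocks-fe starrocks-be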
📝 Common Management Commands
Cluster management
sql
-- Show FE status
SHOW PROC '/frontends';

-- Show BE status
SHOW PROC '/backends';

-- List databases
SHOW DATABASES;

-- Create a user
CREATE USER 'test' IDENTIFIED BY 'password';
GRANT ALL ON *.* TO 'test';
Service management
bash
# Start the services
systemctl start starrocks-fe
systemctl start starrocks-be

# Stop the services
systemctl stop starrocks-fe
systemctl stop starrocks-be

# Show service status
systemctl status starrocks-fe
systemctl status starrocks-be
🐛 Troubleshooting
Common problems and fixes
FE fails to start
bash
# Follow the FE log
tail -f /opt/starrocks/fe/log/fe.log

# Check whether the FE ports are already in use
netstat -tulpn | grep -E ':(8030|9030|9010)'
BE cannot register
bash
# Follow the BE log
tail -f /opt/starrocks/be/log/be.INFO

# Check network connectivity to the FE
telnet 192.168.1.101 9030
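Insufficient disk space
sql
-- Shorten how long dropped tables and partitions are kept in the FE recycle bin
-- (in seconds), so that their disk space is reclaimed sooner
ADMIN SET FRONTEND CONFIG ("catalog_trash_expire_second" = "3600");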
📊 Monitoring Configuration
Enable metrics
bash
# Enable metrics in the BE configuration
echo "enable_metric_calculator = true" >> /opt/starrocks/be/conf/be.conf

# Restart the BE service
systemctl restart starrocks-be
Monitoring UI: http://192.168.1.101:8030
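✅ Deployment Checklist
Time is synchronized on all nodes
Firewall ports are open
FE nodes have started normally
BE nodes have registered normally
Cluster status is healthy
Simple query test passes
System services are configured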
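For the "firewall ports are open" item, the ports that need opening are the ones set in fe.conf and be.conf earlier. On CentOS with firewalld, the commands can be generated from those port lists. This is a sketch under the assumption that firewalld is the active firewall and that all ports are TCP; `FE_PORTS`, `BE_PORTS` and `firewall_commands` are illustrative names.

```shell
# Ports taken from the fe.conf and be.conf shown in this guide.
FE_PORTS="8030 9020 9030 9010"
BE_PORTS="9060 8040 9050 8060"

# firewall_commands PORT...   (hypothetical helper)
# Prints, but does not run, the firewall-cmd calls that would permanently
# open each TCP port, followed by a reload.
firewall_commands() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# Word splitting of the port lists is intentional here.
firewall_commands $FE_PORTS $BE_PORTS
```

Printing the commands instead of running them leaves room for review; once checked, the output can be piped to `sudo sh` on each node.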
Congratulations! The StarRocks cluster deployment is complete! 🎉
