Notes on clickhouse-server failing to connect to clickhouse-keeper
Background
I wanted to deploy a simple cluster with 1 shard, 2 replicas, and 1 keeper.
There are two VMs: 192.168.1.3 and 192.168.1.6.
192.168.1.3: runs one clickhouse-server and one clickhouse-keeper
192.168.1.6: runs one clickhouse-server
The clickhouse-server instances on 192.168.1.3 and 192.168.1.6 form one shard and act as replicas of each other.
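For context, this is roughly what the matching cluster definition could look like; the cluster name cluster_1s_2r, the file name, and the internal_replication setting are my own illustrative choices, not taken from the original setup:
config.d/cluster.xml
<clickhouse>
    <remote_servers>
        <cluster_1s_2r>
            <shard>
                <!-- let replicated tables handle replication themselves -->
                <internal_replication>true</internal_replication>
                <replica>
                    <host>server1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>server2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1s_2r>
    </remote_servers>
</clickhouse>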
My configuration
clickhouse-server:
config.d/house-keeper.xml
<clickhouse>
    <zookeeper>
        <node index="1">
            <host>server1</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
clickhouse-keeper:
<clickhouse>
    <path>/data/keeper/conf</path>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <listen_host>0.0.0.0</listen_host> <!-- IPv4 -->
        <listen_host>::</listen_host>      <!-- IPv6 -->
        <server_id>1</server_id>
        <log_storage_path>/data/keeper/log</log_storage_path>
        <snapshot_storage_path>/data/keeper/snapshots</snapshot_storage_path>
        <create_snapshot_on_exit>0</create_snapshot_on_exit>
        <digest_enabled>1</digest_enabled>
        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>100000</session_timeout_ms>
            <min_session_timeout_ms>10000</min_session_timeout_ms>
            <force_sync>false</force_sync>
            <startup_timeout>240000</startup_timeout>
            <!-- we want all logs for complex problems investigation -->
            <reserved_log_items>100000</reserved_log_items>
            <snapshot_distance>100000</snapshot_distance>
            <!-- For instant start in single node configuration -->
            <heart_beat_interval_ms>0</heart_beat_interval_ms>
            <election_timeout_lower_bound_ms>0</election_timeout_lower_bound_ms>
            <election_timeout_upper_bound_ms>0</election_timeout_upper_bound_ms>
            <compress_logs>0</compress_logs>
            <async_replication>1</async_replication>
            <latest_logs_cache_size_threshold>1073741824</latest_logs_cache_size_threshold>
            <commit_logs_cache_size_threshold>524288000</commit_logs_cache_size_threshold>
            <raft_logs_level>trace</raft_logs_level>
        </coordination_settings>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>server1</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
        <feature_flags>
            <filtered_list>1</filtered_list>
            <multi_read>1</multi_read>
            <check_not_exists>1</check_not_exists>
            <create_if_not_exists>1</create_if_not_exists>
        </feature_flags>
    </keeper_server>
</clickhouse>
The hostnames used above resolve to:
server1: 192.168.1.3
server2: 192.168.1.6
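Assuming the names are resolved through /etc/hosts on each VM (my assumption; the post does not show how the names are resolved), the entries would be:
192.168.1.3 server1
192.168.1.6 server2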
Problem
After starting all of the servers involved, the following errors appeared:
1. On the clickhouse-server side:
<Error> virtual bool DB::DDLWorker::initializeMainThread(): Code: 999. Coordination::Exception: All connection tries failed while connecting to ZooKeeper. nodes:
2. On the keeper side, I checked which ports were actually open for external connections:
netstat -tulnp | grep 9181
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:9181 0.0.0.0:* LISTEN 19010/./clickhouse-
tcp6 0 0 ::1:9181 :::* LISTEN 19010/./clickhouse-
Something was clearly wrong: the keeper was only listening on the loopback addresses, even though I had explicitly configured listen_host as 0.0.0.0.
Solution
After puzzling over it for a while, I decided to look at the source code to see how listen_host is picked up here.
Reading the source: when listen_hosts is empty, the keeper falls back to listening on only the following two addresses.
std::vector<std::string> listen_hosts = DB::getMultipleValuesFromConfig(config(), "", "listen_host");
bool listen_try = config().getBool("listen_try", false);
if (listen_hosts.empty())
{
    // fallback: no listen_host configured -> listen only on loopback
    listen_hosts.emplace_back("::1");
    listen_hosts.emplace_back("127.0.0.1");
    listen_try = true;
}
That seemed odd. Could it be that my listen_host was in the wrong place?
Reading further, I found that the code reads the listen_host tags that sit directly under the <clickhouse> tag.
So I moved listen_host so that it sits directly under the <clickhouse> tag (it had previously been placed under <keeper_server>), restarted all three servers, and the connection now succeeded.
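For reference, a minimal sketch of the corrected placement, keeping the same values as the config above:
<clickhouse>
    <listen_host>0.0.0.0</listen_host> <!-- IPv4 -->
    <listen_host>::</listen_host>      <!-- IPv6 -->
    <path>/data/keeper/conf</path>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <!-- listen_host no longer lives here; the remaining settings are unchanged -->
    </keeper_server>
</clickhouse>
After the restart, the same netstat check should show the keeper bound to 0.0.0.0:9181 (and [::]:9181) instead of only the loopback addresses.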
After the fix, the keeper log showed the clickhouse-server connections coming in, clickhouse-server no longer logged any errors about connecting to keeper, and the DDLWorker thread was no longer blocked by the failed keeper connection.
On 192.168.1.3, I connected to the keeper with a keeper client and ran the cons four-letter-word command, which showed that all of the clickhouse-server instances were now connected to clickhouse-keeper.
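As an aside, four-letter-word commands like cons can also be sent straight to the keeper's client port with nc (the address below just reuses the IP and port from this setup):
echo cons | nc 192.168.1.3 9181
cons lists the connection/session details of every client attached to the keeper, so both clickhouse-server instances should appear in its output.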
Summary
A very basic mistake, but it cost a lot of time to track down. Writing it down so I don't trip over it again.