
3. Hadoop 3.3.6 HA Cluster Setup

1. Server Planning

Software version reference:
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2?hl=zh-cn

Hadoop HA breaks down into two parts: HDFS HA and YARN HA.

Cluster plan (the original figure is missing; roles below are reconstructed from the configuration files in section 3):

hadoop01: NameNode (nn1), DataNode, JournalNode, ZooKeeper, ZKFC, ResourceManager (rm1), NodeManager
hadoop02: NameNode (nn2), DataNode, JournalNode, ZooKeeper, ZKFC, ResourceManager (rm2), NodeManager
hadoop03: NameNode (nn3), DataNode, JournalNode, ZooKeeper, ZKFC, ResourceManager (rm3), NodeManager

2. Preparation

2.1 Passwordless SSH Login

ssh-keygen -t rsa
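Generating a key pair is only half the job; the public key also has to be installed on every node (including the local one, since the start scripts SSH to all workers). A minimal sketch — the user `dev` and the hostnames are assumptions taken from the rest of this post; adjust them to your environment:

```shell
# Generate a key pair if one does not already exist, then install the public
# key on every node. User 'dev' and the hostnames are assumptions; adjust them.
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
for node in hadoop01 hadoop02 hadoop03; do
  ssh-copy-id "dev@$node" || echo "could not reach $node; copy the key manually"
done
```

Repeat on each machine (or copy the same key pair around) so every node can SSH to every other node without a password.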

2.2 JDK Installation

2.3 Configure Hosts

sudo vim /etc/hosts

10.128.0.18 hadoop01
10.128.0.21 hadoop02
10.128.0.22 hadoop03

The first column is each machine's internal (private) IP address.

3. Hadoop Configuration

Environment variables (in /etc/profile.d/my_env.sh):

export JAVA_HOME=/opt/apps/jdk
export HADOOP_HOME=/opt/apps/hadoop
export ZK_HOME=/opt/apps/zookeeper


export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$PATH
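To sanity-check the PATH logic before rolling it out, the same file can be written to a temporary location and sourced; on the cluster you would write it to /etc/profile.d/my_env.sh on every node and re-login. The /opt/apps paths are the install locations assumed above:

```shell
# Write the env file and confirm PATH picks up the Hadoop bin directories.
cat > /tmp/my_env.sh <<'EOF'
export JAVA_HOME=/opt/apps/jdk
export HADOOP_HOME=/opt/apps/hadoop
export ZK_HOME=/opt/apps/zookeeper
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$PATH
EOF
. /tmp/my_env.sh
echo "$PATH" | grep -q '/opt/apps/hadoop/bin' && echo "PATH updated"
```

On the real machines, `java -version` and `hadoop version` after re-login are the quickest confirmation.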

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Default file system: the logical HDFS nameservice, not a single host -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bits</value>
  </property>
  <!-- Base directory for Hadoop data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop/data</value>
  </property>
  <!-- Static user for the HDFS web UI -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>bigdatabit9</value>
  </property>
  <!-- ZooKeeper quorum that the ZKFC daemons connect to -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Retries for the NameNode -> JournalNode connection (default 10) -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>20</value>
  </property>
  <!-- Retry interval in milliseconds (default 1000) -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>5000</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- NameNode data directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <!-- DataNode data directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
  </property>
  <!-- JournalNode data directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
  </property>
  <!-- Logical name of the nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>bits</value>
  </property>
  <!-- NameNodes that belong to the nameservice -->
  <property>
    <name>dfs.ha.namenodes.bits</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <!-- RPC addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.rpc-address.bits.nn1</name>
    <value>hadoop01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bits.nn2</name>
    <value>hadoop02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bits.nn3</name>
    <value>hadoop03:8020</value>
  </property>
  <!-- HTTP addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.http-address.bits.nn1</name>
    <value>hadoop01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bits.nn2</name>
    <value>hadoop02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bits.nn3</name>
    <value>hadoop03:9870</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Where the NameNodes share edit logs on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/bits</value>
  </property>
  <!-- Proxy provider clients use to find the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.bits</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method: ensures only one NameNode serves clients at a time -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Use mapreduce_shuffle as the NodeManager auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- Cluster id shared by the three ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <!-- Logical ids of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2,rm3</value>
  </property>

  <!-- ========== rm1 ========== -->
  <!-- Hostname of rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <!-- Web UI address of rm1 -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
  </property>
  <!-- Client RPC address of rm1 -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop01:8032</value>
  </property>
  <!-- Address ApplicationMasters use to request resources from rm1 -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop01:8030</value>
  </property>
  <!-- Address NodeManagers connect to on rm1 -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop01:8031</value>
  </property>

  <!-- ========== rm2 ========== -->
  <!-- Hostname of rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop02:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop02:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop02:8031</value>
  </property>

  <!-- ========== rm3 ========== -->
  <!-- Hostname of rm3 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>hadoop03</value>
  </property>
  <!-- Web UI address of rm3 -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm3</name>
    <value>hadoop03:8088</value>
  </property>
  <!-- Client RPC address of rm3 -->
  <property>
    <name>yarn.resourcemanager.address.rm3</name>
    <value>hadoop03:8032</value>
  </property>
  <!-- Address ApplicationMasters use to request resources from rm3 -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm3</name>
    <value>hadoop03:8030</value>
  </property>
  <!-- Address NodeManagers connect to on rm3 -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
    <value>hadoop03:8031</value>
  </property>
  <!-- ZooKeeper ensemble address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>

  <!-- Store ResourceManager state in the ZooKeeper ensemble -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Environment variables inherited by containers -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

workers

hadoop01
hadoop02
hadoop03

Note: lines in this file must have no trailing spaces, and the file must contain no blank lines.
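Trailing spaces and blank lines are easy to miss by eye; a quick grep catches both. The /tmp path here is only for illustration — on the cluster, check $HADOOP_HOME/etc/hadoop/workers:

```shell
# Flag any line with trailing whitespace, or any blank line, in a workers file.
printf 'hadoop01\nhadoop02\nhadoop03\n' > /tmp/workers
if grep -nE '[[:space:]]$|^$' /tmp/workers; then
  echo "workers file has bad lines (shown above)"
else
  echo "workers file is clean"
fi
```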

4. Starting the Cluster

Make sure the environment variables above are configured on every node.

Format the cluster
Only needed the first time. To re-format later, first delete the data and logs directories on every node in the cluster.

Start the JournalNodes
On each JournalNode host, run the following to start the journalnode service:

hdfs --daemon start journalnode
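With passwordless SSH in place, all three daemons can be started from a single host instead of logging in to each node. A sketch, assuming the hostnames above and that the profile script has put $HADOOP_HOME/bin on PATH on every node:

```shell
# Start a JournalNode on each node without logging in to them one by one.
for node in hadoop01 hadoop02 hadoop03; do
  ssh "$node" 'hdfs --daemon start journalnode' || echo "failed on $node"
done
```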

On [nn1], format the namespace and start the NameNode:

hdfs namenode -format
hdfs --daemon start namenode

On [nn2] and [nn3], sync the metadata from nn1:

hdfs namenode -bootstrapStandby

Start the NameNodes on [nn2] and [nn3]:

hdfs --daemon start namenode
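One step that is easy to miss before the first full start: with dfs.ha.automatic-failover.enabled set to true, the HA state in ZooKeeper has to be initialized once, with the ZooKeeper ensemble already running (the original post does not show this step, but the ZKFC daemons depend on it):

```shell
# Create the HA znode used by the ZKFC daemons for leader election.
# Run once, on one of the NameNode hosts.
hdfs zkfc -formatZK
```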

Start HDFS:

sbin/start-dfs.sh

Start YARN
Run start-yarn.sh on a node that is configured as a ResourceManager.

sbin/start-yarn.sh
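After both scripts finish, it is worth confirming that exactly one NameNode and one ResourceManager report active (these subcommands exist in Hadoop 3.x):

```shell
# Check which NameNode / ResourceManager is currently active.
hdfs haadmin -getAllServiceState   # expect one "active", the rest "standby"
yarn rmadmin -getAllServiceState   # same for the ResourceManagers
jps                                # per-node daemons: NameNode, DataNode,
                                   # JournalNode, DFSZKFailoverController,
                                   # ResourceManager, NodeManager, QuorumPeerMain
```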

5. Troubleshooting

1. A NameNode went down, automatic failover did not happen, and the cluster only recovered after that NameNode was manually restarted. The configuration at the time was:

  <!-- Fencing method -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <!-- SSH private key used by the sshfence method -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/dev/.ssh/id_rsa</value>
  </property>

Cause and solution:
With fencing configured as sshfence, when the active NameNode fails, the newly elected active first SSHes to the old host and kills the old NameNode process; this fencing step exists to prevent split-brain. But if the whole host is down, the SSH attempt can never succeed, so the failover stalls there until someone manually restarts the old NameNode, after which the cluster elects an active on its own. Change the configuration above to:

<!-- Fencing method: only one NameNode may serve clients at a time -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>

or any other shell command. But in that case you have to consider the split-brain risk yourself.
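A common middle ground, following the HDFS HA documentation's support for listing multiple fencing methods (newline-separated, tried in order): attempt sshfence first with a connect timeout, and fall back to shell(/bin/true) only when the old host is unreachable. The key path and timeout below are illustrative values:

```xml
<!-- Fencing methods are tried in order: sshfence kills the old NameNode when
     its host is reachable; the shell fallback lets failover proceed when the
     whole host is down (accepting the residual split-brain risk). -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/dev/.ssh/id_rsa</value>
</property>
<!-- Give up on the SSH attempt after 30 s instead of hanging -->
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
```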
