
Hadoop Cluster Setup (Part 2): CentOS 7 as an Example (all installation archives have been placed in /opt/software)

news 2025/11/13 6:31:38

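I. Cluster Overview

1. A Hadoop cluster actually consists of two clusters:

HDFS and YARN. The two are logically separate but usually run on the same physical machines. The HDFS cluster stores massive amounts of data; its main roles are NameNode, DataNode, and SecondaryNameNode.

The YARN cluster schedules resources for computations over that data; its main roles are ResourceManager and NodeManager.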

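2. So what is MapReduce?

It is a framework for distributed computation, delivered as an application development library. Users write programs against its programming model, package them, and submit them to the cluster, where they process data stored in HDFS while YARN schedules and manages their resources.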
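The map, shuffle/sort, and reduce stages behind this programming model can be illustrated without Hadoop at all. The pipeline below is only a local, single-machine analogy of a word count (the function name `wordcount` is hypothetical); a real job implements the map and reduce steps in Java (or through Hadoop Streaming), and YARN runs them as distributed tasks.

```shell
#!/usr/bin/env bash
# Local analogy only (no Hadoop involved): a MapReduce-style word count
# expressed as a Unix pipeline. Reads text on stdin.
wordcount() {
  # map: emit one word (the "key") per line, dropping empty tokens
  tr -s '[:space:]' '\n' |
    grep -v '^$' |
    # shuffle/sort: bring identical keys next to each other
    sort |
    # reduce: count each group, then print "word<TAB>count"
    uniq -c |
    awk '{print $2 "\t" $1}'
}

# Usage: printf 'b a b\n' | wordcount
```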

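3. Deployment modes

Hadoop can be deployed in three modes: Standalone mode, Pseudo-Distributed mode, and Fully-Distributed mode (Cluster mode):

(1) Standalone mode, also called local mode: no daemons run, and all programs execute in a single JVM. This makes debugging MapReduce programs very convenient, so the mode is generally used for learning and for debugging during development.

(2) Pseudo-distributed mode: all Hadoop daemons run on a single host. It is typically used to debug distributed Hadoop code and to check that programs behave correctly. It is a special case of the fully distributed mode.

(3) Fully distributed mode: the Hadoop daemons run on a cluster built from several hosts, with different nodes taking different roles. This is the mode used to build enterprise Hadoop systems in real-world work. Here the nodes are divided into just two kinds of roles: master (the main node, typically one) and slave (worker nodes, usually several). The pseudo-distributed mode is therefore simply the cluster mode with the master and slave roles merged onto one host.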

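Below, using a single virtual machine with the hostname master as an example, we walk through installing and configuring Hadoop in pseudo-distributed mode.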

II. Preparing Hadoop

1. Extract the prepared archive:

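cd /opt/software  # the directory holding the prepared archives
mkdir -p /opt/module  # tar -C requires the target directory to exist
tar zxvf hadoop-3.1.3.tar.gz -C /opt/module/  # extract into the target directory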

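2. Hadoop directory layout:

bin : Hadoop's most basic management and usage scripts. They are the underlying implementation of the management scripts in sbin, and users can also call them directly to manage and use Hadoop.

etc : the directory holding Hadoop's configuration files.

include : header files for the programming libraries Hadoop exposes (the actual dynamic and static libraries are in lib). These headers are written in C/C++ and are typically used by C/C++ programs that access HDFS or implement MapReduce programs.

lib : the dynamic and static libraries Hadoop exposes, used together with the header files in include.

libexec : the shell configuration files for each service, used to configure basic settings such as log output and startup parameters (e.g. JVM options).

sbin : Hadoop's management scripts, mainly the start/stop scripts for the various HDFS and YARN services.

share : the compiled jar files of each Hadoop module, including the official examples.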
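To confirm that the archive was extracted completely, a small check can report which of the directories listed above are missing. This is a minimal sketch; the function name `check_hadoop_layout` is hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected top-level directory that is
# missing under the given Hadoop home; return 1 if any are missing.
check_hadoop_layout() {
  local home="$1" missing=0 d
  for d in bin etc include lib libexec sbin share; do
    if [ ! -d "$home/$d" ]; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_hadoop_layout /opt/module/hadoop-3.1.3 && echo "layout OK"
```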

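III. Editing the Hadoop Configuration Files

1. About the configuration files:

Hadoop provides two kinds of configuration files. The first kind are the read-only default files core-default.xml, hdfs-default.xml, mapred-default.xml, and yarn-default.xml, which contain the default values of all of Hadoop's parameters. The second kind are the files you edit to customize a cluster: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and so on, located in etc/hadoop/ under the extracted Hadoop directory (most of them ship without any settings). Any parameter from the default files can be overridden in these files, and Hadoop gives the values set here precedence over the defaults.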
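The precedence rule can be pictured with a toy lookup: read a property from the site file first and fall back to the default file only if the site file does not define it. This is an illustration under simplifying assumptions (each `<name>` is followed directly by its `<value>`, no `<final>` flags, no variable expansion), and the helper names `xml_value` and `conf_lookup` are hypothetical; Hadoop itself resolves these files through its Java `org.apache.hadoop.conf.Configuration` class.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property $2 from XML file $1.
# Assumes <name>KEY</name> is followed (modulo whitespace) by <value>...</value>.
xml_value() {
  local file="$1" key="$2"
  # A missing file simply yields no value.
  [ -f "$file" ] || return 0
  # Join all lines, then capture the value that follows <name>KEY</name>.
  tr -d '\n' < "$file" |
    sed -n "s/.*<name>${key}<\/name>[[:space:]]*<value>\([^<]*\)<\/value>.*/\1/p"
}

# Hypothetical lookup: the site file wins; the default file is consulted
# only when the site file does not define the key.
conf_lookup() {
  local site="$1" default="$2" key="$3" v
  v=$(xml_value "$site" "$key")
  if [ -z "$v" ]; then
    v=$(xml_value "$default" "$key")
  fi
  printf '%s\n' "$v"
}
```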

2. What each configuration file does:

hadoop-env.sh sets the environment variables Hadoop needs at runtime:

# Set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212/
# Users that run each role's daemons (Hadoop 3's start scripts abort when run as root without these)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
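Instead of editing hadoop-env.sh by hand, the same `export` lines can be written idempotently, rewriting an existing assignment or appending a new one, so that re-running the setup never produces duplicates. A minimal sketch; `set_env_line` is a hypothetical helper, and it assumes GNU sed (the default on CentOS 7) for in-place editing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in file $1 for variable $2 and
# value $3. Existing uncommented export lines for NAME are rewritten in place;
# otherwise a new line is appended. Commented-out lines are left untouched.
set_env_line() {
  local file="$1" name="$2" value="$3"
  touch "$file"
  if grep -q "^export ${name}=" "$file"; then
    # GNU sed -i edits the file in place.
    sed -i "s|^export ${name}=.*|export ${name}=${value}|" "$file"
  else
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Usage (paths as in this article):
# f=/opt/module/hadoop-3.1.3/etc/hadoop/hadoop-env.sh
# set_env_line "$f" JAVA_HOME /opt/module/jdk1.8.0_212/
# for r in HDFS_NAMENODE HDFS_DATANODE HDFS_SECONDARYNAMENODE \
#          YARN_RESOURCEMANAGER YARN_NODEMANAGER; do
#   set_env_line "$f" "${r}_USER" root
# done
```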

core-site.xml: Hadoop's core, global configuration file; its settings also apply to all the other configuration files:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Default file system; the URI scheme selects the file system type,
       e.g. file:/// for the local file system, hdfs:// for HDFS. -->
  <!-- HDFS access address (NameNode RPC): hdfs://nn_host:8020 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <!-- Common local-disk directory where Hadoop stores its data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop-3.1.3</value>
  </property>
  <!-- User name used when accessing HDFS from the web UI -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <!-- Hosts from which the root superuser may act as a proxy -->
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <!-- Groups whose users root may impersonate -->
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
  <!-- Users root may impersonate -->
  <property>
    <name>hadoop.proxyuser.root.users</name>
    <value>*</value>
  </property>
</configuration>
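Because every *-site.xml file has the same shape, a list of `<property>` entries inside `<configuration>`, the files can also be generated from name=value pairs instead of being typed by hand. A minimal sketch; `write_site_xml` is a hypothetical helper, and it assumes values contain no XML special characters (&, <, >).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml file from name=value pairs.
# Usage: write_site_xml FILE name1=value1 [name2=value2 ...]
write_site_xml() {
  local file="$1" pair
  shift
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Split at the first "=": the name is before it, the value after it
      # (the value itself may contain "=", e.g. HADOOP_MAPRED_HOME=...).
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$file"
}

# Usage (single quotes keep ${HADOOP_HOME} literal for Hadoop to expand):
# write_site_xml core-site.xml fs.defaultFS=hdfs://master:8020 \
#   hadoop.tmp.dir=/opt/data/hadoop-3.1.3
# write_site_xml mapred-site.xml mapreduce.framework.name=yarn \
#   'yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=${HADOOP_HOME}'
```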

hdfs-site.xml: the HDFS configuration file; the settings from core-site.xml apply here too:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Host and port of the SecondaryNameNode (SNN) web server -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9868</value>
  </property>
</configuration>

mapred-site.xml: the MapReduce configuration file; the settings from core-site.xml apply here too:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Default execution mode for MR programs: yarn (cluster) or local -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Environment variables for the MR ApplicationMaster -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <!-- Environment variables for MR map tasks -->
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <!-- Environment variables for MR reduce tasks -->
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>

yarn-site.xml: the YARN configuration file; the settings from core-site.xml apply here too:

<?xml version="1.0"?>
<configuration>
  <!-- Host running the ResourceManager (RM), YARN's master role -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Auxiliary service run by each NodeManager; must be mapreduce_shuffle for MR programs to run -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Minimum memory per container request, in MB -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <!-- Maximum memory per container request, in MB -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <!-- Virtual memory check: enabled by default, disabled here -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Maximum vcores per container: 4 by default, set to 1 here -->
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>1</value>
  </property>
</configuration>
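The two memory settings above only make sense together: YARN refuses a configuration whose minimum allocation exceeds the maximum. The function below is a simplified model, an assumption rather than YARN source code, of how the default scheduler treats a container memory request under these limits: requests above the maximum are rejected, and the rest are raised to at least the minimum and rounded up to a multiple of it, never exceeding the maximum. The name `normalize_container_mb` is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (assumption, not YARN code) of container memory sizing.
# Args: requested MB, minimum-allocation-mb, maximum-allocation-mb.
normalize_container_mb() {
  local req="$1" min="$2" max="$3" size
  if [ "$min" -le 0 ] || [ "$min" -gt "$max" ]; then
    echo "invalid config: minimum ($min MB) must be > 0 and <= maximum ($max MB)" >&2
    return 2
  fi
  if [ "$req" -gt "$max" ]; then
    echo "rejected: request of $req MB exceeds maximum $max MB" >&2
    return 1
  fi
  if [ "$req" -lt "$min" ]; then
    req="$min"
  fi
  # Round up to the next multiple of the minimum allocation...
  size=$(( (req + min - 1) / min * min ))
  # ...but never above the maximum.
  if [ "$size" -gt "$max" ]; then
    size="$max"
  fi
  echo "$size"
}

# Usage with the values above: normalize_container_mb 600 512 2048
```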

3. Put the files above into a directory named hadoop_config, upload it to the server, and copy them into Hadoop's configuration directory with cp:

cd /hadoop_config 
cp core-site.xml hdfs-site.xml hadoop-env.sh mapred-site.xml yarn-site.xml /opt/module/hadoop-3.1.3/etc/hadoop/
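Since cp silently overwrites the files shipped with Hadoop, it can be worth keeping a copy of the originals first. A minimal sketch; the function name `install_configs` and the `.orig` backup suffix are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy each named file from directory $1 into the Hadoop
# config directory $2, first saving any existing target as NAME.orig. The
# backup is made only once, so re-running never overwrites the pristine copy.
install_configs() {
  local src="$1" dst="$2" f
  shift 2
  for f in "$@"; do
    if [ ! -f "$src/$f" ]; then
      echo "missing source file: $src/$f" >&2
      return 1
    fi
    if [ -f "$dst/$f" ] && [ ! -f "$dst/$f.orig" ]; then
      cp -p "$dst/$f" "$dst/$f.orig"
    fi
    cp "$src/$f" "$dst/$f"
  done
}

# Usage:
# install_configs /hadoop_config /opt/module/hadoop-3.1.3/etc/hadoop \
#   core-site.xml hdfs-site.xml hadoop-env.sh mapred-site.xml yarn-site.xml
```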

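4. workers: this file lists the hostnames of all worker (slave) nodes in the cluster, i.e. the HDFS DataNodes and YARN NodeManagers, so that the start scripts can bring up every worker with a single command. Open it, delete its existing content (localhost by default), and enter the following:

vim /opt/module/hadoop-3.1.3/etc/hadoop/workers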

master
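The same edit can be scripted: the file simply holds one hostname per line. A minimal sketch; `write_workers` is a hypothetical helper, and any extra hostnames for a fully distributed cluster are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite the workers file $1 with one hostname per
# line, replacing the default "localhost" entry.
write_workers() {
  local file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: write_workers FILE HOST..." >&2
    return 1
  fi
  printf '%s\n' "$@" > "$file"
}

# Usage for this single-node setup:
# write_workers /opt/module/hadoop-3.1.3/etc/hadoop/workers master
```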

5. Configure the Hadoop environment variables: on the master server, append the following to /etc/profile:

#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the environment variables and check that they took effect:

source /etc/profile
hadoop version
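Appending these lines twice (for example when re-running a setup script) duplicates the PATH entries. A guard on the #HADOOP_HOME marker line avoids that. A minimal sketch; `add_hadoop_profile` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Hadoop variables to profile file $1, with
# HADOOP_HOME set to $2, only if the "#HADOOP_HOME" marker line is absent.
add_hadoop_profile() {
  local file="$1" home="$2"
  if grep -qx '#HADOOP_HOME' "$file" 2>/dev/null; then
    return 0
  fi
  # Unquoted heredoc: ${home} is expanded now, \$PATH stays literal.
  cat >> "$file" <<EOF
#HADOOP_HOME
export HADOOP_HOME=${home}
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
}

# Usage: add_hadoop_profile /etc/profile /opt/module/hadoop-3.1.3 && source /etc/profile
```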

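IV. Formatting and Starting the Cluster

HDFS must be formatted before it is started for the first time.

1. Formatting is essentially cleanup and preparation work, since at this point HDFS does not yet physically exist. It only needs to be run once, on the master server: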

cd /opt/module/hadoop-3.1.3/
hdfs namenode -format  # format the NameNode
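Re-running the format command on a NameNode that already holds data generates a new clusterID, after which DataNodes still carrying the old ID fail to register. A guard can format only when the NameNode metadata directory has not been initialized (a formatted directory contains current/VERSION). A minimal sketch; `needs_format` is a hypothetical name, and the path assumes the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name, i.e. /opt/data/hadoop-3.1.3/dfs/name with the core-site.xml above.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed (status 0) only if the NameNode metadata
# directory $1 has not been formatted yet, i.e. has no current/VERSION.
needs_format() {
  local name_dir="$1"
  [ ! -f "$name_dir/current/VERSION" ]
}

# Usage (assumes the default dfs.namenode.name.dir under hadoop.tmp.dir):
# if needs_format /opt/data/hadoop-3.1.3/dfs/name; then
#   hdfs namenode -format
# else
#   echo "NameNode already formatted; skipping" >&2
# fi
```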

2. Start and stop HDFS

start-dfs.sh  # start HDFS with one command
stop-dfs.sh   # stop HDFS with one command

3. Start and stop YARN

start-yarn.sh
stop-yarn.sh

4. Alternatively, the following scripts start and stop both HDFS and YARN:

start-all.sh
stop-all.sh
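After startup, `jps` (shipped with the JDK) should list the five daemons of this pseudo-distributed setup: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The function below reads jps output on stdin and prints whichever are missing. A minimal sketch; the name `missing_daemons` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical checker: read `jps` output ("<pid> <MainClass>" per line) on
# stdin and print each expected Hadoop daemon that is not running.
# Returns 1 if any daemon is missing.
missing_daemons() {
  local running d rc=0
  running=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "$d"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on master: jps | missing_daemons && echo "all daemons up"
```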
