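XII. Pseudo-Distributed Configuration
I. Hadoop environment variable file (hadoop-env.sh)
This file sets Hadoop's environment variables. At the end of the file, add the JAVA_HOME directory and define root as the user for each HDFS and YARN role.
Run the following commands: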
cd $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh
Append the following configuration to the end of the file:
export JAVA_HOME=/opt/module/jdk1.8.0_281/
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
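The same edit can be scripted instead of made by hand in vim. The following is a minimal sketch, not part of the original procedure: `append_once` is a hypothetical helper, and `ENV_FILE` is an assumed variable that should name the hadoop-env.sh to edit. It falls back to a scratch file so the snippet can be tried without touching a real installation.

```shell
# Hypothetical helper: append LINE ($2) to FILE ($1) only if that exact line
# is not already present, so re-running the script does not duplicate exports.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Assumption: ENV_FILE names the hadoop-env.sh to edit.
# It defaults to a scratch file, which makes this a dry run.
ENV_FILE="${ENV_FILE:-$(mktemp)}"

append_once "$ENV_FILE" 'export JAVA_HOME=/opt/module/jdk1.8.0_281/'
for role in HDFS_NAMENODE HDFS_DATANODE HDFS_SECONDARYNAMENODE \
            YARN_RESOURCEMANAGER YARN_NODEMANAGER; do
  append_once "$ENV_FILE" "export ${role}_USER=root"
done

# Show what was written.
cat "$ENV_FILE"
```

To apply it to the real file, run it with ENV_FILE set to $HADOOP_HOME/etc/hadoop/hadoop-env.sh.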
II. Core configuration file (core-site.xml)
This file holds Hadoop's core configuration items, such as the I/O settings shared by HDFS and MapReduce.
Run:
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
Modify the file contents as follows:
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.2.1/tmp</value>
</property>
</configuration>
The modified core-site.xml is shown in Figure 3.19.
Figure 3.19 core-site.xml file (the stock Apache license header followed by the two properties above; in the screenshot hadoop.tmp.dir is set to /usr/local/hadoop/tmp)
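In the configuration above, the hadoop.tmp.dir parameter must be set. If it is not, Hadoop falls back to its default temporary directory, /tmp/hadoop-${user.name}. On many systems /tmp is cleared at every reboot, so the Hadoop file system format command would have to be run again after each restart; otherwise Hadoop reports errors when it runs.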
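A quick way to confirm that hadoop.tmp.dir is set, and that it does not point under /tmp, is to read it back from core-site.xml. The following is a minimal sketch under stated assumptions: `get_prop` is a hypothetical helper that relies on the one-tag-per-line layout used in this section, and it works on a scratch copy of the file so that it runs without a Hadoop installation.

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml. Assumes one tag per line, as in the listings
# in this section; it is not a general XML parser.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Scratch copy of the core-site.xml shown above.
demo_core_site="$(mktemp)"
cat > "$demo_core_site" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.2.1/tmp</value>
</property>
</configuration>
EOF

tmp_dir="$(get_prop "$demo_core_site" hadoop.tmp.dir)"
case "$tmp_dir" in
  "" | /tmp | /tmp/*) echo "warning: hadoop.tmp.dir is unset or under /tmp" ;;
  *) echo "hadoop.tmp.dir = $tmp_dir" ;;
esac
```

On a real installation the effective value can also be read with `hdfs getconf -confKey hadoop.tmp.dir`, and the format command referred to above is `hdfs namenode -format`.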
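III. HDFS configuration file (hdfs-site.xml)
1. This file mainly configures:
the access addresses of the NameNode and SecondaryNameNode;
the storage paths for NameNode and DataNode data;
the storage locations of the FSImage, Edits, and Checkpoint files;
the replication factor, that is, how many copies of each file are kept;
the block size used to store files (128 MB).
2. Run: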
cd $HADOOP_HOME/etc/hadoop
vim hdfs-site.xml
The file contents are as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The modified hdfs-site.xml is shown in Figure 3.20.
In pseudo-distributed mode there is only one DataNode, so dfs.replication is set to 1.
Figure 3.20 hdfs-site.xml file
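The listing in this section sets only dfs.replication, although the section opens with a longer list of items that hdfs-site.xml can configure. As an illustration only, the sketch below writes a fuller file to a scratch location. The property names are standard HDFS keys, but the values are assumptions meant to mirror the usual Hadoop 3.x defaults; check them against the hdfs-default.xml of your version before using any of them.

```shell
# Write an illustrative hdfs-site.xml to a scratch file (nothing under
# $HADOOP_HOME is touched). The quoted 'EOF' keeps ${hadoop.tmp.dir} literal,
# so Hadoop, not the shell, resolves it.
demo_hdfs_site="$(mktemp)"
cat > "$demo_hdfs_site" <<'EOF'
<configuration>
<!-- One copy per block: a pseudo-distributed node has a single DataNode -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Block size in bytes: 128 * 1024 * 1024 -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!-- Where the NameNode keeps FSImage and Edits (assumed value) -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<!-- Where the DataNode keeps block data (assumed value) -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>
<!-- Where the SecondaryNameNode keeps checkpoints (assumed value) -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/namesecondary</value>
</property>
<!-- SecondaryNameNode web address (assumed value) -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9868</value>
</property>
</configuration>
EOF

# List the property names that were written.
grep '<name>' "$demo_hdfs_site"
```

Because the three directory values are derived from hadoop.tmp.dir, they move together with the directory chosen in core-site.xml.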
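IV. YARN configuration file (yarn-site.xml)
This file holds the configuration for the YARN daemons, including the ResourceManager, the web application proxy server, and the NodeManager.
Run: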
cd $HADOOP_HOME/etc/hadoop
vim yarn-site.xml
The file contents are as follows:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
The modified yarn-site.xml is shown in Figure 3.21.
Figure 3.21 yarn-site.xml file
V. MapReduce configuration file (mapred-site.xml)
This file holds the configuration for the MapReduce daemons, including the job history server.
Run:
cd $HADOOP_HOME/etc/hadoop
vim mapred-site.xml
The file contents are as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
The modified mapred-site.xml is shown in Figure 3.22.
Figure 3.22 mapred-site.xml file
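The three *.env properties in mapred-site.xml differ only in their names; each carries the same HADOOP_MAPRED_HOME setting for a different process (the application master, map tasks, and reduce tasks). As a small sketch of that pattern, the loop below prints the three property blocks from one shared value. The variable names `env_value` and `fragment` are illustrative, and the output is meant to be pasted between the <configuration> tags rather than used as a complete file.

```shell
# Value shared by all three properties, copied verbatim from the listing above.
# Single quotes keep the shell from expanding ${HADOOP_HOME}; it is emitted literally.
env_value='HADOOP_MAPRED_HOME=${HADOOP_HOME}'

# Build one <property> block per property name.
fragment="$(
  for name in yarn.app.mapreduce.am.env mapreduce.map.env mapreduce.reduce.env; do
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' \
      "$name" "$env_value"
  done
)"

printf '%s\n' "$fragment"
```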