[Big Data] Hadoop 3 workers cluster + Flink + Zeppelin + Kafka + ZooKeeper installation and deployment

Abstract:

Enable the HDFS trash mechanism so that deleted data can be recovered from the trash for a configurable number of minutes -> fs.trash.interval

0. Environment

0.1 Software versions

hadoop 3.3.0

java 1.8.0_241

flink-1.12.3

zeppelin-0.9.0-bin-all

kafka 1.1.1 (see the separate Kafka cluster deployment notes)

0.2 Hardware

192.168.0.24  8 cores / 32 GB RAM / 500 GB SSD  hadoop-master

192.168.0.25  8 cores / 32 GB RAM / 500 GB SSD  hadoop-client-1

192.168.0.27  8 cores / 32 GB RAM / 500 GB SSD  hadoop-client-2

0.3 Architecture

Workers cluster (one master node, two worker nodes)

1. Deployment initialization

1.0 Passwordless SSH between all servers

# ssh-keygen -t rsa
# ssh-copy-id hadoop-master    # repeat for hadoop-client-1 and hadoop-client-2
# make sure these three files are identical on every server

.ssh/
total 16
-rw-r--r--. 1 root root 400 May 7 18:46 authorized_keys
-rw-------. 1 root root 1675 May 7 18:32 id_rsa
-rw-r--r--. 1 root root 400 May 7 18:32 id_rsa.pub
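To confirm passwordless login works in every direction before going further, a quick check like the following can be run on each node (a minimal sketch, assuming the three hostnames above and the default SSH port; add -p 32539 if sshd listens on the non-default port referenced later in hadoop-env.sh):

# every hostname should print back without a password prompt
for host in hadoop-master hadoop-client-1 hadoop-client-2; do
    ssh -o BatchMode=yes "$host" hostname
done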

1.1 Java installation (the JDK rpm itself is installed in step 1.4 below)

1.2 Set environment variables

cat >> /etc/profile << 'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_241-amd64/
export FLINK_HOME=/export/servers/flink-1.12.3/
export CLASSPATH=$JAVA_HOME/lib
export HADOOP_HOME=/export/servers/hadoop-3.3.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export ZEPPELIN_HOME=/export/servers/zeppelin-0.9.0-bin-all
export KAFKA_HOME=/export/servers/kafka
export ZK_HOME=/export/servers/zookeeper/

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$FLINK_HOME/bin:$ZEPPELIN_HOME/bin:$ZK_HOME/bin:$KAFKA_HOME/bin
EOF
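After appending the block, reload the profile and spot-check a few of the variables (the JDK check only works once the rpm from step 1.4 is installed):

source /etc/profile
echo "$JAVA_HOME" "$HADOOP_HOME" "$FLINK_HOME"
java -version    # should report 1.8.0_241 after step 1.4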

  

1.3 Check hostname resolution

# cat /etc/hosts

192.168.0.24   hadoop-master

192.168.0.25   hadoop-client-1

192.168.0.27   hadoop-client-2
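All three machines need the same /etc/hosts entries; one way to push the file out from the master is a simple loop (a sketch, assuming root SSH access to the workers):

for host in hadoop-client-1 hadoop-client-2; do
    scp /etc/hosts "$host":/etc/hosts
done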

1.4 Download the software

The JDK has to be uploaded manually.

Install it on its own: rpm -ivh jdk-8u241-linux-x64.rpm

# hadoop
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
# flink
wget https://mirror-hk.koddos.net/apache/flink/flink-1.12.3/flink-1.12.3-bin-scala_2.11.tgz
# zeppelin
wget https://mirror-hk.koddos.net/apache/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-all.tgz

1.5 Configure the time server, etc.

## install ntp
yum install -y ntp

## add a cron job
crontab -e

## then add the following line in the editor
*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;

# install nc
yum install -y nc

# directory layout
mkdir -p /export/servers    # installation directory
mkdir -p /export/softwares  # downloaded packages
mkdir -p /export/scripts    # startup scripts
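The same directory layout is needed on the worker nodes as well, for example:

for host in hadoop-client-1 hadoop-client-2; do
    ssh "$host" "mkdir -p /export/servers /export/softwares /export/scripts"
done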

2. Configure Hadoop

2.1 Extract

tar -xf hadoop-3.3.0.tar.gz -C /export/servers/
tar -xf flink-1.12.3-bin-scala_2.11.tgz -C /export/servers/
tar -xf zeppelin-0.9.0-bin-all.tgz -C /export/servers/

ll /export/servers/

2.2 Verify the environment variables

cat /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_241-amd64/
export FLINK_HOME=/export/servers/flink-1.12.3/
export CLASSPATH=$JAVA_HOME/lib
export HADOOP_HOME=/export/servers/hadoop-3.3.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export ZEPPELIN_HOME=/export/servers/zeppelin-0.9.0-bin-all
export KAFKA_HOME=/export/servers/kafka
export ZK_HOME=/export/servers/zookeeper/

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$FLINK_HOME/bin:$ZEPPELIN_HOME/bin:$ZK_HOME/bin:$KAFKA_HOME/bin

2.3 Edit the Hadoop configuration files

cd /export/servers/hadoop-3.3.0/

# vim etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-3.3.0/hadoopDatas/tempDatas</value>
</property>
<!-- I/O buffer size; tune it to the server's capacity in production -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- enable the HDFS trash mechanism so deleted data can be recovered from the trash; unit: minutes -->
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
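Once HDFS is up (section 2.4), the effective trash setting can be double-checked with hdfs getconf:

hdfs getconf -confKey fs.trash.interval
# expected: 10080 (minutes, i.e. 7 days)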


# vim sbin/start-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

 # vim sbin/stop-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

# vim sbin/start-yarn.sh

YARN_RESOURCEMANAGER_USER=root
HDFS_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

# vim sbin/stop-yarn.sh

YARN_RESOURCEMANAGER_USER=root
HDFS_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

# vim etc/hadoop/hdfs-site.xml

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-master:50090</value>
</property>

<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-master:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///export/servers/hadoop-3.3.0/hadoopDatas/namenodeDatas,file:///export/servers/hadoop-3.3.0/hadoopDatas/namenodeDatas2</value>
</property>
<!-- DataNode data directories; in practice, decide the disk mount points first, then separate multiple directories with commas -->

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///export/servers/hadoop-3.3.0/hadoopDatas/datanodeDatas,file:///export/servers/hadoop-3.3.0/hadoopDatas/datanodeDatas2</value>
</property>

<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///export/servers/hadoop-3.3.0/hadoopDatas/nn/edits</value>
</property>

<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///export/servers/hadoop-3.3.0/hadoopDatas/snn/name</value>
</property>

<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///export/servers/hadoop-3.3.0/hadoopDatas/dfs/snn/edits</value>
</property>

<property>
<name>dfs.replication</name>
<value>3</value>
</property>


<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>

# vim etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_241-amd64/
export HADOOP_SSH_OPTS="-p 32539"

# vim etc/hadoop/yarn-site.xml

<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>32</value>
<description>number of CPU vcores YARN may use on this node</description>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>minimum number of virtual CPU cores a single task can request</description>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>4</value>
<description>maximum number of virtual CPU cores a single task can request</description>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>52000</value>
<description>physical memory (MB) YARN may use on this node</description>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
<description>minimum physical memory (MB) a single task can request</description>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>640000</value>
<description>maximum physical memory (MB) a single task can request</description>
</property>

# vim etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

# vim etc/hadoop/workers
hadoop-client-1
hadoop-client-2

# source /etc/profile
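Note that nothing above copies the JDK, the environment variables, or the configured hadoop-3.3.0 directory to the workers; they need all three before the cluster is started. A sketch of one way to do it from the master (assuming the JDK rpm was uploaded to /export/softwares; add -P/-p 32539 if sshd uses the non-default port from hadoop-env.sh):

for host in hadoop-client-1 hadoop-client-2; do
    scp /export/softwares/jdk-8u241-linux-x64.rpm "$host":/export/softwares/
    ssh "$host" "rpm -ivh /export/softwares/jdk-8u241-linux-x64.rpm"
    scp /etc/profile "$host":/etc/profile              # or append the exports from 1.2 by hand
    scp -r /export/servers/hadoop-3.3.0 "$host":/export/servers/
done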

2.4 Start and verify Hadoop

# on the master server
hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh

# master node

# jps
13650 Jps
11308 SecondaryNameNode
11645 ResourceManager
11006 NameNode

# worker 2

# jps
7561 DataNode
7740 NodeManager
9487 Jps

# worker 1

# jps
24664 Jps
22748 DataNode
22925 NodeManager
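Besides jps, the HDFS and YARN command-line tools give a quick view of cluster health; both of the following should list the two worker nodes:

hdfs dfsadmin -report    # live DataNodes and capacity
yarn node -list          # registered NodeManagers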

 

2.5 Configure the Flink cluster and Zeppelin

# flink
# vim conf/flink-conf.yaml

jobmanager.rpc.address: 192.168.0.24
jobmanager.rpc.port: 6123
jobmanager.memory.process.size: 14g
taskmanager.memory.process.size: 16g
taskmanager.numberOfTaskSlots: 16
parallelism.default: 2
jobmanager.execution.failover-strategy: region
rest.port: 8081
taskmanager.memory.network.fraction: 0.15
taskmanager.memory.network.min: 128mb
taskmanager.memory.network.max: 2gb

rest.bind-port: 50100-50200



# zeppelin
# cd /export/servers/zeppelin-0.9.0-bin-all
# vim conf/zeppelin-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_241-amd64/
export USE_HADOOP=true
export ZEPPELIN_ADDR=192.168.0.24
export ZEPPELIN_PORT=8082
export ZEPPELIN_LOCAL_IP=192.168.0.24
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=8"
export ZEPPELIN_MEM="-Xms1024m -Xmx4096m -XX:MaxMetaspaceSize=512m"
export HADOOP_CONF_DIR=/export/servers/hadoop-3.3.0/etc/hadoop
export ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=2500000

# vim conf/shiro.ini
admin = password1, admin

2.6 Start and verify the Flink cluster and Zeppelin

# start commands (run from $FLINK_HOME and $ZEPPELIN_HOME respectively)
bin/yarn-session.sh -tm 2048 -s 4 -d

bin/zeppelin-daemon.sh start
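To confirm both services came up (the application name below is the yarn-session default; adjust if it was overridden):

yarn app -list                    # the "Flink session cluster" application should be RUNNING
bin/zeppelin-daemon.sh status     # Zeppelin UI is then reachable at http://192.168.0.24:8082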

3. Debugging in YARN mode

http://<ip>:8082/#/interpreter

(Screenshot: the Zeppelin interpreter settings page)

The Flink interpreter settings, entered in a %flink.conf paragraph:

%flink.conf
flink.execution.mode yarn

heartbeat.timeout 180000

flink.execution.packages org.apache.flink:flink-connector-jdbc_2.11:1.12.0,mysql:mysql-connector-java:8.0.16,org.apache.flink:flink-sql-connector-kafka_2.11:1.12.0,org.apache.flink:flink-sql-connector-elasticsearch7_2.12:1.12.1

table.exec.source.cdc-events-duplicate true
taskmanager.memory.task.off-heap.size 512MB
table.exec.mini-batch.enabled true
table.exec.mini-batch.allow-latency 5000
table.exec.mini-batch.size 50000

flink.jm.memory 2048
flink.tm.memory 4096
flink.tm.slot 2

flink.yarn.appName t_apibet_report

Write the actual SQL jobs in %flink.ssql paragraphs.

Debug jobs via the Flink web UI at http://<ip>:8081.

For complex tables it is still better to split them up and run the jobs directly on YARN.

4. Useful commands

# jps
24642 Kafka
2818 RemoteInterpreterServer
1988 YarnSessionClusterEntrypoint
24452 YarnTaskExecutorRunner
23688 ZeppelinServer
21448 Jps
10952 NameNode
3530 YarnSessionClusterEntrypoint
20365 RemoteInterpreterServer
18961 NodeManager
11155 DataNode
28247 RemoteInterpreterServer
3864 YarnTaskExecutorRunner
32155 RemoteInterpreterServer
30491 RemoteInterpreterServer
13404 YarnTaskExecutorRunner
15964 YarnTaskExecutorRunner
21084 YarnSessionClusterEntrypoint
414 YarnSessionClusterEntrypoint
18783 ResourceManager
24040 QuorumPeerMain
26152 YarnTaskExecutorRunner
29160 YarnSessionClusterEntrypoint
14825 YarnTaskExecutorRunner
25322 CanalLauncher
24746 RemoteInterpreterServer
31214 YarnSessionClusterEntrypoint
25839 YarnSessionClusterEntrypoint
25136 CanalAdminApplication
1267 RemoteInterpreterServer
31604 YarnTaskExecutorRunner
11515 SecondaryNameNode


# yarn app -list

# yarn app -kill <Application-Id>
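Because the trash mechanism from core-site.xml is enabled, hdfs dfs -rm moves data into the per-user trash for fs.trash.interval minutes instead of deleting it outright; a few related commands (paths are illustrative):

hdfs dfs -rm /path/to/file       # moved to .Trash rather than deleted immediately
hdfs dfs -ls /user/root/.Trash   # inspect the current trash contents
hdfs dfs -expunge                # purge trash checkpoints older than fs.trash.interval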
