For installing and configuring Hive itself, see: Hive Installation and Configuration
Configuring Hive on Spark on top of an existing Hive installation
Installation
Install on server ns1, which already has Hive installed;
Download and extract
Official download page: http://spark.apache.org/downloads.html
Download: spark-3.0.0-bin-hadoop3.2.tgz
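From the command line, the package can also be fetched from the Apache archive (the URL below is an assumption based on the standard archive layout):
$ wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz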
Note:
The Spark version that Hive 3.1.2 supports out of the box is 2.4.5, so you need to change the Spark version in the pom file of the downloaded Hive 3.1.2 source code to 3.0.0 and then recompile and repackage, which yields jars that support Spark 3.0.0;
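A minimal sketch of that rebuild, assuming the Apache Hive 3.1.2 source release and that the Spark version is controlled by the <spark.version> property in the top-level pom.xml (the property name and the -Pdist profile are assumptions; adjust to your actual source tree):
$ wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-src.tar.gz
$ tar xzvf apache-hive-3.1.2-src.tar.gz && cd apache-hive-3.1.2-src
# point the build at Spark 3.0.0
$ sed -i 's|<spark.version>.*</spark.version>|<spark.version>3.0.0</spark.version>|' pom.xml
# package without running tests; the binary tarball ends up under packaging/target/
$ mvn clean package -DskipTests -Pdist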
$ tar xzvf spark-3.0.0-bin-hadoop3.2.tgz -C /home/hadoop/local/
$ cd /home/hadoop/local
$ ln -s spark-3.0.0-bin-hadoop3.2 spark
Configure environment variables
$ sudo vim /etc/profile.d/my_env.sh
HADOOP_HOME=/home/local/hadoop
ZOOKEEPER_HOME=/home/hadoop/local/zookeeper
KAFKA_HOME=/home/hadoop/local/kafka
KE_HOME=/home/hadoop/local/efak
FLUME_HOME=/home/hadoop/local/flume
SQOOP_HOME=/home/hadoop/local/sqoop
HIVE_HOME=/home/hadoop/local/hive
SPARK_HOME=/home/hadoop/local/spark
PATH=$PATH:/home/hadoop/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$KE_HOME/bin:$FLUME_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export HADOOP_HOME ZOOKEEPER_HOME KAFKA_HOME KE_HOME FLUME_HOME SQOOP_HOME HIVE_HOME SPARK_HOME PATH
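Reload the profile in the current shell (or open a new login shell) so the new variables take effect:
$ source /etc/profile.d/my_env.sh
$ echo $SPARK_HOME
/home/hadoop/local/spark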
Configure Hive on Spark
Add a spark-defaults.conf configuration file under Hive's conf directory:
$ vim /home/hadoop/local/hive/conf/spark-defaults.conf
spark.master=yarn
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://mycluster/spark/history
spark.executor.memory=2g
spark.driver.memory=2g
#spark.memory.offHeap.enabled=true
#spark.memory.offHeap.size=2g
spark.driver.extraLibraryPath=/home/local/hadoop/lib/native
spark.executor.extraLibraryPath=/home/local/hadoop/lib/native
The /spark/history directory in HDFS is where the Spark history logs are stored. Check on the Hadoop web page http://ns2:50070 whether this directory already exists; if it does not, create the folder manually;
Or create it from the command line:
$ hdfs dfs -mkdir -p /spark/history
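A quick way to check for the directory first and only create it if it is missing (simple sketch):
$ hdfs dfs -test -d /spark/history || hdfs dfs -mkdir -p /spark/history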
Add the following entries to the hive-site.xml configuration file
$ vim /home/hadoop/local/hive/conf/hive-site.xml
Append at the bottom:
<property>
<name>spark.yarn.jars</name>
<value>hdfs://mycluster/spark/jars/*</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>
The complete hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ns1:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>ns1</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://mycluster/spark/jars/*</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>
</configuration>
Upload the Spark "pure" (without-Hadoop) jars to HDFS
Download and extract:
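The without-Hadoop package can likewise be fetched from the Apache archive (URL assumed from the standard layout):
$ wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-without-hadoop.tgz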
$ tar -zxvf spark-3.0.0-bin-without-hadoop.tgz
Upload the without-Hadoop jars to HDFS:
$ hdfs dfs -mkdir -p /spark/jars
$ hdfs dfs -put spark-3.0.0-bin-without-hadoop/jars/* /spark/jars/
A total of 146 jars were uploaded;
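To double-check the upload, list the directory on HDFS; the count of 146 comes from this particular Spark 3.0.0 build and may differ for other distributions:
$ hdfs dfs -ls /spark/jars | head -1
Found 146 items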
Test
1) Start the Hive client
$ hive
2) Create a database and a table, then insert a row:
hive> create database test;
hive> use test;
hive> create table student(id int, name string);
hive> insert into student values(1001, 'zhangsan');
Query ID = hadoop_20220915174910_4ed7ce9b-b7a1-41c8-a55d-b008569fbb53
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'java.lang.Exception(Failed to submit Spark work, please retry later)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to submit Spark work, please retry later
The first attempt failed to submit the Spark job (the Spark session may still have been starting up); running the same insert again succeeded:
hive> insert into student values(1001, 'zhangsan');
Query ID = hadoop_20220915175122_eaccaf93-6488-4a91-9375-75794c880b23
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Running with YARN Application = application_1663211231728_0013
Kill Command = /home/local/hadoop/bin/yarn application -kill application_1663211231728_0013
Hive on Spark Session Web UI URL: http://ns2:32777
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-0 ........ 0 FINISHED 1 1 0 0 0
Stage-1 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.09 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 6.09 second(s)
Loading data to table test.student
OK
Time taken: 22.528 seconds
hive> insert into student values(1002, 'lisi');
hive> select * from student;
OK
1001 zhangsan
1002 lisi
Time taken: 0.174 seconds, Fetched: 2 row(s)
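As a final check, confirm from the shell that Hive is running with Spark as its execution engine (a quick sketch; the output assumes the configuration above):
$ hive -e "set hive.execution.engine;"
hive.execution.engine=spark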