

Configuring Hive on Spark

For installing and configuring Hive itself, see: Hive Installation and Configuration

Configuring Hive on Spark

Installation

Install on server ns1, where Hive has already been installed.

Download and extract

Official download page: http://spark.apache.org/downloads.html

Download: spark-3.0.0-bin-hadoop3.2.tgz

Note:

Out of the box, Hive 3.1.2 supports Spark 2.4.5. To run it against Spark 3.0.0 you need to change the Spark version in the pom file of the Hive 3.1.2 source to 3.0.0 and then rebuild and repackage Hive, which gives you Hive jars that support Spark 3.0.0.
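If you do rebuild Hive, the steps look roughly like the sketch below. This is only an outline under the assumption that the Spark dependency is controlled by the spark.version property in the Hive root pom.xml; the exact property value and the Maven options you need may differ in your environment.

$ wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-src.tar.gz
$ tar xzvf apache-hive-3.1.2-src.tar.gz && cd apache-hive-3.1.2-src
$ vim pom.xml          # set the spark.version property to 3.0.0
$ mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true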

$ tar xzvf spark-3.0.0-bin-hadoop3.2.tgz -C /home/hadoop/local/

$ cd /home/hadoop/local

$ ln -s spark-3.0.0-bin-hadoop3.2 spark

Configure environment variables

$ sudo vim /etc/profile.d/my_env.sh

HADOOP_HOME=/home/local/hadoop
ZOOKEEPER_HOME=/home/hadoop/local/zookeeper
KAFKA_HOME=/home/hadoop/local/kafka
KE_HOME=/home/hadoop/local/efak
FLUME_HOME=/home/hadoop/local/flume
SQOOP_HOME=/home/hadoop/local/sqoop
HIVE_HOME=/home/hadoop/local/hive
SPARK_HOME=/home/hadoop/local/spark
PATH=$PATH:/home/hadoop/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$KE_HOME/bin:$FLUME_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export HADOOP_HOME ZOOKEEPER_HOME KAFKA_HOME KE_HOME FLUME_HOME SQOOP_HOME HIVE_HOME SPARK_HOME PATH
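After saving, reload the profile in the current shell and do a quick sanity check that the new variable is visible:

$ source /etc/profile.d/my_env.sh
$ echo $SPARK_HOME
/home/hadoop/local/spark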

Configure Hive on Spark

Add a spark-defaults.conf file under Hive's conf directory

$ vim /home/hadoop/local/hive/conf/spark-defaults.conf

spark.master=yarn
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://mycluster/spark/history
spark.executor.memory=2g
spark.driver.memory=2g
#spark.memory.offHeap.enabled=true
#spark.memory.offHeap.size=2g
spark.driver.extraLibraryPath=/home/local/hadoop/lib/native
spark.executor.extraLibraryPath=/home/local/hadoop/lib/native

The /spark/history directory in HDFS is where the Spark history logs are stored. Check on the Hadoop web UI at http://ns2:50070 whether this directory already exists; if it does not, create it there or from the command line:

$ hdfs dfs -mkdir -p /spark/history
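A quick listing confirms the directory is in place before moving on:

$ hdfs dfs -ls /spark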

Add the following properties to the hive-site.xml configuration file

$ vim /home/hadoop/local/hive/conf/hive-site.xml

Append at the bottom:

<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://mycluster/spark/jars/*</value>
</property>

<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>

<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>10000ms</value>
</property>

The complete configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:mysql://ns1:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
   </property>

   <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>com.mysql.cj.jdbc.Driver</value>
   </property>

   <property>
       <name>javax.jdo.option.ConnectionUserName</name>
       <value>root</value>
   </property>

   <property>
       <name>javax.jdo.option.ConnectionPassword</name>
       <value>123456</value>
   </property>

   <property>
       <name>hive.metastore.warehouse.dir</name>
       <value>/user/hive/warehouse</value>
   </property>

   <property>
       <name>hive.metastore.schema.verification</name>
       <value>false</value>
   </property>

   <property>
       <name>hive.server2.thrift.port</name>
       <value>10000</value>
   </property>

   <property>
       <name>hive.server2.thrift.bind.host</name>
       <value>ns1</value>
   </property>

   <property>
       <name>hive.metastore.event.db.notification.api.auth</name>
       <value>false</value>
   </property>
   
   <property>
       <name>spark.yarn.jars</name>
       <value>hdfs://mycluster/spark/jars/*</value>
   </property>

   <property>
       <name>hive.execution.engine</name>
       <value>spark</value>
   </property>

   <property>
       <name>hive.spark.client.connect.timeout</name>
       <value>10000ms</value>
   </property>
    
</configuration>
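Before restarting anything, it is worth checking that the edited file is still well-formed XML; one way, assuming xmllint is available on the box:

$ xmllint --noout /home/hadoop/local/hive/conf/hive-site.xml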

Upload the Spark "without Hadoop" jars to HDFS

Download and extract:

$ tar -zxvf spark-3.0.0-bin-without-hadoop.tgz

Upload the Spark without-hadoop jars to HDFS:

$ hdfs dfs -mkdir -p /spark/jars

$ hdfs dfs -put spark-3.0.0-bin-without-hadoop/jars/* /spark/jars/

A total of 146 jars are uploaded.
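If you want to double-check the upload, count the jar files in HDFS; the number should match the count above (it may differ for other Spark builds):

$ hdfs dfs -ls /spark/jars | grep -c '\.jar$'
146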

Test

1) Start the Hive client

$ hive
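Inside the client you can first confirm that the execution engine configured in hive-site.xml is active:

hive> set hive.execution.engine;
hive.execution.engine=spark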

2) Create a database and a table:

$ create database test;

$ use test;

$ create table student(id int, name string);

$ insert into student values(1001, 'zhangsan');

Query ID = hadoop_20220915174910_4ed7ce9b-b7a1-41c8-a55d-b008569fbb53
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'java.lang.Exception(Failed to submit Spark work, please retry later)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to submit Spark work, please retry later
hive> insert into student values(1001, 'zhangsan');
Query ID = hadoop_20220915175122_eaccaf93-6488-4a91-9375-75794c880b23
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1663211231728_0013
Kill Command = /home/local/hadoop/bin/yarn application -kill application_1663211231728_0013
Hive on Spark Session Web UI URL: http://ns2:32777

Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0  
Stage-1 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 6.09 s     
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 6.09 second(s)
Loading data to table test.student
OK
Time taken: 22.528 seconds

$ insert into student values(1002, 'lisi');

$ select * from student;

OK
1001    zhangsan
1002    lisi
Time taken: 0.174 seconds, Fetched: 2 row(s)
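Note that a plain SELECT like the one above is usually answered by a simple fetch task and does not launch a Spark job; to push another job through the Spark engine you can run an aggregation, for example:

$ select count(*) from student;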

Reposted from: https://blog.csdn.net/zhy0414/article/details/126885386
Copyright belongs to the original author 开发老张. In case of infringement, please contact us for removal.
