

Configuring Hive on Spark

For installing and configuring Hive itself, see: Hive Installation and Configuration.

Configuring Hive on Spark on the Hive side

Installation

Install Spark on server ns1; Hive has already been installed on this server.

Download and extract

Official download page: http://spark.apache.org/downloads.html

Download: spark-3.0.0-bin-hadoop3.2.tgz

Note:

Hive 3.1.2 supports Spark 2.4.5, so to run on Spark 3.0.0 you need to change the Spark version in the pom file of the downloaded Hive 3.1.2 source to 3.0.0 and then recompile and repackage it, which produces Hive jars that support Spark 3.0.0.
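For reference, a rough outline of that rebuild, assuming Maven is installed and the Hive 3.1.2 source has been unpacked (the directory name and the pom property name are illustrative; check your own source tree):

$ cd apache-hive-3.1.2-src
$ # edit the <spark.version> property in pom.xml to 3.0.0, then build the binary distribution without tests
$ mvn clean package -Pdist -DskipTests
$ # the rebuilt binary tarball should appear under packaging/target/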

$ tar xzvf spark-3.0.0-bin-hadoop3.2.tgz -C /home/hadoop/local/

$ cd /home/hadoop/local

$ ln -s spark-3.0.0-bin-hadoop3.2 spark

Configure environment variables

$ sudo vim /etc/profile.d/my_env.sh

HADOOP_HOME=/home/local/hadoop
ZOOKEEPER_HOME=/home/hadoop/local/zookeeper
KAFKA_HOME=/home/hadoop/local/kafka
KE_HOME=/home/hadoop/local/efak
FLUME_HOME=/home/hadoop/local/flume
SQOOP_HOME=/home/hadoop/local/sqoop
HIVE_HOME=/home/hadoop/local/hive
SPARK_HOME=/home/hadoop/local/spark
PATH=$PATH:/home/hadoop/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$KE_HOME/bin:$FLUME_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export HADOOP_HOME ZOOKEEPER_HOME KAFKA_HOME KE_HOME FLUME_HOME SQOOP_HOME HIVE_HOME SPARK_HOME PATH
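After saving, reload the profile so the new variables take effect in the current shell, and confirm the Spark binaries resolve (a simple sanity check, not in the original steps):

$ source /etc/profile.d/my_env.sh
$ echo $SPARK_HOME
$ spark-submit --version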

Configure Hive on Spark

Add a spark-defaults.conf file under Hive's conf directory

$ vim /home/hadoop/local/hive/conf/spark-defaults.conf

spark.master=yarn
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://mycluster/spark/history
spark.executor.memory=2g
spark.driver.memory=2g
#spark.memory.offHeap.enabled=true
#spark.memory.offHeap.size=2g
spark.driver.extraLibraryPath=/home/local/hadoop/lib/native
spark.executor.extraLibraryPath=/home/local/hadoop/lib/native
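Before wiring Spark into Hive, it can be worth confirming that this Spark installation can submit to YARN on its own. A minimal check using the bundled SparkPi example (the exact examples jar name depends on your Spark build; the one below matches the hadoop3.2 package used here):

$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode cluster \
    $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar 10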

The /spark/history directory in HDFS is where the Spark history (event) logs are stored. Check on the Hadoop web UI (http://ns2:50070) whether this directory already exists; if not, create it manually.

Or create it from the command line:

$ hdfs dfs -mkdir -p /spark/history
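To confirm the directory is now in place (a quick check, not part of the original post):

$ hdfs dfs -ls /spark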

Add the following properties to the hive-site.xml configuration file

$ vim /home/hadoop/local/hive/conf/hive-site.xml

Append at the bottom:

<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://mycluster/spark/jars/*</value>
</property>
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>10000ms</value>
</property>

The complete configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://ns1:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>ns1</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>spark.yarn.jars</name>
        <value>hdfs://mycluster/spark/jars/*</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
    </property>
    <property>
        <name>hive.spark.client.connect.timeout</name>
        <value>10000ms</value>
    </property>
</configuration>

Upload the Spark "pure" (without-Hadoop) jars to HDFS

Download and extract:

$ tar -zxvf spark-3.0.0-bin-without-hadoop.tgz

Upload the jars from the without-Hadoop Spark package to HDFS:

$ hdfs dfs -mkdir -p /spark/jars

$ hdfs dfs -put spark-3.0.0-bin-without-hadoop/jars/* /spark/jars/

A total of 146 jar files are uploaded.
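A quick way to double-check the upload (note that hdfs dfs -ls prints a "Found N items" header line, so the count below will be one higher than the number of jars):

$ hdfs dfs -ls /spark/jars | wc -l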

Testing

1) Start the Hive client

$ hive
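Once inside the CLI, you can verify that the execution engine configured in hive-site.xml was picked up; the command below should echo hive.execution.engine=spark:

hive> set hive.execution.engine;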

2) Create a database and a table:

$ create database test;

$ use test;

$ create table student(id int, name string);

$ insert into student values(1001, 'zhangsan');

Note in the log below that the first submission failed with "Failed to submit Spark work, please retry later"; rerunning the same insert statement then succeeded:

Query ID = hadoop_20220915174910_4ed7ce9b-b7a1-41c8-a55d-b008569fbb53
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'java.lang.Exception(Failed to submit Spark work, please retry later)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to submit Spark work, please retry later
hive> insert into student values(1001, 'zhangsan');
Query ID = hadoop_20220915175122_eaccaf93-6488-4a91-9375-75794c880b23
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1663211231728_0013
Kill Command = /home/local/hadoop/bin/yarn application -kill application_1663211231728_0013
Hive on Spark Session Web UI URL: http://ns2:32777
Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS     TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
Stage-0 ........        0    FINISHED       1           1         0         0        0
Stage-1 ........        0    FINISHED       1           1         0         0        0
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 6.09 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 6.09 second(s)
Loading data to table test.student
OK
Time taken: 22.528 seconds

$ insert into student values(1002, 'lisi');

$ select * from student;

OK
1001 zhangsan
1002 lisi
Time taken: 0.174 seconds, Fetched: 2 row(s)
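Since spark.eventLog.enabled is set to true, each successful query should also leave an event log in the configured HDFS directory; listing it is a simple way to confirm that side of the setup (again, just an optional check):

$ hdfs dfs -ls /spark/history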

This article is reproduced from: https://blog.csdn.net/zhy0414/article/details/126885386
Copyright belongs to the original author, 开发老张; please contact us for removal in case of infringement.
