Hive MetaStore Upgrade Investigation
The main part of a MetaStore upgrade is upgrading the schema of its backing store, MySQL.
Upgrade Steps
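- Shut down the MetaStore instance and restrict access to the MetaStore MySQL database. It is very important that no one else accesses or modifies the contents of the database while the schema upgrade is running. [Note: stopping the metastore service means that all Hive-related services are unavailable for the duration of the upgrade.]
- Create a backup of the MySQL metastore database. If something goes wrong, this lets you revert any changes made during the upgrade. [Note: this backs up the data of the MySQL metastore database.] The mysqldump utility is the easiest way to back up a MySQL database:
> mysqldump --opt <metastore_db_name> > metastore_backup.sql
Note that you may also need to specify a hostname and username with the --host and --user command-line switches.
- Dump the metastore database schema to a file. [Note: this backs up the schema of the MySQL metastore database.] This uses mysqldump again, this time with options that restrict the dump to the DDL statements needed to create the schema:
> mysqldump --skip-add-drop-table --no-data <metastore_db_name> > my-schema-x.y.z.mysql.sql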
- The schema upgrade scripts assume that the schema you are upgrading closely matches the official schema for your version of Hive. The files in the upgrade script directory with names like hive-schema-x.y.z.mysql.sql contain dumps of the official schema for each released version of Hive. You can find the differences between your schema and the official one by diffing the official dump against the schema dump created in the previous step. Some differences are acceptable and do not interfere with the upgrade, but others must be resolved manually or the upgrade scripts will fail to complete.
  - Missing tables: Hive's default configuration causes the MetaStore to create schema elements only when they are needed. Some tables may be missing from your MetaStore schema if you have not created the corresponding Hive catalog objects. For example, the PARTITIONS table probably does not exist if you have never created a table partition. You must create these missing tables before running the upgrade scripts. The easiest way is to execute the official schema DDL script against your schema. Every CREATE TABLE statement in the schema script includes an IF NOT EXISTS clause, so tables that already exist are ignored and those that do not exist are created. [Note: the upgrade scripts do not create or fill in unused schema elements on their own. Before running them, run the DDL script for the corresponding version so that no table is missing.]
  - Extra tables: your schema may include a table named NUCLEUS_TABLES or a table named SEQUENCE_TABLE. These tables are managed by the DataNucleus ORM layer and are created automatically if they do not exist. No action is required. [Note: this can be ignored; if the table-creation SQL has been run, both tables are necessarily present.]
  - Reversed column constraint names in the same table: tables with multiple constraints may have the constraint names reversed. For example, the PARTITIONS table contains two foreign key constraints named PARTITIONS_FK1 and PARTITIONS_FK2, which reference SDS.SD_ID and TBLS.TBL_ID respectively. In your schema, however, you may find that PARTITIONS_FK1 references TBLS.TBL_ID and PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable; the only requirement is that these constraints actually exist. [Note: which column each constraint name refers to may differ, but the full set of constraints must be present.]
    [Figure: two entity-relationship diagrams of PARTITIONS (PART_ID PK; SD_ID FK to SDS.SD_ID; TBL_ID FK to TBLS.TBL_ID), SDS (SD_ID PK) and TBLS (TBL_ID PK). The many-to-one and one-to-one relationships are labeled PARTITIONS_FK1 and PARTITIONS_FK2 in the first diagram, and PARTITIONS_FK2 and PARTITIONS_FK1 in the second.]
  - Differences in column/constraint names: your schema may contain tables with columns named IDX or unique keys named UNIQUE<tab_name>. If you find either of these, rename them to INTEGER_IDX and UNIQUE_<tab_name> before running the upgrade scripts. For more background on this issue, see HIVE-1435. [Note: the UNIQUE_ prefix requirement is questionable, because the official DDL itself contains unique keys whose names do not start with UNIQUE_.]
- You are now ready to run the schema upgrade scripts. If you are upgrading from Hive 0.5.0 to Hive 0.6.0, you need to run the upgrade-0.5.0-to-0.6.0.mysql.sql script. If you are upgrading from 0.5.0 to 0.7.0, you need to run the 0.5.0 to 0.6.0 upgrade script first and then the 0.6.0 to 0.7.0 upgrade script. [Note: skipping over versions is not supported; the upgrade scripts must be run in order.]
> mysql --verbose
mysql> use <metastore_db_name>;
Database changed
mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
mysql> source upgrade-2.0.0-to-2.1.0.mysql.sql
mysql> source upgrade-2.1.0-to-2.2.0.mysql.sql
mysql> source upgrade-2.2.0-to-2.3.0.mysql.sql
These scripts should run to completion without any errors. If you do hit an error, analyze the cause and try to trace it back to one of the preceding steps.
- The final step of the upgrade is to validate the freshly upgraded schema against the official schema for the target version of Hive. This is done by **repeating steps (3) and (4)** (the schema dump and the comparison against the official schema), this time comparing against the official schema of the upgraded version. For example, if you upgraded the schema to Hive 0.7.0, compare your schema dump against the contents of hive-schema-0.7.0.mysql.sql. [Note: after upgrading from 1.2.0 to 2.0.0, dump the schema and compare it with the official 2.0.0 schema script. If there are no problems, upgrade from 2.0.0 to 2.1.0 and compare against the official 2.1.0 schema, then upgrade from 2.1.0 to 2.2.0, and so on, one step at a time, up to 2.3.0.]
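The ordering rule in the steps above (one upgrade script per version hop, run in sequence) can be sketched in Python. This is only an illustration and not part of the Hive tooling: the function name `upgrade_chain` is hypothetical, and the sketch assumes that the list it receives contains at most one `upgrade-x-to-y` script starting at any given version.

```python
import re

# Matches names such as upgrade-1.2.0-to-2.0.0.mysql.sql.
_SCRIPT_RE = re.compile(
    r"^upgrade-(\d+\.\d+\.\d+)-to-(\d+\.\d+\.\d+)\.mysql\.sql$"
)


def upgrade_chain(script_names, current, target):
    """Return the upgrade scripts to run, in order, from current to target.

    Hypothetical helper. Assumes at most one script starts at each version.
    """
    # Map each starting version to (version it leads to, script name).
    steps = {}
    for name in script_names:
        match = _SCRIPT_RE.match(name)
        if match:
            steps[match.group(1)] = (match.group(2), name)

    chain = []
    version = current
    while version != target:
        if version not in steps:
            raise ValueError(f"no upgrade script starts at version {version}")
        if len(chain) >= len(steps):
            # Every known script has been used already: the names form a loop.
            raise ValueError("upgrade scripts do not lead to the target version")
        version, name = steps[version]
        chain.append(name)
    return chain
```

Given the four scripts used in the session above, `upgrade_chain(names, "1.2.0", "2.3.0")` returns them in the order they must be sourced, and a missing hop raises an error instead of silently skipping a version.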
Script Notes
Script source: the metastore/scripts/upgrade/mysql/ directory of the Hive 2.3.4 source tree.
There are two kinds of official schema DDL scripts, hive-schema-a.b.c.mysql.sql and hive-txn-schema-a.b.c.mysql.sql, for example hive-schema-2.3.0.mysql.sql and hive-txn-schema-2.3.0.mysql.sql.
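The official schema scripts are what your own schema dump gets compared against. As a rough sketch of that comparison at the table level, the Python below extracts table names from two DDL texts and reports missing and extra tables. The function names are hypothetical, the regular expression is a simplification of real mysqldump output, and names are compared case-sensitively, which is an assumption that may not hold on every MySQL setup. A full textual diff is still needed to catch column and constraint differences.

```python
import re

# Tables managed by the DataNucleus ORM layer; finding them as extras is harmless.
ORM_MANAGED_TABLES = {"NUCLEUS_TABLES", "SEQUENCE_TABLE"}

# Simplified pattern for CREATE TABLE [IF NOT EXISTS] `NAME`.
_CREATE_TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?", re.IGNORECASE
)


def table_names(ddl_text):
    """Return the set of table names created by the given DDL text."""
    return set(_CREATE_TABLE_RE.findall(ddl_text))


def compare_tables(own_ddl, official_ddl):
    """Return (missing, extra) table names relative to the official schema.

    missing: tables in the official schema that the dump lacks; these must be
             created before the upgrade scripts are run.
    extra:   tables only in the dump, excluding the ORM-managed ones.
    """
    own = table_names(own_ddl)
    official = table_names(official_ddl)
    missing = sorted(official - own)
    extra = sorted(own - official - ORM_MANAGED_TABLES)
    return missing, extra
```

In practice the two arguments would be the contents of my-schema-x.y.z.mysql.sql and the matching hive-schema-x.y.z.mysql.sql, read from disk.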
An upgrade script mainly runs a series of XXX-HIVE-XXXXX.mysql.sql scripts, so the operations in each of those scripts need to be analyzed one by one. For example:
SELECT 'Upgrading MetaStore schema from 1.2.0 to 2.0.0' AS ' ';
SOURCE 021-HIVE-7018.mysql.sql;
SOURCE 022-HIVE-11970.mysql.sql;
SOURCE 023-HIVE-12807.mysql.sql;
SOURCE 024-HIVE-12814.mysql.sql;
SOURCE 025-HIVE-12816.mysql.sql;
SOURCE 026-HIVE-12818.mysql.sql;
SOURCE 027-HIVE-12819.mysql.sql;
SOURCE 028-HIVE-12821.mysql.sql;
SOURCE 029-HIVE-12822.mysql.sql;
SOURCE 030-HIVE-12823.mysql.sql;
SOURCE 031-HIVE-12831.mysql.sql;
SOURCE 032-HIVE-12832.mysql.sql;
UPDATE VERSION SET SCHEMA_VERSION='2.0.0', VERSION_COMMENT='Hive release version 2.0.0' where VER_ID=1;
SELECT 'Finished upgrading MetaStore schema from 1.2.0 to 2.0.0' AS ' ';
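To work through the sourced scripts one by one, it can help to first list them. The Python below is a small sketch under stated assumptions: it takes the text of an upgrade script like the one above, assumes each SOURCE statement sits on its own line and ends with a semicolon, and returns the script names in order. The function names `sourced_scripts` and `jira_id` are hypothetical.

```python
import re

# A SOURCE statement on its own line, terminated by a semicolon.
_SOURCE_RE = re.compile(r"^\s*SOURCE\s+(\S+?)\s*;", re.IGNORECASE | re.MULTILINE)


def sourced_scripts(upgrade_sql):
    """Return the script names sourced by an upgrade script, in file order."""
    return _SOURCE_RE.findall(upgrade_sql)


def jira_id(script_name):
    """Return the HIVE-NNNN issue key embedded in a script name, or None."""
    match = re.search(r"HIVE-\d+", script_name)
    return match.group(0) if match else None
```

Each returned name is a file to read, and the issue key from `jira_id` is a convenient handle for looking up why the change was made.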
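If an error occurs while running an upgrade script:
- Option 1: restore the schema, fix the problem, and then run the upgrade script again.
- Option 2: following the statements in the upgrade script, repair the schema manually until it matches the result of the official schema script.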
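Other Upgrade Items
Upgrade work other than the MySQL schema.
Configuration Properties
Besides upgrading MySQL, you also need to compare the configuration properties between the two versions: identify which properties were added and which were removed, and work out a suitable value for each.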
Configuration class: org.apache.hadoop.hive.conf.HiveConf
Configuration files: hivemetastore-site.xml, hive-site.xml
Configuration property reference on the Hive website: Hive Configuration Properties
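The comparison of configuration properties described above can be sketched in Python. This is a minimal illustration, assuming both inputs are Hadoop-style configuration XML (a `configuration` root with `property` elements holding `name` and `value`), as used by hive-site.xml. The function names are hypothetical, and the sketch only reports differences; choosing suitable values still requires reading the property documentation.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(old, new):
    """Return (added, removed, changed) property names between two dicts."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed
```

Feeding it the configuration file from the old and the new deployment gives a short list of properties to review by hand.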
Original Text
Source: metastore/scripts/upgrade/mysql/README in the Hive 2.3.4 source tree.
This document describes how to upgrade the schema of a MySQL backed Hive MetaStore instance from one release version of Hive to another release version of Hive. For example, by following the steps listed below it is possible to upgrade a Hive 0.5.0 MetaStore schema to a Hive 0.7.0 MetaStore schema. Before attempting this project we strongly recommend that you read through all of the steps in this document and familiarize yourself with the required tools.
MetaStore Upgrade Steps
- Shutdown your MetaStore instance and restrict access to the MetaStore’s MySQL database. It is very important that no one else accesses or modifies the contents of database while you are performing the schema upgrade.
- Create a backup of your MySQL metastore database. This will allow you to revert any changes made during the upgrade process if something goes wrong. The mysqldump utility is the easiest way to create a backup of a MySQL database:
% mysqldump --opt <metastore_db_name> > metastore_backup.sql
Note that you may also need to specify a hostname and username using the --host and --user command line switches.
- Dump your metastore database schema to a file. We use the mysqldump utility again, but this time with a command line option that specifies we are only interested in dumping the DDL statements required to create the schema:
% mysqldump --skip-add-drop-table --no-data <metastore_db_name> > my-schema-x.y.z.mysql.sql
- The schema upgrade scripts assume that the schema you are upgrading closely matches the official schema for your particular version of Hive. The files in this directory with names like “hive-schema-x.y.z.mysql.sql” contain dumps of the official schemas corresponding to each of the released versions of Hive. You can determine differences between your schema and the official schema by diffing the contents of the official dump with the schema dump you created in the previous step. Some differences are acceptable and will not interfere with the upgrade process, but others need to be resolved manually or the upgrade scripts will fail to complete.
  - Missing Tables: Hive’s default configuration causes the MetaStore to create schema elements only when they are needed. Some tables may be missing from your MetaStore schema if you have not created the corresponding Hive catalog objects, e.g. the PARTITIONS table will probably not exist if you have not created any table partitions in your MetaStore. You MUST create these missing tables before running the upgrade scripts. The easiest way to do this is by executing the official schema DDL script against your schema. Each of the CREATE TABLE statements in the schema script include an IF NOT EXISTS clause, so tables which already exist in your schema will be ignored, and those which don’t exist will get created.
  - Extra Tables: Your schema may include a table named NUCLEUS_TABLES or a table named SEQUENCE_TABLE. These tables are managed by the DataNucleus ORM layer and will be created automatically if they don’t exist. No action on your part is required.
  - Reversed Column Constraint Names in the Same Table: Tables with multiple constraints may have the names of the constraints reversed. For example, the PARTITIONS table contains two foreign key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in your schema you may find that PARTITIONS_FK1 references TBLS.TBL_ID and PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable; the only requirement is that these constraints actually exist.
  - Differences in Column/Constraint Names: Your schema may contain tables with columns named “IDX” or unique keys named “UNIQUE<tab_name>”. If you find either of these in your schema you will need to change the names to “INTEGER_IDX” and “UNIQUE_<tab_name>” before running the upgrade scripts. For more background on this issue please refer to HIVE-1435.
- You are now ready to run the schema upgrade scripts. If you are upgrading from Hive 0.5.0 to Hive 0.6.0 you need to run the upgrade-0.5.0-to-0.6.0.mysql.sql script, but if you are upgrading from 0.5.0 to 0.7.0 you will need to run the 0.5.0 to 0.6.0 upgrade script followed by the 0.6.0 to 0.7.0 upgrade script.
% mysql --verbose
mysql> use <metastore_db_name>;
Database changed
mysql> source upgrade-0.5.0-to-0.6.0.mysql.sql
mysql> source upgrade-0.6.0-to-0.7.0.mysql.sql
These scripts should run to completion without any errors. If you do encounter errors you need to analyze the cause and attempt to trace it back to one of the preceding steps.
- The final step of the upgrade process is validating your freshly upgraded schema against the official schema for your particular version of Hive. This is accomplished by repeating steps (3) and (4), but this time comparing against the official version of the upgraded schema, e.g. if you upgraded the schema to Hive 0.7.0 then you will want to compare your schema dump against the contents of hive-schema-0.7.0.mysql.sql
Copyright belongs to the original author, 顧棟. If this infringes your rights, please contact us to have it removed.