0


Hive 分区表新增字段 cascade

背景

在以前上线的分区表中新加一个字段,并且要求添加到指定的位置列。

模拟测试

加 cascade 操作

  • 创建测试表
createtableifnotexists sqltest.table_add_column_test(
  org_col1 string comment'原始数据1',
  org_col2 string comment'原始数据2')comment'增加分区表字段的测试表'
partitioned by(dt string comment'分区日期');
  • 插入测试数据
insertintotable sqltest.table_add_column_test partition(dt='20230313')values('org_col1_0313','org_col2_0313');insertintotable sqltest.table_add_column_test partition(dt='20230314')values('org_col1_0314','org_col2_0314');
  • 查看现有数据
select*from table_add_column_test;
+---------------------------------+---------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.org_col2  | table_add_column_test.dt  |+---------------------------------+---------------------------------+---------------------------+--+| org_col1_0313                   | org_col2_0313                   | 20230313                  || org_col1_0314                   | org_col2_0314                   | 20230314                  |+---------------------------------+---------------------------------+---------------------------+--+
  • 官网添加列的语法
ALTERTABLE table_name 
  [PARTITION partition_spec]-- (Note: Hive 0.14.0 and later)ADD|REPLACECOLUMNS(col_name data_type [COMMENT col_comment],...)[CASCADE|RESTRICT]-- (Note: Hive 1.1.0 and later)

注意: Hive 1.1.0 中有

CASCADE|RESTRICT

子句。

ALTER TABLE ADD|REPLACE COLUMNS CASCADE

命令修改表元数据的列,并将相同的更改级联到所有分区元数据。

RESTRICT

是默认值,即不修改元数据。

  • 增加一列,指定增加到原始的两列中间 先添加一列(注意: 必须添加 cascade 关键字,不然不会刷新旧分区数据,关键字 cascade 能修改元数据)
altertable sqltest.table_add_column_test addcolumns(added_column string comment'新添加的列')cascade;

查看数据

+---------------------------------+---------------------------------+-------------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.org_col2  | table_add_column_test.added_column  | table_add_column_test.dt  |+---------------------------------+---------------------------------+-------------------------------------+---------------------------+--+| org_col1_0313                   | org_col2_0313                   | NULL                                | 20230313                  || org_col1_0314                   | org_col2_0314                   | NULL                                | 20230314                  |+---------------------------------+---------------------------------+-------------------------------------+---------------------------+--+

再对列进行排序(注意: 必须添加

cascade

关键字,不然不会刷新旧分区数据,关键字

cascade

能修改元数据)

altertable sqltest.table_add_column_test change column added_column added_column string after org_col1 cascade;

再查看数据(注意: 虽然列名顺序变了,但 HDFS 文件内容并没有变化,所以结果第二列还是有数据,第三列没数据)

+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.added_column  | table_add_column_test.org_col2  | table_add_column_test.dt  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| org_col1_0313                   | org_col2_0313                       | NULL                            | 20230313                  || org_col1_0314                   | org_col2_0314                       | NULL                            | 20230314                  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+
  • 重刷旧分区数据(将以前第二列放到第三列位置,现第二列为新数据)
insert overwrite table sqltest.table_add_column_test partition(dt='20230313')select org_col1,'added_col_0313', added_column from sqltest.table_add_column_test where dt ='20230313';insert overwrite table sqltest.table_add_column_test partition(dt='20230314')select org_col1,'added_col_0314', added_column from sqltest.table_add_column_test where dt ='20230314';

查看数据(旧分区数据有更新)

+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.added_column  | table_add_column_test.org_col2  | table_add_column_test.dt  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| org_col1_0313                   | added_col_0313                      | org_col2_0313                   | 20230313                  || org_col1_0314                   | added_col_0314                      | org_col2_0314                   | 20230314                  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+

不加 cascade 操作(针对已有分区数据,不包括新增分区)

  • 删除表
droptableifexists sqltest.table_add_column_test;
  • 创建测试表
createtableifnotexists sqltest.table_add_column_test(
  org_col1 string comment'原始数据1',
  org_col2 string comment'原始数据2')comment'增加分区表字段的测试表'
partitioned by(dt string comment'分区日期');
  • 插入测试数据
insertintotable sqltest.table_add_column_test partition(dt='20230313')values('org_col1_0313','org_col2_0313');insertintotable sqltest.table_add_column_test partition(dt='20230314')values('org_col1_0314','org_col2_0314');
  • 添加列(不加关键字 cascade)
altertable sqltest.table_add_column_test addcolumns(added_column string comment'新添加的列');altertable sqltest.table_add_column_test change column added_column added_column string after org_col1;

查看数据

+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.added_column  | table_add_column_test.org_col2  | table_add_column_test.dt  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| org_col1_0313                   | org_col2_0313                       | NULL                            | 20230313                  || org_col1_0314                   | org_col2_0314                       | NULL                            | 20230314                  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+
  • 重刷旧分区数据
insert overwrite table sqltest.table_add_column_test partition(dt='20230313')select org_col1,'added_col_0313', added_column from sqltest.table_add_column_test where dt ='20230313';insert overwrite table sqltest.table_add_column_test partition(dt='20230314')select org_col1,'added_col_0314', added_column from sqltest.table_add_column_test where dt ='20230314';
  • 查看数据(旧分区没有变化)
+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| table_add_column_test.org_col1  | table_add_column_test.added_column  | table_add_column_test.org_col2  | table_add_column_test.dt  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+| org_col1_0313                   | added_col_0313                      | NULL                            | 20230313                  || org_col1_0314                   | added_col_0314                      | NULL                            | 20230314                  |+---------------------------------+-------------------------------------+---------------------------------+---------------------------+--+

本文转载自: https://blog.csdn.net/qq_45124566/article/details/129533437
版权归原作者 冷漠; 所有, 如有侵权,请联系我们删除。

“Hive 分区表新增字段 cascade”的评论:

还没有评论