Hive alter table add columns: when to use cascade

Conclusion

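When you run alter table xxx add columns with cascade, the new column is also added to all existing partitions. Without cascade, only partitions created afterwards get the new column. Existing partitions do not, so their values for that column are not displayed even if their data files contain them.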

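If all partitions are produced by insert overwrite and the old partitions will not be regenerated, you can run add columns without cascade. The new column then reads as null in old partitions and displays normally in new partitions.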
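If all partitions are produced by insert overwrite and the old partitions do need to be regenerated, there are two options. 1. Run add columns without cascade, then for each old partition run drop partition followed by insert overwrite. 2. Run add columns with cascade, then run insert overwrite. If option 2 fails with an error, only option 1 remains.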
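If the files are generated externally and then placed in the partition directories, and they already contain data for the new column, you need cascade. If cascade fails with an error, check whether the table is external. If it is not, first convert it to an external table, so that dropping a partition does not delete its data files. Once the table is external, drop the partition and then re-add it with add partition ... location pointing at the same directory.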
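If the files are generated externally and then placed in the partition directories, but they do not contain data for the new column, cascade is not needed. The new column reads as null in old partitions and displays normally in new partitions.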
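For the third case, the fallback path can be sketched in HiveQL as below. This is a minimal sketch, assuming the new column has already been added to the table definition (add columns without cascade). The names are hypothetical: table t_ext, partition pt='1', and the placeholder directory /path/to/t_ext/pt=1, which should be the partition's existing location (for example, as reported by describe formatted t_ext partition (pt='1')).

```sql
-- Hypothetical table t_ext; the new column is assumed to already be in the table schema.
-- 1. Check the table type: CREATE TABLE = managed, CREATE EXTERNAL TABLE = external.
SHOW CREATE TABLE t_ext;

-- 2. Only if the table is managed: make it external so that dropping a
--    partition removes only its metadata and leaves the data files in place.
ALTER TABLE t_ext SET TBLPROPERTIES('EXTERNAL'='TRUE');

-- 3. Drop the partition metadata (the files stay because the table is external).
ALTER TABLE t_ext DROP PARTITION (pt='1');

-- 4. Re-add the partition on the same (placeholder) directory. A newly added
--    partition takes the current table columns, including the new one.
ALTER TABLE t_ext ADD PARTITION (pt='1') LOCATION '/path/to/t_ext/pt=1';

-- 5. Only if the table was managed originally: switch it back.
ALTER TABLE t_ext SET TBLPROPERTIES('EXTERNAL'='FALSE');
```

Steps 2 and 5 form a round trip that leaves the table type unchanged. Skip both when the table is already external.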

Notes:

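To check whether a table is external, run 'show create table xxx'. If the output starts with 'CREATE TABLE', it is a managed (internal) table. If it starts with 'CREATE EXTERNAL TABLE', it is an external table.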
Convert an external table to a managed table: ALTER TABLE xxx SET TBLPROPERTIES('EXTERNAL'='FALSE');
Convert a managed table to an external table: ALTER TABLE xxx SET TBLPROPERTIES('EXTERNAL'='TRUE');
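Another way to check the table type is describe formatted. This assumes a Hive version whose describe formatted output includes a Table Type row. The row reports MANAGED_TABLE or EXTERNAL_TABLE, and xxx is the same placeholder table name used above.

```sql
-- Look for the "Table Type:" row in the output:
--   MANAGED_TABLE  -> managed (internal) table
--   EXTERNAL_TABLE -> external table
DESCRIBE FORMATTED xxx;
```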

Prepare a test file
data.txt

key1,value1
key2,value2

create table t_no_cascade(c1 string) partitioned by (pt string) row format delimited
FIELDS TERMINATED BY ',' stored as textfile;

Add partition pt=1

load data local inpath 'data.txt' overwrite into table t_no_cascade partition(pt=1);

Query the partition. Only the c1 and pt columns are shown.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1    t_no_cascade.pt
key1    1
key2    1

alter table t_no_cascade add columns(c2 string);

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1    t_no_cascade.c2    t_no_cascade.pt
key1    NULL    1
key2    NULL    1

load data local inpath 'data.txt' overwrite into table t_no_cascade partition(pt=2);

select * from t_no_cascade where pt=2;
OK
t_no_cascade.c1    t_no_cascade.c2    t_no_cascade.pt
key1    value1    2
key2    value2    2

insert overwrite table t_no_cascade partition(pt=1) select c1,c2 from t_no_cascade where pt=2;

insert overwrite table xxx partition reuses the existing partition (the same partition id and its metadata), so this partition still does not have the new column.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1    t_no_cascade.c2    t_no_cascade.pt
key1    NULL    1
key2    NULL    1

alter table t_no_cascade drop partition(pt=1);
insert overwrite table t_no_cascade partition(pt=1) select c1,c2 from t_no_cascade where pt=2;

Now partition(pt=1) is a new partition with a new partition id, so the data in the new column is visible.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1    t_no_cascade.c2    t_no_cascade.pt
key1    value1    1
key2    value2    1

create table t_cascade(c1 string) partitioned by (pt string) row format delimited
FIELDS TERMINATED BY ',' stored as textfile;

Add partition pt=1

load data local inpath 'data.txt' overwrite into table t_cascade partition(pt=1);

Query the partition. Only the c1 and pt columns are shown.

select * from t_cascade where pt=1;
OK
t_cascade.c1    t_cascade.pt
key1    1
key2    1

alter table t_cascade add columns(c2 string) cascade;

select * from t_cascade where pt=1;
OK
t_cascade.c1    t_cascade.c2    t_cascade.pt
key1    value1    1
key2    value2    1
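Option 2 of the second case (add columns with cascade, then insert overwrite) can be tried on the same tables. The following is a minimal sketch that reuses partition pt=2 of t_no_cascade from the first experiment as the source. Cascade has already added c2 to the metadata of partition pt=1, so the rewritten partition should expose c2 without a drop partition step.

```sql
-- Partition pt=1 of t_cascade already has c2 in its metadata (added with cascade),
-- so overwriting the partition keeps that schema.
insert overwrite table t_cascade partition(pt=1) select c1,c2 from t_no_cascade where pt=2;

-- c2 should be populated for pt=1, as in the cascade query above.
select * from t_cascade where pt=1;
```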


Reposted from: https://blog.csdn.net/houzhizhen/article/details/143719798
Copyright belongs to the original author, houzhizhen. In case of infringement, please contact us for removal.