Summary: Using the Hive ZSTD Compression Format with Spark 3

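ZSTD (short for Zstandard) is an open-source lossless data compression algorithm whose compression speed and compression ratio are both better than those of the other compression formats currently supported by Hadoop. This feature lets Hive support tables stored in the ZSTD compression format. The storage formats for which Hive supports ZSTD compression include the common ones: ORC, RCFile, TextFile, JsonFile, Parquet, SequenceFile, and CSV.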

Tables in ZSTD compression format are created as follows:

For the ORC storage format, specify TBLPROPERTIES("orc.compress"="zstd") when creating the table:

create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="zstd");

For the Parquet storage format, specify TBLPROPERTIES("parquet.compression"="zstd") when creating the table:

create table tab_2(...) stored as parquet TBLPROPERTIES("parquet.compression"="zstd");

For other formats, or for a generic approach, set the following parameters so that the compression codec is org.apache.hadoop.io.compress.ZStandardCodec, then create the table:

set hive.exec.compress.output=true;
set mapreduce.map.output.compress=true;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec;
set hive.exec.compress.intermediate=true;
create table tab_3(...) stored as textfile;
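The three cases above differ only in where the codec is declared: a table property for ORC and Parquet, and session settings for everything else. The following Python sketch makes that choice explicit by generating the statements for a given storage format. The function name `zstd_table_statements` and the sample table and column names are hypothetical; the property keys, setting names, and codec class are the ones listed above.

```python
# Table property that selects ZSTD, per storage format (keys as listed in the text).
ZSTD_TABLE_PROPERTY = {
    "orc": ("orc.compress", "zstd"),
    "parquet": ("parquet.compression", "zstd"),
}

# Codec class used for formats that have no dedicated table property.
ZSTD_CODEC = "org.apache.hadoop.io.compress.ZStandardCodec"

# Session settings for the generic approach, in the order given in the text.
ZSTD_SESSION_SETTINGS = [
    ("hive.exec.compress.output", "true"),
    ("mapreduce.map.output.compress", "true"),
    ("mapreduce.map.output.compress.codec", ZSTD_CODEC),
    ("mapreduce.output.fileoutputformat.compress", "true"),
    ("mapreduce.output.fileoutputformat.compress.codec", ZSTD_CODEC),
    ("hive.exec.compress.intermediate", "true"),
]


def zstd_table_statements(table, columns, storage_format):
    """Return the HiveQL statements that create a ZSTD-compressed table.

    Hypothetical helper: `table` and `columns` are inserted verbatim,
    so they must come from trusted input.
    """
    fmt = storage_format.lower()
    ddl = f"create table {table} ({columns}) stored as {fmt}"
    if fmt in ZSTD_TABLE_PROPERTY:
        # ORC and Parquet carry the codec in a table property.
        key, value = ZSTD_TABLE_PROPERTY[fmt]
        return [f'{ddl} TBLPROPERTIES("{key}"="{value}");']
    # Other formats rely on session-level output compression settings.
    settings = [f"set {key}={value};" for key, value in ZSTD_SESSION_SETTINGS]
    return settings + [f"{ddl};"]


if __name__ == "__main__":
    for fmt in ("orc", "parquet", "textfile"):
        print("\n".join(zstd_table_statements("demo_tab", "id int", fmt)))
        print()
```

Keeping the settings in one list makes it easier to see that the generic path is session-scoped: the `set` statements affect every table written in that session, whereas the table properties apply only to the table that declares them.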

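Note:

SQL operations on tables in ZSTD compression format are no different from those on other compressed tables. The usual insert, delete, query, and aggregation statements are all supported.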

To have written files compressed with zstd, pass the following option; this is supported only from Spark 3 onward:
--conf spark.sql.parquet.compression.codec=zstd
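As a rough illustration of where this setting can be applied, the sketch below keeps it in one dictionary and uses it both for `spark-submit` arguments and for a PySpark session. The helper names (`zstd_parquet_conf`, `spark_submit_conf_args`, `build_session`) and the application name are hypothetical, and `build_session` assumes PySpark 3 is installed; the configuration key and value are the ones shown above.

```python
def zstd_parquet_conf():
    """Spark setting from the text: write Parquet files with zstd."""
    return {"spark.sql.parquet.compression.codec": "zstd"}


def spark_submit_conf_args(conf):
    """Turn a settings dict into `--conf key=value` argument pairs."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def build_session(app_name="zstd-demo"):
    """Create a Hive-enabled SparkSession with the zstd setting applied.

    PySpark is imported lazily so the helpers above work without it.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in zstd_parquet_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Print the arguments to append to a spark-submit command line.
    print(" ".join(spark_submit_conf_args(zstd_parquet_conf())))
```

Setting the option on the session builder has the same effect as passing it on the command line; which one is more convenient depends on whether the job is launched from a script or from a notebook.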

Tags: data warehouse, big data, hive

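This article is reposted from: https://blog.csdn.net/qq_36039236/article/details/133753270
Copyright belongs to the original author, 雾岛与鲸. If there is any infringement, please contact us for removal.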
