overfit同步小助手

2023-08-12 03:04:35

Hive SQL Optimization: One FROM Query Feeding Multiple INSERT INTO Operations

Hive SQL optimization

One FROM query, multiple INSERT INTO operations

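Example: computing the null rate of columns

Optimization: the source is read once and that single scan feeds every INSERT branch (one map phase, multiple reduces), so the map work is not repeated for each INSERT.
The procedure is:
1. Create the tables;
2. Insert the data;
3. Follow the statements below.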

-- Create the student table
CREATE EXTERNAL TABLE IF NOT EXISTS STUDENT(
    s_no string comment 'student number',
    s_name string comment 'name',
    s_birth string comment 'date of birth',
    s_age bigint comment 'age',
    s_sex string comment 'sex'
);

-- Create the table that holds the null-rate statistics
CREATE EXTERNAL TABLE IF NOT EXISTS STUDENT_COUNT(
    ID STRING COMMENT 'column name',
    COUNT STRING COMMENT 'row count',
    NULL_RATE DOUBLE COMMENT 'null rate'
);
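Step 2 (inserting data) is not shown in the original. A minimal sketch, assuming Hive 0.14 or later (where `INSERT ... VALUES` is available); the rows are made up purely for illustration. Of the four students older than 16, one has a NULL s_name and two have a NULL s_birth.

```sql
-- Hypothetical sample rows, not taken from the original article.
INSERT INTO TABLE student VALUES
    ('001', 'Alice', '2005-01-01', 18, 'F'),
    ('002', NULL,    '2004-03-15', 19, 'M'),  -- missing name
    ('003', 'Carol', NULL,         17, 'F'),  -- missing date of birth
    ('004', 'Dave',  NULL,         20, 'M'),  -- missing date of birth
    ('005', 'Eve',   '2009-07-07', 14, 'F');  -- age <= 16, excluded by the filter used later
```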

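-- Clear the statistics table
-- (TRUNCATE targets managed tables; some Hive versions reject it on an EXTERNAL table)
truncate table student_count;
-- Insert the statistics:
-- null rate of name and date of birth for students older than 16.
-- count(col) skips NULLs, so the null rate is (total - non-null) / total.
from (select * from student where s_age > 16) a
insert into table student_count select 's_name' id, count(1) count, (count(1) - count(s_name)) / count(1) as null_rate
insert into table student_count select 's_birth' id, count(1) count, (count(1) - count(s_birth)) / count(1) as null_rate;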
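For contrast, here is a sketch of what the same job looks like without the multi-insert form: two independent statements, each of which reads and filters student on its own. The alias `cnt` is just an illustrative label, since INSERT matches columns by position. Because count(col) ignores NULLs, the null rate is written as (total - non-null) / total.

```sql
-- Unoptimized form: the student table is scanned once per statement.
insert into table student_count
select 's_name' id, count(1) cnt, (count(1) - count(s_name)) / count(1) as null_rate
from student
where s_age > 16;

insert into table student_count
select 's_birth' id, count(1) cnt, (count(1) - count(s_birth)) / count(1) as null_rate
from student
where s_age > 16;
```

With more columns to profile, the number of scans in this form grows with the number of statements, while the FROM-first form keeps a single scan.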

SQL optimization: using grouping sets instead of union

-- Without grouping sets, multi-dimensional statistics have to be assembled with the union keyword
-- Before the rewrite
select * from(
select s_age,s_sex,count(1) num
from student_tb_orc
group by s_age,s_sex
union all
select s_age,null s_sex,count(1) num
from student_tb_orc
group by s_age
) a;
-- After the rewrite
select s_age,s_sex,count(1) num
from student_tb_orc
group by s_age,s_sex
grouping sets((s_age),(s_age,s_sex));
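One caveat with grouping sets: rows produced by the (s_age) set carry NULL in s_sex, which looks the same as a row whose s_sex is genuinely NULL in the data. A minimal sketch, assuming Hive's `GROUPING__ID` virtual column is available; its numeric encoding differs between Hive versions, so no concrete values are listed here.

```sql
-- GROUPING__ID identifies which grouping set produced each row,
-- so per-age subtotals can be told apart from real NULL s_sex values.
select s_age, s_sex, GROUPING__ID as grp_id, count(1) num
from student_tb_orc
group by s_age, s_sex
grouping sets((s_age),(s_age,s_sex));
```

Rows sharing the same grp_id come from the same grouping set, so it is worth checking on a small sample which value corresponds to the per-age subtotal before filtering on it.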

Tags: hive, sql, big data

Reposted from: https://blog.csdn.net/Avarice912/article/details/130327139
Copyright belongs to the original author, Avarice912. If there is any infringement, please contact us for removal.
