0


hive统计每五分钟交易量SQL

最近单位组织大数据考试,有一道SQL题,题目如下

已知:交易表(trade)

交易表结构如下:
trade_notrade_time100012022/8/29 09:30:37100022022/8/29 09:31:02100032022/8/29 09:32:51……
按照以下格式统计9:30以后每五分钟发生的交易量及交易占比:
时间段交易量占比[09:30:00 09:35:00)302.00%[09:35:00 09:40:00)151.00%[09:40:00 09:45:00)50.33%………
当时直接蒙圈了

现在经过多方查证总结了以下两种实现方案

1,通过函数 floor()、ceil()、date_format()函数分别获取交易时间字段对应的归属五分钟时间段

具体实现sql如下:

  1. select
  2. concat('[',a.minute_pgs,' ',a.minute_pge,')') minute_pg,
  3. count(1) cnt,
  4. concat(cast(cast(count(1)/18*100 as decimal(10,2)) as string),'%') as rate
  5. from (
  6. select
  7. t1.trade_no,
  8. t1.trade_time,
  9. concat_ws(':',cast(date_format(t1.trade_time,'HH') as string)
  10. ,case when cast((floor(date_format(t1.trade_time,'mm')/5 )*5) as string) = '0' then '00'
  11. when cast((floor(date_format(t1.trade_time,'mm')/5 )*5) as string) = '5' then '05'
  12. else cast((floor(date_format(t1.trade_time,'mm')/5 )*5) as string) end,'00') as minute_pgs,
  13. concat_ws(':',cast(date_format(t1.trade_time,'HH') as string)
  14. ,case when cast((CEIL(date_format(t1.trade_time,'mm')/5 )*5) as string) = '0' then '00'
  15. when cast((CEIL(date_format(t1.trade_time,'mm')/5 )*5) as string) = '5' then '05'
  16. else cast((CEIL(date_format(t1.trade_time,'mm')/5 )*5) as string) end,'00') as minute_pge
  17. from odss.trade t1
  18. where t1.trade_time >= '2022-08-25 09:30:00'
  19. ) a
  20. group by a.minute_pgs,a.minute_pge

方法二 通过lateral view函数配合explode及split 实现时间列表初始化

具体实现代码如下:

  1. with time_list as (
  2. select
  3. idx,
  4. from_unixtime(unix_timestamp(t.minNum)+300*idx) start_time,
  5. from_unixtime(unix_timestamp(t.minNum)+300*(idx+1)) end_time,
  6. SUBSTRING(from_unixtime(unix_timestamp(t.minNum)+300*idx),12,19) start_tm_str,
  7. SUBSTRING(from_unixtime(unix_timestamp(t.minNum)+300*(idx+1)),12,19) end_tm_str
  8. from(select DATE_FORMAT('2022-08-25 09:30:00','yyyy-MM-dd HH:mm:ss') as minNum,
  9. split(space(173-0),'') as x)t --09-30 到凌晨还有 1745分钟 idx起点=0
  10. lateral view posexplode(x) pe as idx,se --类似Oracle 递归操作
  11. ), --初始化时间列表
  12. tot_cnt as(
  13. select count(1) tcnt
  14. from odss.trade t1
  15. where t1.trade_time >= '2022-08-25 09:30:00'
  16. and t1.trade_time <= '2022-08-25 23:59:59'
  17. )--计算09:30:00后交易总量
  18. select CONCAT('[',l.start_tm_str,' ',l.end_tm_str,')') time_pg, --拼接时间段
  19. count(1) cnt,
  20. CONCAT(cast(cast(count(1)/c.tcnt * 100 as decimal(10,2)) as string),'%') rate --计算利率
  21. from time_list l inner join odss.trade t
  22. inner join tot_cnt c
  23. where t.trade_time >= l.start_time --hive on 不支持不等式操作 这里使用where条件
  24. and t.trade_time < l.end_time
  25. group by l.start_tm_str,l.end_tm_str,c.tcnt;
标签: 大数据 hive sql

本文转载自: https://blog.csdn.net/yulanlian/article/details/126586647
版权归原作者 陆小凤之凤舞九天 所有, 如有侵权,请联系我们删除。

“hive统计每五分钟交易量SQL”的评论:

还没有评论