一、累加
hive sql 实现同一列数据的累加,相信大家都会,这里就不过多解释了,贴一个例子结束:
with base_data as (
select 1 as fee, '20220101' as dt
union all select 2 as fee, '20220101' as dt
union all select 3 as fee, '20220102' as dt
union all select 4 as fee, '20220102' as dt
union all select 5 as fee, '20220103' as dt
union all select 6 as fee, '20220103' as dt
union all select 7 as fee, '20220104' as dt
union all select 8 as fee, '20220105' as dt
union all select 9 as fee, '20220105' as dt
)
select sum(fee) as fee from base_data
二、累乘
仔细一想,hive 好像没有直接对同一列求累乘的函数,这里需要用到高中的数学知识——对数,一起来回忆一下:
1. 对数定义
2.对数的性质
以上是对数的一些运算性质,其中我重点圈出了两个性质,这将是我们使用 hive sql 实现同一列数据累乘的关键;
1)左边的红框中,两个底数(a)相同的对数相加 = 以a为底(NM)的对数,其中(NM)就是我们想要的计算结果,应该如何获取(N*M)呢?
2)看右边红框的性质,我们可以利用这个性质获取(N*M)
映射到 hive ,可以将同一列的相乘转为同一列的对数相加,在求真数即可;
具体做法:
1)先将该列每一个值转为以10为底的对数(底数可随意),再对该列求sum,最后就得到以10为底【该列所有值相加】的结果为真数的对数。记为结果A(利用hive的log()函数)
2)再对A取真数(利用hive的power()函数)
用到的 log() 和 power() 两个函数不了解的可以自行百度一下
写个例子实战一下:
with base_data as (
select 1 as fee, '20220101' as dt
union all select 2 as fee, '20220101' as dt
union all select 3 as fee, '20220102' as dt
union all select 4 as fee, '20220102' as dt
union all select 5 as fee, '20220103' as dt
union all select 6 as fee, '20220103' as dt
union all select 7 as fee, '20220104' as dt
union all select 8 as fee, '20220105' as dt
union all select 9 as fee, '20220105' as dt
)
select sum(fee) as `单列累加`,
power(10, sum(log(10, fee))) as `单列累乘`
from base_data
结果:
可以发现,理论上来说 123*...*9 = 362880,但是为什么结果会是 362879.9999999994呢?
这是因为在累乘过程中,由于进行了log转换,存在较小精度损失;在真正使用时一般会用**round()**进行四舍五入处理;
再看:
with base_data as (
select 1 as fee, '20220101' as dt
union all select 2 as fee, '20220101' as dt
union all select 3 as fee, '20220102' as dt
union all select 4 as fee, '20220102' as dt
union all select 5 as fee, '20220103' as dt
union all select 6 as fee, '20220103' as dt
union all select 7 as fee, '20220104' as dt
union all select 8 as fee, '20220105' as dt
union all select 9 as fee, '20220105' as dt
)
select sum(fee) as `单列累加`,
power(10, sum(log(10, fee))) as `单列累乘`,
round(power(10, sum(log(10, fee)))) as `单列累乘-精度处理`
from base_data
结果:
至此,DONE
版权归原作者 TRX1024 所有, 如有侵权,请联系我们删除。