0


HIVE/SQL 实现同一列数据累加和累乘

一、累加

hive sql 实现同一列数据的累加,相信大家都会,这里就不过多解释了,贴一个例子结束:

with base_data as (
              select 1 as fee, '20220101' as dt 
    union all select 2 as fee, '20220101' as dt 
    union all select 3 as fee, '20220102' as dt 
    union all select 4 as fee, '20220102' as dt 
    union all select 5 as fee, '20220103' as dt 
    union all select 6 as fee, '20220103' as dt 
    union all select 7 as fee, '20220104' as dt 
    union all select 8 as fee, '20220105' as dt 
    union all select 9 as fee, '20220105' as dt 
)

select sum(fee) as fee from base_data

二、累乘

仔细一想,hive 好像没有直接对同一列求累乘的函数,这里需要用到高中的数学知识——对数,一起来回忆一下:

1. 对数定义

2.对数的性质

以上是对数的一些运算性质,其中我重点圈出了两个性质,这将是我们使用 hive sql 实现同一列数据累乘的关键;

1)左边的红框中,两个底数(a)相同的对数相加 = 以a为底(NM)的对数,其中(NM)就是我们想要的计算结果,应该如何获取(N*M)呢?

2)看右边红框的性质,我们可以利用这个性质获取(N*M)

映射到 hive ,可以将同一列的相乘转为同一列的对数相加,在求真数即可;

具体做法:

1)先将该列每一个值转为以10为底的对数(底数可随意),再对该列求sum,最后就得到以10为底【该列所有值相加】的结果为真数的对数。记为结果A(利用hive的log()函数)

2)再对A取真数(利用hive的power()函数)

用到的 log() 和 power() 两个函数不了解的可以自行百度一下

写个例子实战一下:

with base_data as (
              select 1 as fee, '20220101' as dt 
    union all select 2 as fee, '20220101' as dt 
    union all select 3 as fee, '20220102' as dt 
    union all select 4 as fee, '20220102' as dt 
    union all select 5 as fee, '20220103' as dt 
    union all select 6 as fee, '20220103' as dt 
    union all select 7 as fee, '20220104' as dt 
    union all select 8 as fee, '20220105' as dt 
    union all select 9 as fee, '20220105' as dt 
)

select     sum(fee)                         as `单列累加`,
        power(10, sum(log(10, fee)))     as `单列累乘`
from     base_data

结果:

可以发现,理论上来说 123*...*9 = 362880,但是为什么结果会是 362879.9999999994呢?

这是因为在累乘过程中,由于进行了log转换,存在较小精度损失;在真正使用时一般会用**round()**进行四舍五入处理;

再看:

with base_data as (
              select 1 as fee, '20220101' as dt 
    union all select 2 as fee, '20220101' as dt 
    union all select 3 as fee, '20220102' as dt 
    union all select 4 as fee, '20220102' as dt 
    union all select 5 as fee, '20220103' as dt 
    union all select 6 as fee, '20220103' as dt 
    union all select 7 as fee, '20220104' as dt 
    union all select 8 as fee, '20220105' as dt 
    union all select 9 as fee, '20220105' as dt 
)

select     sum(fee)                             as `单列累加`,
        power(10, sum(log(10, fee)))         as `单列累乘`,
        round(power(10, sum(log(10, fee))))    as `单列累乘-精度处理`
from     base_data

结果:

至此,DONE

标签: hive sql 大数据

本文转载自: https://blog.csdn.net/weixin_43161811/article/details/126859985
版权归原作者 TRX1024 所有, 如有侵权,请联系我们删除。

“HIVE/SQL 实现同一列数据累加和累乘”的评论:

还没有评论