0


Hive中的explode函数、posexplode函数与later view函数

1.概述

  在离线数仓处理通过HQL业务数据时,经常会遇到行转列或者列转行之类的操作,就像concat_ws之类的函数被广泛使用,今天这个也是经常要使用的拓展方法。

2.explode函数

2.1 函数语法

-- explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode
Functiontype:BUILTIN
-- explode()用于array的语法如下select explode(arraycol)as newcol from tablename;-- explode()用于map的语法如下:select explode(mapcol)as(keyname,valuename)from tablename;

2.2 函数说明

  • explode 函数是UDTF函数,将hive一列中复杂的array或者map结构拆分成多行。
  • Explode函数是不允许在select再有其他字段, - explode(ARRAY) 列表中的每个元素生成一行。- explode(MAP) map中每个key-value对,生成一行,key为一列,value为一列。

2.3 使用案例

-- explode (array)select explode(array('A','B','C'));select explode(array('A','B','C'))as col;select tf.*from(select0) t lateral view explode(array('A','B','C')) tf;select tf.*from(select0) t lateral view explode(array('A','B','C')) tf as col;-- 结果
col
A
B
C
-- explode (map)select explode(map('A',10,'B',20,'C',30));select explode(map('A',10,'B',20,'C',30))as(key,value);select tf.*from(select0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;select tf.*from(select0) t lateral view explode(map('A',10,'B',20,'C',30)) tf askey,value;-- 结果keyvalue
A    10
B    20
C    30

3.posexplode函数

2.1 函数语法

-- posexplode(a) - behaves like explode for arrays, but includes the position of items in the original arrayFunction class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFPosExplode
Functiontype:BUILTIN
select posexplode(ARRAY<T> a)-- Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

2.2 函数说明

  • posexplode 函数,将ARRAY数组a展开,每个Value一行,每行两列分别对应数组从0开始的下标和数组元素。

2.3 使用案例

-- posexplode (array)select posexplode(array('A','B','C'));select posexplode(array('A','B','C'))as(pos,val);select tf.*from(select0) t lateral view posexplode(array('A','B','C')) tf;select tf.*from(select0) t lateral view posexplode(array('A','B','C')) tf as pos,val;-- 结果
pos val
0    A
1    B
2    C

4.later view

4.1 语法

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*-- columnAlias是给udtf(expression)列起的别名。-- tableAlias 虚拟表的别名。

4.2 用法描述

  • lateral view为侧视图,意义是为了配合UDTF来使用,把某一行数据拆分成多行数据。
  • 不加lateral view的UDTF只能提取单个字段拆分,并不能塞回原来数据表中。
  • 加上lateral view就可以将拆分的单个字段数据与原始表数据关联上。
  • lateral view函数会将UDTF生成的结果放到一个虚拟表中,然后虚拟表中的数据和输入行进行join来达到连接UDTF外的select字段的目的。(本质是笛卡尔积)

4.3 使用案例

4.3.1 准备数据

下表 pageAds. 它有两个字段:

pageid

(页码) and

adid_list

(页面上的adid):
Column nameColumn typepageidSTRINGadid_listArray
表中数据如下:
pageidadid_listfront_page[1, 2, 3]contact_page[3, 4, 5]

需求: 统计各个页面出现的广告的次数

4.3.2 代码实现

第一步: 使用 lateral view 和 explore() 函数将 adid_list 列的 list 拆分,sql代码如下:

select pageid, adid
FROM pageAds lateral view explode(adid_list) ad_view as adid;

可的如下结果
pageidadidfront_page1front_page2front_page3contact_page4contact_page5
第二步: 使用 count/group by 语句统计出每个adid出现的次数:

select adid,count(1)as cnt
FROM pageAds lateral view explode(adid_list) ad_view as adid
groupby adid;

adidcnt1121324151

4.4 Multiple Lateral Views

FROM子句可以有多个LATERAL VIEW子句。 后面的LATERAL VIEWS子句可以引用出现在LATERAL VIEWS左侧表的任何列。

例如,如下查询:

SELECT*FROM exampleTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;

例如使用以下基表:
Array pageid_listArray adid_list[1, 2, 3][“a”, “b”, “c”][3, 4][“c”, “d”]
单个Lateral View查询:

SELECT pageid_list, adid
FROM pageAds_1
         LATERAL VIEW explode(adid_list) adTable AS adid;[1,2,3]    a
[1,2,3]    b
[1,2,3]    c
[4,5]    c
[4,5]    d

多个Lateral View查询:

select pageid,adid FROM pageAds_1
lateral view explode(pageid_list) adTable as pageid
lateral view explode(adid_list) adTable as adid;1,a
1,b
1,c
2,a
2,b
2,c
3,a
3,b
3,c
3,c
3,d
4,c
4,d

4.5 later view json_tuple()

4.5.1 准备数据
createtable lateral_tal_3
(
    id   int,
    col1 string,
    col2 string
);insertinto lateral_tal_3 values(1234,'{"part1" : "61", "total" : "623", "part2" : "560", "part3" : "1", "part4" : "1"}','    {"to_part2" : "0", "to_part4" : "0", "to_up" : "0", "to_part3" : "0", "to_part34" : "0"}'),(4567,'{"part1" : "451", "total" : "89928", "part2" : "88653", "part3" : "789", "part4" : "35"}','{"to_part2" : "54", "to_part4" : "6", "to_up" : "65", "to_part3" : "2", "to_part34" : "3"}'),(7890,'{"part1" : "142", "total" : "351808", "part2" : "346778", "part3" : "4321", "part4" : "567"}','{"to_part2" : "76", "to_part4" : "23", "to_up" : "65", "to_part3" : "14", "to_part34" : "53"}');

idcol1col21234{“part1” : “61”, “total” : “623”, “part2” : “560”, “part3” : “1”, “part4” : “1”}{“to_part2” : “0”, “to_part4” : “0”, “to_up” : “0”, “to_part3” : “0”, “to_part34” : “0”}4567{“part1” : “451”, “total” : “89928”, “part2” : “88653”, “part3” : “789”, “part4” : “35”}{“to_part2” : “54”, “to_part4” : “6”, “to_up” : “65”, “to_part3” : “2”, “to_part34” : “3”}7890{“part1” : “142”, “total” : “351808”, “part2” : “346778”, “part3” : “4321”, “part4” : “567”}{“to_part2” : “76”, “to_part4” : “23”, “to_up” : “65”, “to_part3” : “14”, “to_part34” : “53”}

需求: 解析非结构化的json数据类型

“json_tuple(jsonStr, p1, p2, …, pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string.”
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple
Function type:BUILTIN

json_tuple : 第一个参数是json 字符串所在的列名,其它参数是获取 json 字符串中的哪些key值;

4.5.2 代码实现
SELECT id,
       part1,
       part3,
       part4,
       to_part2,
       to_part3,
       to_part4,IF(part3 =0,0.0, to_part3 / part3)as ratio3,IF(part4 =0,0.0, to_part4 / part4)as ratio4
FROM lateral_tal_3
         lateral VIEW json_tuple(col1,'part3','part4','part1') json1 AS part3, part4, part1
         lateral VIEW json_tuple(col2,'to_part2','to_part3','to_part4') json2 AS to_part2, to_part3, to_part4
;1234,61,1,1,0,0,0,0,04567,451,789,35,54,2,6,0.0025348542458808617,0.171428571428571437890,142,4321,567,76,14,23,0.0032399907428835918,0.04056437389770723

5.使用案例

需求1: 如何产生1-100的连续的数字?

--方式1: 结合space函数与split函数,posexplode函数,lateral view函数获得select id_start + pos as id
from(select1as id_start,100as id_end
     ) m lateral view posexplode(split(space(id_end - id_start),'')) t as pos, val;-- 方式2:结合space函数与split函数,explode函数,lateral view函数+窗口函数获得select row_number()over()as id
from(select split(space(99),'')as x) t
         lateral view
             explode(x) ex;-- 方式2:结合space函数与split函数,posexplode函数,lateral view函数获取from(select split(space(99),' ')as x) t
         lateral view
             posexplode(x) ex as pos,val;

需求2: 获取2024-07-15至2024-07-29间所有的日期

SELECT pos,
       date_add(start_date, pos) dd
FROM(SELECT'2024-07-15'AS start_date,'2024-07-29'AS end_date)temp
         lateral VIEW
             posexplode(split(space(datediff(end_date, start_date)),'')) t
         AS pos, val;

本文转载自: https://blog.csdn.net/weixin_44852067/article/details/136751954
版权归原作者 独影月下酌酒 所有, 如有侵权,请联系我们删除。

“Hive中的explode函数、posexplode函数与later view函数”的评论:

还没有评论