0


【hive】行转列—explode()/posexplode()/lateral view 函数使用场景

文章目录


一、lateral view函数

  • 功能:- 用于和UDTF函数(explode,split)结合使用,把某一行数据拆分成多行数据,再将多行结果组合成一个支持别名的虚拟表。- 主要解决select使用UDTF做查询的过程中查询只能包含单个UDTF,不能包含其它字段以及多个UDTF的情况。
  • 语法:LATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,’ columnAlias)- columnAlias是给udtf(expression)列起的别名。- tableAlias 虚拟表的别名。
  • lateral view函数会将lUDTF生成的结果放到一个虚拟表中,然后这个虚拟表和输入行进行join来达到连接UDTF外的select字段的目的。

使用方式:

举例1:与explode() 函数连用

--测试表的数据结构,可以看到第一列是id(string类型),第二列是rule_array(array<string>类型)> desc test_2;

--结果:
+-------------+----------------+----------+
|  col_name   |   data_type    | comment  |
+-------------+----------------+----------+
|id| string         ||| rule_array  | array<string>||
+-------------+----------------+----------+

--查看测试表的内容
>select * from test_2;

--结果:
+------------+----------------------------------------+
| test_2.id  |           test_2.rule_array            |
+------------+----------------------------------------+
| a          |["501","502","503","501","512"]|| b          |["511","512","513","511","512","511"]|| c          |["512","513","511","512"]|
+------------+----------------------------------------+

--拆分rule_array
>select id, rule_explode 
> from test_2
> lateral view explode(rule_array) adTable AS rule_explode;

--结果:
+-----+---------------+
|id| rule_explode  |
+-----+---------------+
| a   |501|| a   |502|| a   |503|| a   |501|| a   |512|| b   |511|| b   |512|| b   |513|| b   |511|| b   |512|| b   |511|| c   |512|| c   |513|| c   |511|| c   |512|
+-----+---------------+

举例2:与parse_url_tuple()函数连用

  • parse_url_tuple()函数 - 语法:parse_url(string urlString, string partToExtract [, string keyToExtract])- 功能:UDTF函数,解析URL字符串,它通过一次指定多个参数,从URL中解析出多个参数值返回多列。- 参数:partToExtract的有效值为:HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO。- 注意:在使用 select 时不能同时返回其他字段,不能嵌套调用,不能与group by放在一起调用等。为解决此问题,可以通过测试图(Lateral View)搭配调用。
--测试表的数据结构,可以看到第一列是id(int类型),第二列是url(string类型)> desc tb_url;

--结果:
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
|id| int        ||| url       | string     ||
+-----------+------------+----------+

--查看测试表的内容
>select * from tb_url;

--结果:
+------------+-------------------------------------------------+
| tb_url.id  |                   tb_url.url                    |
+------------+-------------------------------------------------+
|1| http://facebook.com/path/p1.php?query=1||2| http://www.baidu.com/news/index.jsp?uuid=frank  ||3| http://www.jd.com/index?source=baidu            |
+------------+-------------------------------------------------+

--提取tb_url的query部分
>select> a.id as id,
> c.query as query
> from tb_url a
> lateral view parse_url_tuple(url,"QUERY") c as query
>;

--结果:
+-----+---------------+
|id|     query     |
+-----+---------------+
|1|query=1||2|uuid=frank    ||3|source=baidu  |
+-----+---------------+

举例3:多个lateral view连用

--提取tb_url的HOST、PATH和QUERY部分
>select> a.id as id,
> b.host as host,
> b.path as path,
> c.query as query
> from tb_url a
> lateral view parse_url_tuple(url,"HOST","PATH") b as host,path
> lateral view parse_url_tuple(url,"QUERY") c as query
>;

--结果:
+-----+----------------+------------------+---------------+
|id|host|       path       |     query     |
+-----+----------------+------------------+---------------+
|1| facebook.com   | /path/p1.php     |query=1||2| www.baidu.com  | /news/index.jsp  |uuid=frank    ||3| www.jd.com     | /index           |source=baidu  |
+-----+----------------+------------------+---------------+

注意:

  • lateral view函数的位置在from后where条件前。
  • 生成虚拟表的别名不可省略。
  • from后可带多个 lateral view函数。
  • 如果要拆分的字段有null值,需要使用 lateral view outer替代,避免数据缺失。

二、explode()函数

  • 功能:UDTF函数,可以将一个array或者map展开
  • 语法:explode(map<string> ,array <string>) - explode(array):将array列表里的每个元素生成一行- explode(map):每一对元素作为一行,key为一列,value为一列

使用方式:

举例1:直接使用

--explode(array)>select explode(array(11,22,33)) as item;

--结果:
+-------+
| item  |
+-------+
|11||22||33|
+-------+

--explode(map)>select explode(map("id",11,"name","lily","age",18)) as (k,v);

--结果:
+-------+-------+
|   k   |v|
+-------+-------+
|id|11|| name  | lily  || age   |18|
+-------+-------+

举例2:与lateral view一起使用

  • 语法:lateral view explode(字段) 表别名 as 列别名
  • 如果字段类型是map:lateral view explode(字段) 表别名 as (key别名,value别名)
>select * from test_2;

--结果:
+------------+----------------------------------------+
| test_2.id  |           test_2.rule_array            |
+------------+----------------------------------------+
| a          |["501","502","503","501","512"]|| b          |["511","512","513","511","512","511"]|| c          |["512","513","511","512"]|
+------------+----------------------------------------+

>select id,rule
> from test_2 
> lateral view explode(rule_array) tbl as rule
>;

--结果:
+-----+-------+
|id| rule  |
+-----+-------+
| a   |501|| a   |502|| a   |503|| a   |501|| a   |512|| b   |511|| b   |512|| b   |513|| b   |511|| b   |512|| b   |511|| c   |512|| c   |513|| c   |511|| c   |512|
+-----+-------+

举例3:使用局限性

  • 不能关联原有的表中其它字段。例如:
>select id, explode(rule_array) as rule from test_2;

--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
  • 不能与group by、cluster by 、distribute by、sort by联用。例如:
>select explode(rule_array) as rule from test_2 group by explode(rule_array);

--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
  • 不能进行UDTF嵌套。例如:
>select explode(explode(rule_array)) from test_2 ;

--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
  • 不允许选择其他表达式。例如:
>select explode("1,2,3,4,5") from test_2 ; 

--报错:
Error: Error while compiling statement: FAILED: UDFArgumentException explode() takes an array or a map as a parameter (state=42000,code=40000)

三、posexplode()函数

  • 功能:UDTF函数,将一个array或者map展开,可以将index(索引)和数据都取出来,使用两次posexplode()并用where语句使两次取到的index相等,从而实现对多列进行多行转换。explode()函数只能将对一列进行行转换。
  • 语法:posexplode(map<string> ,array <string>)

使用方式:

举例1:

> desc test_2;

--结果:
+-------------+----------------+----------+
|  col_name   |   data_type    | comment  |
+-------------+----------------+----------+
|id| string         ||| dt_array    | array<string>||| rule_array  | array<string>||
+-------------+----------------+----------+

>select * from test_2;

--结果:
+------------+----------------------------------------------------+----------------------------------------+
| test_2.id  |                  test_2.dt_array                   |           test_2.rule_array            |
+------------+----------------------------------------------------+----------------------------------------+
| a          |["20230809","20230811","20230812","20230812","20230813"]|["501","502","503","501","512"]|| b          |["20230809","20230811","20230812","20230812","20230813","20230809"]|["511","512","513","511","512","511"]|| c          |["20230811","20230812","20230812","20230813"]|["512","513","511","512"]|
+------------+----------------------------------------------------+----------------------------------------+

--注:语句中b.idxb和c.idxc分别是子视图的两个索引,我们where子句中使用b.idxb=c.idxc保证顺序的一致性。
>select a.id,b.cur_day,c.rule 
> from test_2 a
> lateral view posexplode(dt_array) b as idxb,cur_day,
> lateral view posexplode(rule_array) c as idxc,rule where b.idxb=c.idxc
>;

--结果:
+-------+------------+---------+
| a.id  | b.cur_day  | c.rule  |
+-------+------------+---------+
| a     |20230809|501|| a     |20230811|502|| a     |20230812|503|| a     |20230812|501|| a     |20230813|512|| b     |20230809|511|| b     |20230811|512|| b     |20230812|513|| b     |20230812|511|| b     |20230813|512|| b     |20230809|511|| c     |20230811|512|| c     |20230812|513|| c     |20230812|511|| c     |20230813|512|
+-------+------------+---------+

四、行转列使用

单列转多行

举例1:使用explode()+lateral view

>select t.id,a.cur_day
> from test_2 t
> lateral view explode(t.dt_array) a as cur_day
>;

--结果:
+-------+------------+
| t.id  | a.cur_day  |
+-------+------------+
| b     |20230809|| b     |20230811|| b     |20230812|| b     |20230812|| b     |20230813|| b     |20230809|| c     |20230811|| c     |20230812|| c     |20230812|| c     |20230813|| a     |20230809|| a     |20230811|| a     |20230812|| a     |20230812|| a     |20230813|
+-------+------------+

举例2:explode字段是string格式,先split()指定分隔符,如果省略split()则使用默认分隔符是英文逗号。

> desc test_3;

--结果:
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
|id| string     ||| dt        | string     ||| rule1     | string     ||
+-----------+------------+----------+

>select * from test_3;

--结果:
+------------+----------------------------------------------------+--------------------------+
| test_3.id  |                     test_3.dt                      |       test_3.rule1       |
+------------+----------------------------------------------------+--------------------------+
| a          |20230809,20230811,20230812,20230812,20230813       |501,502,503,501,512      || b          |20230809,20230811,20230812,20230812,20230813,20230809 |511,512,513,511,512,511  || c          |20230811,20230812,20230812,20230813                |512,513,511,512          |
+------------+----------------------------------------------------+--------------------------+

>select t.id,a.cur_day
> from test_3 t
> lateral view explode(split(t.dt,',')) a as cur_day
>;

--结果:
+-------+------------+
| t.id  | a.cur_day  |
+-------+------------+
| a     |20230809|| a     |20230811|| a     |20230812|| a     |20230812|| a     |20230813|| b     |20230809|| b     |20230811|| b     |20230812|| b     |20230812|| b     |20230813|| b     |20230809|| c     |20230811|| c     |20230812|| c     |20230812|| c     |20230813|
+-------+------------+

多列转多行

举例1:使用posexplode()+lateral view

先测试使用explode(), 看看效果:

>select  a.id,b.cur_day,c.rule 
> from test_2 a
> lateral view explode(dt_array) b as cur_day,
> lateral view explode(rule_array) c as rule
>;

--结果:
+-------+------------+---------+
| a.id  | b.cur_day  | c.rule  |
+-------+------------+---------+
| b     |20230809|511|| b     |20230809|512|| b     |20230809|513|| b     |20230809|511|| b     |20230809|512|| b     |20230809|511|| b     |20230811|511|| b     |20230811|512|| b     |20230811|513|| b     |20230811|511|| b     |20230811|512|| b     |20230811|511|| b     |20230812|511|| b     |20230812|512|| b     |20230812|513|| b     |20230812|511|| b     |20230812|512|| b     |20230812|511|| b     |20230812|511|| b     |20230812|512|| b     |20230812|513|| b     |20230812|511|| b     |20230812|512|| b     |20230812|511|| b     |20230813|511|| b     |20230813|512|| b     |20230813|513|| b     |20230813|511|| b     |20230813|512|| b     |20230813|511|| b     |20230809|511|| b     |20230809|512|| b     |20230809|513|| b     |20230809|511|| b     |20230809|512|| b     |20230809|511|| c     |20230811|512|| c     |20230811|513|| c     |20230811|511|| c     |20230811|512|| c     |20230812|512|| c     |20230812|513|| c     |20230812|511|| c     |20230812|512|| c     |20230812|512|| c     |20230812|513|| c     |20230812|511|| c     |20230812|512|| c     |20230813|512|| c     |20230813|513|| c     |20230813|511|| c     |20230813|512|| a     |20230809|501|| a     |20230809|502|| a     |20230809|503|| a     |20230809|501|| a     |20230809|512|| a     |20230811|501|| a     |20230811|502|| a     |20230811|503|| a     |20230811|501|| a     |20230811|512|| a     |20230812|501|| a     |20230812|502|| a     |20230812|503|| a     |20230812|501|| a     |20230812|512|| a     |20230812|501|| a     |20230812|502|| a     |20230812|503|| a     |20230812|501|| a     |20230812|512|| a     |20230813|501|| a     |20230813|502|| a     |20230813|503|| a     |20230813|501|| a     |20230813|512|
+-------+------------+---------+

出现这种情况,是因为两个并列的explode()的hql没办法识别cur_day对应的rule是什么,对于多个数组的行转列可以使用posexplode()函数。
例如使用如下查询语句:

>select a.id,b.cur_day,c.rule 
> from test_2 a
> lateral view posexplode(dt_array) b as idxb,cur_day,
> lateral view posexplode(rule_array) c as idxc,rule where b.idxb=c.idxc
>;

--结果:
+-------+------------+---------+
| a.id  | b.cur_day  | c.rule  |
+-------+------------+---------+
| b     |20230809|511|| b     |20230811|512|| b     |20230812|513|| b     |20230812|511|| b     |20230813|512|| b     |20230809|511|| c     |20230811|512|| c     |20230812|513|| c     |20230812|511|| c     |20230813|512|| a     |20230809|501|| a     |20230811|502|| a     |20230812|503|| a     |20230812|501|| a     |20230813|512|
+-------+------------+---------+

本文转载自: https://blog.csdn.net/sodaloveer/article/details/132837340
版权归原作者 sodaloveer 所有, 如有侵权,请联系我们删除。

“【hive】行转列—explode()/posexplode()/lateral view 函数使用场景”的评论:

还没有评论