7. Hive解析JSON字符串、JSON数组

文章目录

Hive解析JSON字符串

1. get_json_object

语法：get_json_object(json_string, path)- json_string 是要解析的JSON字符串- path 是用于指定要提取的字段路径的字符串

-- 示例1(单层JSON)SELECT get_json_object('{
                            "name": "John",
                            "age": 30
                         }','$.name');-- res: "John"-- 示例2(嵌套JSON)SELECT get_json_object('{
                            "person": {
                                "name": "John",
                                "age": 30,
                                "address": {
                                    "street": "123 Main St",
                                    "city": "New York"
                                }
                            }
                         }','$.person.address.street');-- res: "123 Main St"

局限性

get_json_object 函数的性能会受到 JSON数据的结构和大小 的影响。对于较复杂的嵌套结构，考虑使用Hive的其他函数或自定义函数来处理JSON数据可能更合适。
get_json_object 函数每次只能返回一个数据项。

2. json_tuple

语法：json_tuple(json_string, field1, field2, ...)- json_string 是要解析的JSON字符串- field1、field2 ... 是要提取的字段名

-- 示例1(单层JSON)select json_tuple('{
                        "name": "zhangsan",
                        "age": 18
                    }','name','age');-- res: zhangsan    18-- 示例2(嵌套JSON)SELECT json_tuple('{
                        "person": {
                            "name": "Alice",
                            "age": 25,
                            "address": {
                                "city": "New York",
                                "country": "USA"
                            }
                        }
                    }','person.name','person.age','person.address.city');-- res: Alice    25    New York

Hive解析JSON数组

前置知识

explode函数

语法：explode(Array OR Map)
说明：explode() 函数接收一个 array 或者 map 类型的数据作为输入，然后将 array 或 map 里面的元素按照每行的形式输出，即将 hive 一列中复杂的 array 或者 map 结构拆分成多行显示

-- 解析 array
hive>select explode(array('A','B','C'));
OK
A
B
C

-- 解析map
hive>select explode(map('A',10,'B',20,'C',30));
OK
A       10
B       20
C       30

regexp_replace函数

语法： regexp_replace(string A, string B, string C)
说明：将字符串A中符合 java正则表达式B 的部分替换为C。注意，在有些情况下要使用转义字符，类似oracle中的regexp_replace函数。

hive>select regexp_replace('foobar','oo|ar',''); 
OK
fb

1. 嵌套子查询解析JSON数组（使用explode+regexp_replace）

数据如下所示：
array(json_str)[{“website”:“baidu.com”,“name”:“百度”},{“website”:“google.com”,“name”:“谷歌”}]
解析出其中的website、name
websitenamebaidu.com百度google.com谷歌

-- 思路-- 1. 使用 regexp_replace 函数将原数据转换为 {"website":"baidu.com","name":"百度"};{"website":"google.com","name":"谷歌"}-- 2. 使用 split 函数按照 ';' 分割 {"website":"baidu.com","name":"百度"};{"website":"google.com","name":"谷歌"},返回 [{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]-- 3. 使用 explode 炸裂为-- {"website":"baidu.com","name":"百度"}-- {"website":"google.com","name":"谷歌"}-- 4. 使用 json_tuple 解析数据-- 实现-- 1. 先将json数组中的元素解析出来，转化为每行显示SELECT explode(split(regexp_replace(regexp_replace('[{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\;\\{'),'\\;'));-- 2. 使用 json_tuple 解析数据select json_tuple(json,'website','name')from(select explode(split(regexp_replace(regexp_replace('[{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\;\\{'),'\\;'))as json) t1;

上面

regexp_replace

函数中的内容解析：

SELECT explode(split(
    regexp_replace(
        regexp_replace('[
                {"website":"baidu.com","name":"百度"},
                {"website":"google.com","name":"谷歌"}
            ]','\\[|\\]',''),--将json数组两边的中括号去掉'\\}\\,\\{','\\}\\;\\{'),--将json数组元素之间的逗号换成分号'\\;')--以分号作为分隔符(split函数以分号作为分隔));-- 问：为什么要将json数组中元素之间的逗号换成分号？-- 答：因为元素内的分隔也是逗号，如果不将元素之间的逗号换掉的话，后面用split函数分隔时也会把元素内的数据给分隔，这不是我们想要的结果。

2. 使用 lateral view 解析JSON数组

当我们数据的结构如下，
idsnames[1,2,3][{“name”: “daming”, “age”: “15”}, {“name”: “lingling”, “age”: “14”}, {“name”: “tom”, “age”: “17”}]
我们想要获取的数据为，
idname1daming2daming3daming1lingling2lingling3lingling1tom2tom3tom

with json_data as(select`array`(1,2,3)as ids,`array`('{"name": "daming", "age": "15"}','{"name": "lingling", "age": "14"}','{"name": "tom", "age": "17"}')as json_infos
)-- 使用json_tuple()报错: org.apache.hadoop.hive.ql.parse.SemanticException:-- UDTF's are not supported outside the SELECT clause, nor nested in expressions-- 原因: 未知-- SELECT id, json_tuple(json_info, 'name')SELECT id, get_json_object(json_info,'$.name')FROM json_data
         lateral view explode(json_data.ids) tmp_ids as id
         lateral view explode(json_data.json_infos) tmp_json_infos as json_info;

学习链接

文心一言

ChatGPT

Hive解析Json数组超全讲解

标签： hive json sql

本文转载自: https://blog.csdn.net/weixin_43676735/article/details/132549958
版权归原作者 helloooi 所有，如有侵权，请联系我们删除。

7. Hive解析JSON字符串、JSON数组

文章目录

Hive解析JSON字符串

1. get_json_object

局限性

2. json_tuple

Hive解析JSON数组

前置知识

explode函数

regexp_replace函数

1. 嵌套子查询解析JSON数组（使用explode+regexp_replace）

2. 使用 lateral view 解析JSON数组

学习链接

发表评论

“7. Hive解析JSON字符串、JSON数组”的评论:

关于作者

overfit同步小助手

相关阅读

文章导航