Fixing Garbled Characters in Hive View Data

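1. Scenario

In Hive, a view that only references columns of its underlying table displays correctly. But if the view adds a column that does not exist in the table, and that column's value is a Chinese string literal, the column comes back garbled.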

create table t_unicode_test as select '中国' as country;

create view v_unicode_test as select country, '中国' as country2 from t_unicode_test;

select * from test.v_unicode_test;

+-------------------------+--------------------------+
| v_unicode_test.country  | v_unicode_test.country2  |
+-------------------------+--------------------------+
| 中国                    | ??                       |
+-------------------------+--------------------------+

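2. Troubleshooting

After searching online, the cause turned out to be that the Hive metastore database uses Latin1 (ISO-8859-1) as its default encoding. The view definition, including the Chinese literal, is stored in the metastore, so the literal is lost there. Nearly every suggested fix is to change the encoding of the metastore database.

Wait, excuse me? What developer has enough privileges to alter the metastore database? Without access to the Hive metastore, a workaround was the only option.

After some thought, I found that writing the literal as Unicode escapes does the job.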
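To illustrate why a Latin1 store would turn the literal into `??` while the escaped form survives, here is a minimal Java sketch. It only simulates a lossy ISO-8859-1 round trip in memory; that the metastore loses the characters in exactly this way is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: simulates storing text in a Latin1 (ISO-8859-1) column.
final class Latin1RoundTrip {

    // Encode to ISO-8859-1 bytes and decode again; unmappable characters become '?'.
    static String roundTrip(String text) {
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The raw Chinese literal cannot be represented in Latin1.
        System.out.println(roundTrip("中国"));
        // The escaped form is plain ASCII, so it comes back unchanged.
        System.out.println(roundTrip("\\u4e2d\\u56fd"));
    }
}
```

The escaped literal contains only ASCII characters, which is why it can pass through a Latin1 column intact and be expanded back to Chinese when Hive parses the view definition.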

3. Solution

The next question is how to convert a string into Unicode escapes. I copied a piece of Java code from the web:

public class UnicodeEscape {

    // Converts each UTF-16 code unit of the input into a backslash-u escape with four hex digits.
    public static String strToUnicode(String str) {
        StringBuilder result = new StringBuilder();
        for (char c : str.toCharArray()) {
            // Zero-pad to four digits; without padding, 'A' would yield an invalid two-digit escape.
            result.append(String.format("\\u%04x", (int) c));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String str = "中国";
        System.out.println(strToUnicode(str));
    }
}

Result: \u4e2d\u56fd

Then change the view definition to use the escaped literal:

alter view v_unicode_test as select country, '\u4e2d\u56fd' as country2 from t_unicode_test;

select * from test.v_unicode_test;

+-------------------------+--------------------------+
| v_unicode_test.country  | v_unicode_test.country2  |
+-------------------------+--------------------------+
| 中国                    | 中国                     |
+-------------------------+--------------------------+

Problem solved.
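When the literal mixes ASCII and Chinese text, escaping every character makes the view definition hard to read. The following is a minimal sketch, not part of the original post, that escapes only non-ASCII characters and prints the corresponding statement. The class and method names are hypothetical, and it assumes the text contains no single quotes or backslashes, which would need their own escaping.

```java
// Hypothetical helper: escapes only non-ASCII characters so the literal stays readable.
final class HiveLiteralEscaper {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                // ASCII characters pass through unchanged.
                out.append(c);
            } else {
                // Four hex digits per UTF-16 code unit.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the input has no single quotes or backslashes.
        String literal = escapeNonAscii("China: 中国");
        // Prints an ALTER VIEW statement for the example view, using the escaped literal.
        System.out.println("alter view v_unicode_test as select country, '"
                + literal + "' as country2 from t_unicode_test;");
    }
}
```

Only the escaped literal changes from one view to the next; the table and view names here are the ones from the example and would be replaced with your own.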

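Afterword:

If you have no local Java environment, Hive itself can help. The query below returns the percent-encoded UTF-16BE bytes of the string. Each pair of bytes corresponds to one Unicode escape, so %4E%2D becomes \u4e2d and %56%FD becomes \u56fd.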

select java_method('java.net.URLEncoder', 'encode', '中国', 'UTF-16BE');

+---------------+
|      _c0      |
+---------------+
| %4E%2D%56%FD  |
+---------------+

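Tags: hive, data warehouse, java

Reposted from: https://blog.csdn.net/a38123/article/details/126490579
Copyright belongs to the original author, HoweSea. If there is any infringement, please contact us for removal.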
