0


oracle和hive之间关于sql的语法差异及转换

前言

最近工作需要从Oracle数据库SQL、存储过程迁移到hive上,两个之间还是有些差异的,特写此文,共大家参考学习。有问题欢迎指正。

1、oracle中的(+)写法

1.1、区分左右连接

左连接:

  1. select a.*,b.*
  2. from table1 a, table2 b
  3. where a.id = b.id(+)

右连接:

  1. select a.*,b.*
  2. from table1 a, table2 b
  3. where a.id(+) = b.id

1.2、hive中的写法

以左连接进行讲解,右连接同理

建表和插入数据

  1. CREATE TABLE Test_Departments
  2. (
  3. depID NUMBER(38,0),
  4. depName VARCHAR2(20),
  5. delFlag NUMBER(1,0)
  6. );
  7. --生成员工表
  8. CREATE TABLE Test_Employees
  9. (
  10. empID NUMBER(38,0),
  11. empName VARCHAR2(20),
  12. depID NUMBER(38,0),
  13. delFlag NUMBER(1,0)
  14. );
  15. --插入测试数据
  16. INSERT INTO Test_Departments VALUES(1,'FI',0);
  17. INSERT INTO Test_Departments VALUES(2,'MA',0);
  18. INSERT INTO Test_Departments VALUES(3,'HR',1);
  19. INSERT INTO Test_Departments VALUES(4,'IT',0);
  20. INSERT INTO Test_Employees VALUES(1,'wbq',1,0);
  21. INSERT INTO Test_Employees VALUES(2,'czh',2,0);
  22. INSERT INTO Test_Employees VALUES(3,'chh',1,0);
  23. INSERT INTO Test_Employees VALUES(4,'wal',2,0);
  24. INSERT INTO Test_Employees VALUES(5,'ddd',3,0);
a、最常用
  1. --Oracle中的写法
  2. SELECT A.*, B.*
  3. FROM Test_Departments A, Test_Employees B
  4. WHERE A.depID = B.depID(+);
  5. --Hive中的写法
  6. SELECT A.*, B.*
  7. FROM Test_Departments A
  8. LEFT JOIN Test_Employees B
  9. ON A.depID = B.depID;
b、副表带条件
  1. --Oracle中的写法:
  2. SELECT A.*, B.*
  3. FROM Test_Departments A, Test_Employees B
  4. WHERE A.depID = B.depID(+)
  5. AND (B.depID(+) = 3);
  6. --Hive中的写法:
  7. SELECT A.*, B.*
  8. FROM Test_Departments A
  9. LEFT JOIN Test_Employees B ON A.depID = B.depID
  10. AND B.depID = 3;

这里涉及到一个 left join 后面的过滤条件是on和where 的问题

select A.,B. from A left join B on A.id=B.id and B.id=3

返回结果:显示A表所有数据,B.id=3的数据,B.id不等于3的默认为空

select A.,B. from A,B where A.id=B.id(+) and B.id=3

返回结果:仅仅显示B.id=3的一条数据

c、只显示过滤条件的数据
  1. --Oracle
  2. SELECT A.*, B.*
  3. FROM Test_Departments A, Test_Employees B
  4. WHERE A.depID = B.depID(+)
  5. AND (B.depID = 3);
  6. --Hive
  7. SELECT A.*, B.*
  8. FROM Test_Departments A
  9. LEFT JOIN Test_Employees B ON A.depID = B.depID
  10. WHERE B.depID = 3;

这种方式也可以选择使用join实现

以上这段参考:

oracle (+)学习-CSDN博客

2、select中含有子查询

  1. --Oracle
  2. select a.id, (select b.id from b where b.name=a.id) from a
  3. --hive 是不支持select 里面子查询 修改如下:
  4. select a.id ,b.id from a left join b on a.id=b.name

3、oracle的decode函数

  1. --Oracle
  2. decode('key',if1,then1 ,if2,then2...thenN)
  3. --Hive
  4. --1、一般来改为:
  5. case when key = if1 then then1
  6. when key = if2 then then2
  7. ...
  8. else thenN end
  9. --2、如果decode比较简单 可以直接改为 :
  10. if('key'=if1,then1,then2)
  11. --复杂的改为:case when
  12. 注意hive有个decode函数是编码函数,不是用来处理null值的

4、oracle的时间转化

某字符串yyyyMM获取上个月时间

  1. --oracle
  2. select to_char(add_months(to_date('202202','yyyymm'),-1),'yyyymm')
  3. from dual;
  4. SELECT 'yyyyMM', to_char (SYSDATE,'yyyyMM') FROM dual UNION ALL
  5. SELECT 'yyyy-MM', to_char (SYSDATE,'yyyy-MM') FROM dual UNION ALL
  6. SELECT 'yyyy-MM-dd', to_char (SYSDATE,'yyyy-MM-dd') FROM dual UNION ALL
  7. SELECT 'yyyy/MM/dd', to_char (SYSDATE,'yyyy/MM/dd') FROM dual;
  8. TO_CHAR(TO_DATE(REPLACE(ADJ.VAR1, '-',''), 'YYYYMMDD'), 'yyyy')
  9. --hive
  10. select DATE_FORMAT(current_timestamp,'yyyy-MM-dd'); --转换为字符串格式
  11. select to_date(current_timestamp); --日期格式
  12. select 'yyyyMM', DATE_FORMAT(current_timestamp,'yyyyMM') union all
  13. select 'yyyy-MM', DATE_FORMAT(current_timestamp,'yyyy-MM') union all
  14. select 'yyyy-MM-dd', DATE_FORMAT(current_timestamp,'yyyy-MM-dd') union all
  15. select 'yyyy/MM/dd', DATE_FORMAT(current_timestamp, 'yyyy/MM/dd');
  16. with tmp as (
  17. select '2023-12-23' VAR1
  18. )
  19. select from_unixtime(unix_timestamp(replace(ADJ.VAR1, '-',''),'yyyyMMdd'),'yyyy')
  20. from tmp adj;

5、oracle的trunc函数

oracle的trunc函数改为hive的函数_hive对应oracle的trunc函数-CSDN博客

  1. --hive
  2. select TRUNC(current_date,'YYYY') year,
  3. TRUNC(current_date,'MM') month,
  4. last_day(current_date);
  5. --季初:
  6. select 'floor_quarter',date_format(floor_quarter(timestamp('2024-09-23')),'yyyy-MM-dd HH:mm:ss');
  7. --当前时间属于哪个季度:
  8. select quarter(current_date);

6、oracle instr函数

Oracle中的instr()函数 详解及应用_oracle instr-CSDN博客

  1. --oracle
  2. SELECT instr('1234567890123456789','3') FROM dual -- 3
  3. SELECT instr('1234567890123456789','3',1) FROM dual -- 3 ,从第1位开始查找第一个3
  4. SELECT instr('1234567890123456789','3',1,2) FROM dual --13 从第1位开始查找第二个3
  5. SELECT instr('1234567890123456789','3',4) FROM dual -- 13 从第4位开始查找第一个3
  6. SELECT instr('1234567890123456789','3',4,1) FROM dual --13 从第4位开始查找第一个3
  7. SELECT instr('1234567890123456789','3',4,2) FROM dual --0 从第4位开始查找第二个3
  8. --select instr('被查找的字符串','我们需要查找的字符',从第几位开始 首位是0,查找第几个出现的)
  9. --hive
  10. --instr(str, substr) Returns the index of the first occurance of substr in str
  11. SELECT instr('1234567890123456789','3'); -- 3
  12. --locate函数
  13. select locate('3','12345123',4) --8
  14. select locate('3','12345123',1) --3
  15. -- 这个locate函数也是找到字符串的下标 locate('要找的字符','被找的字符串',' 从下标多少开始找')。

7、截取

字段格式是: xx1.xxxxx2.xx3.xxx4.xx5.xx6 我们需要xx3格式的数据

缺省.受限制现金-人民币-风险准备金专户.工行高新支行321413RMB(财付通专用).缺省.缺省.缺省.缺省.缺省 -> 工行高新支行321413RMB(财付通专用)

  1. --Oracle
  2. SELECT
  3. SEGMENT_NAME_MERGE,
  4. SUBSTR(T.SEGMENT_NAME_MERGE,
  5. INSTR(T.SEGMENT_NAME_MERGE, '.', 1, 2) + 1,
  6. (INSTR(T.SEGMENT_NAME_MERGE, '.', 1, 3) - INSTR(T.SEGMENT_NAME_MERGE, '.', 1, 2)) - 1
  7. )
  8. FROM ODSERPDATA.ODS_CE_GL_ACCOUNT_Q T
  9. --hive
  10. with tmp as(
  11. select '缺省.受限制现金-人民币-风险准备金专户.工行高新支行321413RMB(财付通专用).缺省.缺省.缺省.缺省.缺省' text
  12. )
  13. select SUBSTRING_INDEX(substring_index(a.text,'.',3),'.',-1)
  14. from tmp a;
  15. select replace(substring_index(a,'.',3),substring_index(a,'.',2)||'.',''),
  16. substr(a,length(substring_index(a,'.',2))+2,length(substring_index(a,'.',3))-length(substring_index(a,'.',2))-1),
  17. regexp_extract(a,'.*?\\..*?\\.(.*?)\\.+',1)
  18. from (select '缺省.受限制现金-人民币-风险准备金专户.工行高新支行321413RMB(财付通专用).缺省.缺省.缺省.缺省.缺省' a )t
  19. --上面是三种办法。 1.是替换 2.是截取 3是正则

7、临时表名

  1. --Oracle
  2. SELECT * FROM (SELECT 1,2 FROM dual );
  3. --hive:必须临时表名
  4. select * from (select 1, 2 )t; --正确
  5. select * from (select 1, 2 ); --错误

8、with插入用法

  1. --Oracle
  2. INSERT INTO TEST.CC_STUDENT_02
  3. WITH tmp AS (SELECT * FROM TEST.CC_STUDENT_02 cs )
  4. SELECT * FROM tmp;
  5. --Hive
  6. WITH tmp AS (SELECT * FROM TEST.CC_STUDENT_02 cs )
  7. INSERT INTO TEST.CC_STUDENT_02
  8. SELECT * FROM tmp;

9、计算语法 或者||用法不同

  1. --Oracle
  2. SELECT substr('202212', 1, 4) - 1 || 'aa' FROM dual -- 2021aa
  3. SELECT 1||NULL||2 FROM dual -- 12
  4. --Hive
  5. SELECT substr('202212', 1, 4) - 1 || 'aa' -- 2021.0aa
  6. --解决办法:
  7. SELECT cast(substr('202212', 1, 4) - 1 as int)|| 'aa' -- 2021aa
  8. SELECT cast(substr('202212', 1, 4) as int) - 1|| 'aa' -- 2021aa
  9. --因为int-int=int string-int int-string=double

10、日期格式不标准的转换

  1. --Oracle
  2. SELECT to_date('2017-3-31', 'yyyy/mm/dd') FROM dual UNION ALL
  3. SELECT to_date('2018/11/6', 'yyyy/mm/dd') FROM dual UNION ALL
  4. SELECT to_date('2017/6/20', 'yyyy/mm/dd') FROM dual UNION ALL
  5. SELECT to_date('2017-06-20', 'yyyy/mm/dd') FROM dual UNION ALL
  6. SELECT to_date('20170620', 'yyyy/mm/dd') FROM dual
  7. --Hive
  8. with tmp as
  9. (SELECT '2017-3-31' var UNION ALL
  10. SELECT '2018/11/6' UNION ALL
  11. SELECT '20170620'
  12. )
  13. select case when length(a.var) = 8 then date(from_unixtime(unix_timestamp(a.var,'yyyyMMdd'),'yyyy-MM-dd'))
  14. else date(replace(a.var,'/','-')) end
  15. from tmp a;

11、WM_CONCAT

  1. --Oracle
  2. SELECT t.id ,to_char(WM_CONCAT(name)),WM_CONCAT(name)
  3. FROM (
  4. SELECT 1 AS id ,1 as name FROM dual UNION ALL
  5. SELECT 1 AS id ,2 as name FROM dual UNION ALL
  6. SELECT 1 AS id ,1 as name FROM dual UNION ALL
  7. SELECT 2 AS id ,4 as name FROM dual UNION ALL
  8. SELECT 2 AS id ,5 as name FROM dual
  9. )t
  10. GROUP BY t.id;
  11. --Hive
  12. with tmp as (
  13. SELECT 1 AS id ,'1' as name UNION ALL
  14. SELECT 1 AS id ,'2' as name UNION ALL
  15. SELECT 1 AS id ,'1' as name UNION ALL
  16. SELECT 2 AS id ,'4' as name UNION ALL
  17. SELECT 2 AS id ,'5' as name
  18. )
  19. select id
  20. ,concat_ws(',',collect_list(name)) --未去重
  21. ,concat_ws(',',collect_set(name)) --去重
  22. from tmp a
  23. group by id;

12、日期格式化

  1. --Oracle
  2. SELECT to_date('2023-01-02 15:55:03', 'yyyy-mm-dd hh24:mi:ss')
  3. ,TO_DATE('2023-01-02 15:55:03', 'YYYY-MM-DD HH24:MI:SS')FROM dual
  4. --Hive
  5. select DATE_FORMAT('2023-01-02 15:55:03','yyyy-MM-dd HH:mm:ss')

13、类型兼容问题

  1. --Oracle
  2. with t as (
  3. select '1' a union all
  4. select '11'a union all
  5. select '2' a union all
  6. select '3' a
  7. )
  8. select t.a from t order by a;
  9. --Hive
  10. with t as (
  11. select '1' a union all
  12. select '11'a union all
  13. select '2' a union all
  14. select '3' a
  15. )
  16. select t.a from t order by cast(a as int );

14、Oracle中pivot/和unpivot函数

oracle 学习之 unpivot/pivot函数及hive实现该功能_cclovezbf的博客-CSDN博客

15、RPAD LPAD (左右填充函数)

  1. --Oracle
  2. SELECT RPAD ('1234',10),LENGTH(RPAD ('1234',10)) FROM dual
  3. -- 1234 10 --注意 1234后面又6个空格
  4. SELECT RPAD ('1234',10,'a'),LENGTH(RPAD ('1234',10,'a')) FROM dual
  5. --1234aaaaaa 10
  6. --hive
  7. SELECT RPAD ('1234',10,' '),LENGTH(RPAD ('1234',10,' ')),RPAD ('1234',10,'a'),lPAD ('1234',10,'a')

16、时间相减

  1. --Oracle
  2. SELECT SYSDATE -1 FROM dual --获取昨天
  3. SELECT to_date('2023-05-03','YYYY-MM-DD') -to_date('2023-05-02','YYYY-MM-DD') FROM dual --1 两天时间差
  4. SELECT SYSDATE -to_date('2023-05-02','YYYY-MM-DD') FROM dual
  5. --9.64637731481481481481481481481481481481 时间差 计算了时分秒
  6. ---hive
  7. select `current_timestamp`(), `current_timestamp`() -to_date('2023-05-10')
  8. -- 2023-05-11 15:33:00.779000000,1 15:33:00.779000000 但是不推荐 因为看起来既直观又不直观
  9. --计算时间差一般采用的是datediff函数
  10. select datediff(current_timestamp(),to_date('2023-05-10')) -- 1
  11. 计--算前几天后几天采用date_add date_sub
  12. select current_date,date_add(current_date,1),date_sub(current_date(),1)
  13. ---相差的秒数
  14. select (unix_timestamp('2024-10-23 14:12:34') - unix_timestamp('2024-10-23 12:12:34'))*1.00000
  15. ---相差的月数
  16. SELECT months_between('2022-12-01', '2022-01-01') AS months_difference
  17. ---相差小时
  18. select (unix_timestamp('2024-10-23 14:13:34') - unix_timestamp('2024-10-23 12:12:34'))*1.00000 / 3600
  19. ---相差天数
  20. select (unix_timestamp('2024-10-24 14:13:34') - unix_timestamp('2024-10-23 12:12:34'))*1.00000 / 86400

以上参考文章:oracle和hive之间关于sql的语法差异及转换_hivesql 与oracle 语句区别-CSDN博客

17、INSERT语句

  1. ------ORACLE
  2. EXECUTE IMMEDIATE 'TRUNCATE TABLE TABLE_A';
  3. INSERT /*+APPEND*/ INTO A NOLOGGING (
  4. COL_1,
  5. COL_2,
  6. COL_3,
  7. );
  8. ------Hive:
  9. INSERT OVERWRITE TABLE TABLE_A;

18、HIVE特性

在迁移的时候,会遇到一些HIVE的特性导致无法顺利迁移,如下ORACLE语句:

  1. SELECT *
  2. FROM A
  3. GROUP BY A.COL_1
  4. HAVING COUNT(DISTINCT A.COL_2) = 2;

在HIVE中运行会遇到如下报错:

  1. FAILED: SemanticException [Error 10002]: Line 19:22 Invalid column reference 'COL_2'

去掉DISTINCT后可以正常执行。

属于HIVE的特性导致的问题,HAVING子句中无法使用DISTINCT关键字,在迁移中需要进行取舍。

标签: oracle hive sql

本文转载自: https://blog.csdn.net/weixin_42771366/article/details/142997828
版权归原作者 三生暮雨渡瀟瀟 所有, 如有侵权,请联系我们删除。

“oracle和hive之间关于sql的语法差异及转换”的评论:

还没有评论