hive语法

DDL

1.数据库操作

（1）创建库

create database db_hive1;

（2）显示库

show databases;

show database like 'db_hive*';(使用通配表达式)

#db_hive1 db_hive2

（3）.查看数据库信息

desc database db_hive1;

（4）.删除数据库

drop database db_hive1;

（5）.切换到当前数据库

use db_hive1;

2.创建表(内部表，外部表)

（1）内部表

Hive会完全管理表的元数据和数据文件。

create table if not exists student(
    id int, 
    name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student';

vim /opt/module/datas/student.txt

1001    student1
1002    student2
1003    student3
1004    student4
1005    student5
1006    student6
1007    student7
1008    student8
1009    student9
1010    student10
1011    student11
1012    student12
1013    student13
1014    student14
1015    student15
1016    student16

上传到hdfs

hadoop fs -put student.txt /user/hive/warehouse/student

查看文件：

删除之后再查看：

drop table student;

hdfs上也没有了

（2）外部表

Hive只负责管理元数据，不负责管理HDFS中的数据文件

create external table if not exists student(
    id int, 
    name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student';

删除之后hdfs上的数据还在

（3）SERDE和复杂数据类型

create table teacher
(
    name     string,
    friends  array<string>,
    students map<string,int>,
    address  struct<city:string,street:string,postal_code:int>
)
row format serde 'org.apache.hadoop.hive.serde2.JsonSerDe'
location '/user/hive/warehouse/teacher';

{
    "name": "dasongsong",
    "friends": [
        "bingbing",
        "lili"
    ],
    "students": {
        "xiaohaihai": 18,
        "xiaoyangyang": 16
    },
    "address": {
        "street": "hui long guan",
        "city": "beijing",
        "postal_code": 10010
    }
}

查询三种不同的数据类型的方法：

array：直接使用arr[i]索引即可

map:map['key']

struct:struct.id

(4)其他方式

3.修改表

1）重命名表

alter table stu rename to stu1;

2****）修改列信息

只能修改元数据，不修改hdfs上的数据文件

增加列

alter table stu add columns(age int);

更新列

alter table stu change column age ages double;

替换列

alter table stu replace columns(id int, name string);

4.删除表

drop table stu;

5.清空表

truncate只能清空管理表，不能删除外部表中数据。

truncate table student;

6.DML

（1）load

LOAD DATA 
[LOCAL] INPATH 'filepath' 
[OVERWRITE] INTO TABLE tablename 
[PARTITION (partcol1=val1, partcol2=val2 ...)];

关键字说明：

（1）local：表示从本地加载数据到Hive表；否则从HDFS加载数据到Hive表。

（2）overwrite：表示覆盖表中已有数据，否则表示追加。

（3）partition：表示上传到指定分区，若目标是分区表，需指定分区。

create table student(
    id int,
    name string
)
row format delimited fields terminated by '\t';

load data local inpath '/data/testData/student.txt' into table student;

“into”是追加的作用

overwrite：覆盖旧文件，否则加载新文件

load data local inpath '/data/testData/student.txt' overwrite into table student;

（2）insert

1.根据查询结果插入数据

2.将给定Values插入表中

insert into table  student1 values(1,'wangwu'),(2,'zhaoliu');

3.将查询结果写入目标路径

insert overwrite local directory '/opt/module/datas/student' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
select id,name from student;

（3） Export&Import

导出：

export table default.student to '/user/hive/warehouse/export/student';

导入：

import table student2 from '/user/hive/warehouse/export/student'

7.查询语句

（1）最基本的select from

select * from emp;
select empno, ename from emp;

(2)as:别名

(3)limit:限制返回行数

(4)Where语句

1）使用where子句，****将不满足条件的行过滤掉

2）where子句紧随from子句

--查询出薪水大于1000的所有员工。
select * from emp where salary > 1000;

--查询薪水大于1000，部门是30
select * from emp where (salary > 1000 and deptno = 30);

--查询薪水大于1000，或者部门是30
select * from emp where (salary > 1000 or deptno = 30);

--查询除了20部门和30部门以外的员工信息
select * from emp where deptno not in (20,30);

(5)聚合函数

count(*)，表示统计所有行数，包含null值；

count(某列)，表示该列一共有多少行，不包含null值；

max()，求最大值，不包含null，除非所有值都是null；

min()，求最小值，不包含null，除非所有值都是null；

sum()，求和，不包含null。

avg()，求平均值，不包含null。

--count
select count(*) from emp; --*代表所有的列

--求工资的最大值（max）
select max(salary) from emp;

--求工资的最小值（min）
select min(salary) from emp;

--求工资的总和（sum）
select sum(salary) from emp;

--求工资的平均值（avg）
select avg(salary) avg_salary from emp;

(6) Group By语句

通常和聚合函数配合使用

注意：selec的字段只能是group by之后的字段或者聚合函数

--group by
select count(*) as num_in_job from emp group by job;

--计算emp表每个部门的平均工资。
select deptno, avg(salary) as avg_salary from emp group by deptno;
select
    t.deptno,
    avg(t.salary) avg_sal
from emp t
group by t.deptno;

--计算emp每个部门中每个岗位的最高薪水。
select
    t.deptno,
    t.job,
    max(t.salary) as max_sal
from emp t
group by t.deptno,t.job;

select
    t.deptno,
    t.job,
    max(t.salary) max_sal
from emp t
group by t.deptno, t.job;

（7）having

1）having与where****不同点

（1）where后面不能写分组聚合函数，而having后面可以使用分组聚合函数。

因为where过滤的是表里一行一行的数据，而group by之后的数据是一组一组的

（2）having只用于group by分组统计语句。

--求平均薪水大于2000的部门
--先求平均薪水
select deptno, avg(salary) from emp where salary > 2000 group by deptno;
--having
select deptno, avg(salary) as avg_sal from emp 
where salary > 2000 
group by deptno 
having avg_sal > 2999;

（8）join

把多张表横向的拼接

--join
select e.empno, e.ename, d.dname from emp e join default.dept d on e.deptno = d.deptno;

join操作实际生成了虚拟表，包含了两张表的所有信息

一些例子

--join
select e.empno, e.ename, d.dname from emp e join default.dept d on e.deptno = d.deptno;

--合并员工表和部门表
select e.*, d.* from emp e join dept d on e.deptno = d.deptno;

--每一个locationu有多少员工
select d.loc, count(e.ename) from emp e join dept d on e.deptno = d.deptno group by d.loc;

1）内连接

左右表中都有某个键的才能join

select
    e.empno,
    e.ename,
    d.deptno
from emp e
inner join dept d
on e.deptno = d.deptno;

2）左外连接

select 
    e.empno, 
    e.ename, 
    d.deptno 
from emp e 
left join dept d 
on e.deptno = d.deptno;

包含左表的全部以及右表的部分

3）右外连接

select 
    e.empno, 
    e.ename, 
    d.deptno 
from emp e 
right join dept d 
on e.deptno = d.deptno;

4）全连接

select 
    e.empno, 
    e.ename, 
    d.deptno 
from emp e 
full join dept d 
on e.deptno = d.deptno;

5）多表连接

select *
from emp e
join dept d on e.deptno = d.deptno
join location l on d.loc = l.loc;

6）笛卡尔积

a的所有行×b的所有行

select 
    empno, 
    dname 
from emp, dept;

（9）union

select 
    *
from emp
where deptno=30
union
select 
    *
from emp
where deptno=40;

(10)排序

升序

select
    *
from emp
order by salary;

降序

select 
    * 
from emp 
order by sal desc;

标签： hive hadoop 数据仓库

本文转载自: https://blog.csdn.net/m0_63606851/article/details/135532062
版权归原作者 别再摆烂了！ 所有，如有侵权，请联系我们删除。