Getting Started with Hive

I. MySQL Basics

8088: Hadoop

Open a command prompt, type mysql -uroot -p, and press Enter.

List databases

show databases;

Select a database

use database_name;

Create a database

create database database_name [charset utf8];

Drop a database

drop database database_name;

Show the database currently in use

select database();

In Hive, the equivalent is

select current_database();

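1. DDL: Table Management

List the tables with show tables; (a database must be selected first).

1) Create a table

create table table_name(
    column_name column_type,
    column_name column_type,
    ...
);

-- Common column types
int              -- integer
float            -- floating-point number
varchar(length)  -- text; length is a number that caps the maximum length
date             -- date
timestamp        -- timestamp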

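2) Drop a table

drop table table_name;

drop table if exists table_name;

3) Alter a table

Add a column

alter table table_name add column_name column_type;

Rename a column or change its attributes

alter table table_name change old_column new_column new_type [constraints];

Drop a column

alter table table_name drop column column_name;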

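2. DML: Data Manipulation

1) Insert data

Plain insert

insert into table_name[(col1, col2, ..., colN)] values (val1, val2, ..., valN);

Insert the result of a query

insert into table_name(col, ...)
select col, ...
from source_table;

Copy the contents of one table into a new table

create table New_Table as
select * from source_table;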
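
The two copy patterns above can be tried without a MySQL server. The sketch below uses SQLite through Python's standard sqlite3 module purely as a stand-in; the tables student, adult and student_copy and their rows are hypothetical, and minor dialect differences from MySQL are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Hypothetical source table with three rows.
cur.execute("create table student (id int, name varchar(20), age int)")
cur.executemany(
    "insert into student (id, name, age) values (?, ?, ?)",
    [(1, "Ann", 17), (2, "Bob", 19), (3, "Cid", 21)],
)

# insert ... select: copy only the matching rows into an existing table.
cur.execute("create table adult (id int, name varchar(20))")
cur.execute("insert into adult (id, name) select id, name from student where age >= 18")

# create table ... as select: create and fill a new table in one statement.
cur.execute("create table student_copy as select * from student")

adult_rows = cur.execute("select id, name from adult order by id").fetchall()
copy_count = cur.execute("select count(*) from student_copy").fetchone()[0]
print(adult_rows)  # [(2, 'Bob'), (3, 'Cid')]
print(copy_count)  # 3
```

Note that create table ... as select copies the data and column names but generally not constraints such as primary keys.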

2) Delete data

delete from table_name
where condition;

Empty a table

truncate table table_name;

3) Update data (modify existing rows)

update table_name set col = value [where condition];

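3. DQL: Data Queries

(1) Basic queries

select column_list | * from table_name;

That is: from (FROM) a table, select (SELECT) certain columns to display.

select column_list | * from table_name where condition;

AND and OR combine multiple filter conditions. AND is evaluated before OR, so when an expression mixes several ANDs and ORs, use parentheses () to make the intended precedence explicit.

The IN operator matches against a set of values. It can also be followed by a SELECT subquery, in which case it matches against the values the subquery returns.

The NOT operator negates a condition.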
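
To make the precedence rule concrete, here is a small sketch that runs the same filter with and without parentheses, plus a NOT IN subquery. SQLite (via Python's standard sqlite3 module) stands in for MySQL, and the product table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table product (name varchar(20), category varchar(20), price int)")
cur.executemany(
    "insert into product values (?, ?, ?)",
    [("pen", "office", 2), ("desk", "office", 120),
     ("ball", "sport", 15), ("bike", "sport", 300)],
)

# Without parentheses AND binds tighter: office OR (sport AND price > 100).
no_parens = cur.execute(
    "select name from product "
    "where category = 'office' or category = 'sport' and price > 100 "
    "order by name"
).fetchall()

# Parentheses force the OR first: (office OR sport) AND price > 100.
with_parens = cur.execute(
    "select name from product "
    "where (category = 'office' or category = 'sport') and price > 100 "
    "order by name"
).fetchall()

# NOT IN with a subquery: products whose category has no item priced over 200.
not_in = cur.execute(
    "select name from product "
    "where category not in (select category from product where price > 200) "
    "order by name"
).fetchall()

print(no_parens)    # [('bike',), ('desk',), ('pen',)]
print(with_parens)  # [('bike',), ('desk',)]
print(not_in)       # [('desk',), ('pen',)]
```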

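(2) Grouping and aggregation (group by)

select column | aggregate_function from table_name [where condition] group by column;

Aggregate functions:

    - sum(col): sum
    - avg(col): average
    - min(col): minimum
    - max(col): maximum
    - count(col | *): number of rows

Note: a non-aggregated column may appear in select only if it also appears in group by.

having filters groups: where filters rows before grouping, having filters the groups afterwards.

select col, count(*) as num
from table_name
where condition
group by col
having condition;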
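
The difference between where and having is easiest to see on data. The following sketch assumes a hypothetical orders table and uses SQLite through Python's sqlite3 module as a stand-in for MySQL: where drops individual rows first, then having drops whole groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table orders (customer varchar(20), amount int)")
cur.executemany(
    "insert into orders values (?, ?)",
    [("ann", 10), ("ann", 30), ("bob", 5), ("bob", 7), ("cid", 50), ("cid", 1)],
)

result = cur.execute(
    "select customer, count(*) as num, sum(amount) as total "
    "from orders "
    "where amount >= 5 "        # row filter: removes cid's order of 1
    "group by customer "
    "having sum(amount) > 20 "  # group filter: removes bob (total 12)
    "order by customer"
).fetchall()

print(result)  # [('ann', 2, 40), ('cid', 1, 50)]
```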

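(3) Sorting and pagination (order by, limit)

1, ORDER BY sorts the query result.

select column | aggregate_function | * from table_name
where ...
group by ...
order by ... [ASC | DESC]

ASC sorts in ascending order (the default).

DESC sorts in descending order.

2, LIMIT caps the number of rows returned, or pages through the result.

select column | aggregate_function | * from table_name
where ...
group by ...
order by ... [ASC | DESC]
limit n[, m]

-- limit 5: return at most 5 rows
-- limit 10, 5: skip the first 10 rows, then return 5 (rows 11 to 15)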
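
A quick way to check the offset semantics of limit is to page through a table of consecutive numbers. This sketch uses SQLite via Python's sqlite3 module, which accepts the same "limit offset, count" form as MySQL; the table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table t (n int)")
cur.executemany("insert into t values (?)", [(i,) for i in range(1, 21)])  # 1..20

# limit 5: the first five rows of the sorted result.
first_five = [r[0] for r in cur.execute("select n from t order by n limit 5")]

# limit 10, 5: skip 10 rows, then take 5.
page = [r[0] for r in cur.execute("select n from t order by n limit 10, 5")]

# order by ... desc combined with limit: the three largest values.
top_three = [r[0] for r in cur.execute("select n from t order by n desc limit 3")]

print(first_five)  # [1, 2, 3, 4, 5]
print(page)        # [11, 12, 13, 14, 15]
print(top_three)   # [20, 19, 18]
```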

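(4) Multi-table queries (from, inner join)

FROM with multiple tables
select ... from table1 [as alias1], table2 [as alias2], ..., tableN [where join_condition];
List several tables directly in FROM; as gives each table an alias.

INNER JOIN
select ... from table1 [as alias1] [inner] join table2 [as alias2] on join_condition;
- inner join: inner join (only the rows that match in both tables)
- left join: left outer join
- right join: right outer join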
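
As a rough illustration of how these forms differ, the sketch below joins two hypothetical tables, emp and dept, where one employee has no department. SQLite (Python's sqlite3 module) stands in for MySQL; right join is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table dept (id int, name varchar(20))")
cur.execute("create table emp (name varchar(20), dept_id int)")
cur.executemany("insert into dept values (?, ?)", [(1, "sales"), (2, "dev")])
cur.executemany("insert into emp values (?, ?)", [("ann", 1), ("bob", 2), ("cid", None)])

# Multiple tables in FROM, connected by a where condition.
from_multi = cur.execute(
    "select e.name, d.name from emp as e, dept as d "
    "where e.dept_id = d.id order by e.name"
).fetchall()

# Inner join: same result, with the join condition written in ON.
inner = cur.execute(
    "select e.name, d.name from emp as e inner join dept as d "
    "on e.dept_id = d.id order by e.name"
).fetchall()

# Left join: every emp row is kept; a missing match is filled with NULL (None).
left = cur.execute(
    "select e.name, d.name from emp as e left join dept as d "
    "on e.dept_id = d.id order by e.name"
).fetchall()

print(inner)  # [('ann', 'sales'), ('bob', 'dev')]
print(left)   # [('ann', 'sales'), ('bob', 'dev'), ('cid', None)]
```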

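(5) Wildcards (like)

% matches zero or more arbitrary characters.
_ matches exactly one arbitrary character.
[] matches any one character from a set, for example [ab] matches a or b, and ^ negates the set. This bracket syntax belongs to SQL Server's LIKE; MySQL's LIKE supports only % and _, so in MySQL use REGEXP for character sets.

select *
from table_name
where col not regexp '^[AB]'; -- text that does not start with A or B (MySQL)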
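
The % and _ wildcards can be tried on a few sample strings. This is a minimal sketch with a hypothetical person table, using SQLite via Python's sqlite3 module as a stand-in for MySQL; since bracket sets are not available in LIKE there either, "does not start with A or B" is written with two NOT LIKE conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table person (name varchar(20))")
cur.executemany(
    "insert into person values (?)",
    [("Adam",), ("Bella",), ("Cara",), ("Carl",), ("Dan",), ("Eve",)],
)

def names(where):
    """Run a filter on the person table and return the matching names in order."""
    sql = "select name from person where " + where + " order by name"
    return [row[0] for row in cur.execute(sql)]

starts_with_ca = names("name like 'Ca%'")  # 'Ca' followed by anything
second_is_a = names("name like '_a%'")     # any one character, then 'a'
not_a_or_b = names("name not like 'A%' and name not like 'B%'")

print(starts_with_ca)  # ['Cara', 'Carl']
print(second_is_a)     # ['Cara', 'Carl', 'Dan']
print(not_a_or_b)      # ['Cara', 'Carl', 'Dan', 'Eve']
```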

(6) Deduplication (distinct)

select distinct col1, col2
from table_name;

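4. Constraints

create table [if not exists] table_name(
    column1 type[(width)] [constraints] [comment 'column description'],
    column2 type[(width)] [constraints] [comment 'column description'],
    column3 type[(width)] [constraints] [comment 'column description']
)[table options];

Concept:

A constraint is a rule that restricts what data a table may hold.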

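(1) Primary key constraint (primary key)

- Purpose: a primary key uniquely identifies each row.

- Primary-key columns must be non-null and unique.

Note:

A table can have only one primary key; a composite primary key still counts as a single primary key.

Create a primary key

create table table_name(
    ...
    <column_name> <data_type> primary key
    ...
);

create table table_name(
    ...
    [constraint <constraint_name>] primary key (column_name)
);

Composite primary key
Note: a composite primary key cannot be declared inline after a single column, and none of its columns may be null.

create table table_name(
    ...
    [constraint <constraint_name>] primary key (column1, column2, ..., columnN)
);

Add a primary key to an existing table

create table table_name( ... );
alter table <table_name> add primary key (column_list);

create table table_name( ... );
alter table <table_name> add primary key (column1, column2);

Drop the primary key

alter table table_name drop primary key;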

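(2) Auto-increment constraint (auto_increment)

(3) Not-null constraint (not null)

(4) Unique constraint (unique)

(5) Default constraint (default)

(6) Zero-fill constraint (zerofill)

(7) Foreign key constraint (foreign key)

- Add a foreign key

create table table_name(
    column_name data_type,
    ...
    [constraint [fk_name]] foreign key (fk_column) references parent_table (parent_column)
);

alter table table_name add constraint fk_name foreign key (fk_column) references parent_table (parent_column);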
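
To see constraints rejecting bad data, the sketch below declares primary key, not null, unique, default and foreign key constraints on two hypothetical tables and then attempts inserts that violate them. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL; auto_increment and zerofill are MySQL-specific and are not shown, and the pragma line is an SQLite detail that MySQL does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled

conn.execute("create table dept (id integer primary key, name varchar(20) not null unique)")
conn.execute(
    "create table emp ("
    " id integer primary key,"
    " name varchar(20) not null,"
    " level int default 1,"
    " dept_id int,"
    " constraint fk_emp_dept foreign key (dept_id) references dept (id))"
)
conn.execute("insert into dept (id, name) values (1, 'sales')")
conn.execute("insert into emp (id, name, dept_id) values (1, 'ann', 1)")

def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_pk = rejected("insert into dept (id, name) values (1, 'dev')")              # duplicate primary key
null_name = rejected("insert into emp (id, name) values (2, null)")             # not null violated
bad_fk = rejected("insert into emp (id, name, dept_id) values (3, 'bob', 99)")  # no dept with id 99
default_level = conn.execute("select level from emp where id = 1").fetchone()[0]

print(dup_pk, null_name, bad_fk, default_level)  # True True True 1
```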

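II. Hive

Hive consists of two parts: the metadata management service (metastore) and the SQL interpreter.

Start the metastore service (required; Hive cannot work without it)

Leave HDFS safe mode:
hdfs dfsadmin -safemode leave

Note: run the following from the Hive installation directory.
Foreground: bin/hive --service metastore
Background: nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &

Start a client
- Hive Shell (SQL can be typed directly): bin/hive
- Hive ThriftServer (SQL cannot be typed directly; an external client must connect): bin/hive --service hiveserver2
- Run it in the background: nohup bin/hive --service hiveserver2 >> logs/hiveserver2.log 2>&1 &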

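1. A First Look at Hive

Run bin/hive to enter the Hive Shell, where SQL statements can be executed directly.

Create a table
create table test(id int, name string, gender string);

Insert data
insert into test values(1, '王力红', '男');

Query data
select gender, count(*) as cnt from test group by gender;

Databases and tables created in Hive store their data in HDFS, by default under:

hdfs://node1:8020/user/hive/warehouse

Built-in client: beeline (connects to HiveServer2)

bin/beeline
!connect jdbc:hive2://node1:10000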

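2. Hive Database and Table Syntax

(1) Database operations

(1.1) Create a database

create database if not exists myhive;

use myhive;

(1.2) Show database details

desc database myhive;

(1.3) Create a database at a specific HDFS location

create database myhive2 location '/myhive2';

(1.4) Drop a database (fails if the database still contains tables)

drop database myhive;

(1.5) Force-drop a database together with the tables it contains

drop database myhive2 cascade;

A Hive database is essentially a directory on HDFS.
By default it is stored under /user/hive/warehouse.
The location keyword sets the storage directory when the database is created.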

(2) Table operations

(2.1) Create a table

create [external] table [if not exists] table_name
    [(col_name data_type [comment col_comment], ...)]
    [comment table_comment]
    [partitioned by (col_name data_type [comment col_comment], ...)]
    [clustered by (col_name, col_name, ...)
        [sorted by (col_name [asc|desc], ...)] into num_buckets buckets]
    [row format row_format]
    [stored as file_format]
    [location hdfs_path]

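(2.2) Internal and external tables

Internal table (create table table_name ...)
Dropping an internal table deletes both its metadata and its stored data, so internal tables are not suited to sharing data with other tools.

External table (create external table table_name ... location ...)
The data of an external table can live anywhere; the location keyword says where. Because the data is stored outside Hive's own management, the table is only a link to external data. Therefore, dropping an external table removes just the metadata (the table definition) and never the data itself.
- Create an external table:
create external table test_ext1(id int, name string) row format delimited fields terminated by '\t' location 'path';

Custom field delimiter
- row format delimited fields terminated by '\t' means the columns are separated by \t.

Show detailed table information (including the table type)
desc formatted stu;

Convert between internal and external (use 'FALSE' to convert back to internal)
alter table table_name set tblproperties('EXTERNAL'='TRUE');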

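(2.3) Loading data

Create the table

create table table_name(
    dt string comment 'time (hours, minutes, seconds)',
    user_id string comment 'user ID',
    word string comment 'search term',
    url string comment 'URL the user visited'
) comment 'search engine log table' row format delimited fields terminated by '\t';

Two ways to load
- From the local (Linux) filesystem:
load data local inpath 'path' into table table_name;
- From HDFS:
load data inpath 'path' into table table_name;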

(2.4) Loading data with insert ... select

insert [overwrite | into] table tablename1 [partition (...) [if not exists]] select_statement1 from source_table;

* overwrite: replace the existing data
* into: append to the existing data

(2.5) Exporting data

Export to the local filesystem

insert overwrite local directory 'path' select * from table_name;

With a custom delimiter

insert overwrite local directory 'path' row format delimited fields terminated by '\t' select * from table_name;

Export to HDFS

insert overwrite directory 'path' row format delimited fields terminated by '\t' select * from table_name;

Difference
- with local: writes to the local filesystem
- without local: writes to HDFS

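(2.6) Partitioned tables

A partition is simply a separate directory on HDFS.

In specific scenarios a partitioned table can greatly improve Hive's performance, because a query that filters on the partition column only has to read the matching directories.

Basic syntax
create table table_name(...) partitioned by (partition_col col_type, ...) row format delimited fields terminated by '\t';

Load data
load data [local] inpath 'path' into table table_name partition(partition_col='value', ...);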
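
Since a partition is just a directory, the layout can be sketched in a few lines of Python. The helper below is illustrative only (it is not a Hive API): it assumes the default warehouse path mentioned earlier, Hive's convention of a db_name.db directory per database and one col=value directory level per partition column, and a hypothetical table named score.

```python
from pathlib import PurePosixPath

WAREHOUSE = PurePosixPath("/user/hive/warehouse")  # default warehouse directory

def partition_dir(db, table, partition):
    """Build the HDFS directory of one partition: one col=value level per partition column."""
    path = WAREHOUSE / f"{db}.db" / table
    for col, value in partition.items():
        path = path / f"{col}={value}"
    return str(path)

# Hypothetical table "score" in database "myhive".
single = partition_dir("myhive", "score", {"month": "202401"})
nested = partition_dir("myhive", "score", {"year": "2024", "month": "01"})

print(single)  # /user/hive/warehouse/myhive.db/score/month=202401
print(nested)  # /user/hive/warehouse/myhive.db/score/year=2024/month=01
```

A query with where month = '202401' then only needs the files under the first directory, which is where the performance gain comes from.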

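(2.7) Bucketed tables

Enable automatic bucketing enforcement
set hive.enforce.bucketing=true;

Create a bucketed table
create table table_name(col_name col_type, ...) clustered by (col_name) into 3 buckets row format delimited fields terminated by '\t';

Data can be loaded into a bucketed table only through insert ... select:
- create a temporary staging table
- load data into the staging table
- insert ... select from the staging table into the bucketed table
insert overwrite table bucketed_table select * from staging_table cluster by (col_name);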
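
The reason a plain load data is not enough is that each row has to be hashed on the bucketing column to decide which bucket file it belongs to. The following is a simplified model of that assignment, not Hive's actual code: it assumes an int key that hashes to its own value, whereas the real hash function depends on the column type and the Hive version.

```python
NUM_BUCKETS = 3  # same as "into 3 buckets" in the create table statement

def bucket_for(hash_value, num_buckets):
    """Map a hash value to a bucket index: clear the sign bit, then take the remainder."""
    return (hash_value & 0x7FFFFFFF) % num_buckets

# Assumption for illustration: an int key is used directly as its hash value.
ids = [1, 2, 3, 4, 5, 6, 7]
buckets = {b: [] for b in range(NUM_BUCKETS)}
for key in ids:
    buckets[bucket_for(key, NUM_BUCKETS)].append(key)

print(buckets)  # {0: [3, 6], 1: [1, 4, 7], 2: [2, 5]}
```

Rows with the same key always land in the same bucket, which is what makes bucketing useful for sampling and for joins on the bucketing column.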

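(2.8) Altering tables

Rename a table
alter table old_name rename to new_name;

Set a table property
alter table table_name set tblproperties('key'='value');

Add a partition
alter table table_name add partition (partition_spec);
- Change a partition's value
alter table table_name partition (old_spec) rename to partition (new_spec);

Add columns
alter table table_name add columns (col_name col_type, ...);
- Rename a column
alter table table_name change old_col new_col col_type;

Drop a table
drop table table_name;

Empty a table (external tables cannot be truncated)
truncate table table_name;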

(2.9) array type

Create the table
create table table_name(name string, work_locations array<string>)
row format delimited fields terminated by '\t'
collection items terminated by ',';

-- Count the elements in an array column

select name, size(array_col) from table_name;

-- Find the rows whose array contains a given value

select * from table_name where array_contains(array_col, value_to_find);
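
It can help to picture what the two delimiters do to a raw text line. This Python sketch mimics, in a very simplified way, how a line is split into a name and an array, and what size() and array_contains() would then report; the sample lines are invented and the code only models the idea, not Hive's parser.

```python
FIELD_SEP = "\t"  # fields terminated by '\t'
ITEM_SEP = ","    # collection items terminated by ','

def parse_row(line):
    """Split one text line into (name, list of work locations)."""
    name, locations = line.split(FIELD_SEP)
    return name, locations.split(ITEM_SEP)

# Hypothetical file content: one row per line.
lines = ["ann\tbeijing,shanghai", "bob\ttianjin"]
rows = [parse_row(line) for line in lines]

# Comparable to: select name, size(work_locations) from table_name;
sizes = [(name, len(locs)) for name, locs in rows]

# Comparable to: select * from table_name where array_contains(work_locations, 'shanghai');
in_shanghai = [name for name, locs in rows if "shanghai" in locs]

print(sizes)        # [('ann', 2), ('bob', 1)]
print(in_shanghai)  # ['ann']
```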

(2.10) map type

Create the table
create table table_name(members map<string,string>)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

create table table_name(
    id int,
    name string,
    members map<string,string>,
    age int
)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';
-- collection items terminated by '#': separator between key-value pairs
-- map keys terminated by ':': separator between the key and the value inside one pair

Get all keys of a map (returns an array)
select map_keys(map_col) from table_name;

Get all values of a map (returns an array)
select map_values(map_col) from table_name;

Count the entries (key-value pairs) of a map
select size(map_col) from table_name;
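
With three different delimiters in play, a worked example makes the roles clearer. The sketch below splits one invented line the way the second table definition describes (',' between fields, '#' between pairs, ':' between key and value) and then computes rough Python counterparts of map_keys, map_values and size. It is a simplified model, not Hive's parser.

```python
FIELD_SEP = ","  # fields terminated by ','
PAIR_SEP = "#"   # collection items terminated by '#'
KV_SEP = ":"     # map keys terminated by ':'

def parse_row(line):
    """Split one text line into (id, name, members dict, age)."""
    row_id, name, members_raw, age = line.split(FIELD_SEP)
    members = dict(pair.split(KV_SEP, 1) for pair in members_raw.split(PAIR_SEP))
    return int(row_id), name, members, int(age)

# Hypothetical line from the data file.
row = parse_row("1,ann,father:tom#mother:amy,20")
members = row[2]

keys = list(members.keys())      # comparable to map_keys(members)
values = list(members.values())  # comparable to map_values(members)
count = len(members)             # comparable to size(members)

print(row)     # (1, 'ann', {'father': 'tom', 'mother': 'amy'}, 20)
print(keys)    # ['father', 'mother']
print(values)  # ['tom', 'amy']
print(count)   # 2
```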

(2.11) struct type

- A struct is a composite type: a single column holds several sub-columns, each with its own name and type.


Create the table
create table table_name(id string, info struct<name:string,age:int>)
row format delimited fields terminated by '#'
collection items terminated by ':';

(3) Data queries

(4) Functions

Tags: hive, hadoop, data warehouse

Reposted from: https://blog.csdn.net/iku_n/article/details/142024761
Copyright belongs to the original author, 库库林_沙琪马. In case of infringement, please contact us for removal.