Hive Study Notes: Linux Commands

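【Getting Started with Hive】

Run jps; a RunJar process in the output confirms that the metastore is running
Create a table
create table test (id int, name string);
A database named default exists out of the box; the table above is created there
Insert data
insert into test values (1, 'tom'), (2, 'jerry');
Query data
select name, count(*) as cnt from test group by name;
Under the hood Hive still operates on files in HDFS, by default under /user/hive/warehouse; MySQL stores only the metadata
Start the Hive metastore service
bin/hive --service metastore
Start the hiveserver2 service
bin/hive --service hiveserver2
Start it in the background
nohup bin/hive --service hiveserver2 >> logs.log 2>&1 &
Connect to Hive with the beeline client; third-party clients such as DataGrip and DBeaver can also be used
!connect jdbc:hive2://node1:10000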

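【Basic Hive Syntax】

Show database information
desc database myhive;
Specify the storage location when creating a database
create database myhive2 location '/myhive2';
Drop a database; append cascade (drop database myhive cascade;) to force-drop a database that still contains tables
drop database myhive;
Table operations
create table test (id int, name string, gender string);
Specify the field delimiter when creating a table ((...) stands for the column definitions, which these notes leave out)
create table test (...) row format delimited fields terminated by '\t';
Create an external table; location must point to a directory, not to a single file
create external table test (...) row format delimited fields terminated by '\t' location '/tmp/test';
Show the table type
desc formatted table_name;
Convert a managed (internal) table to an external table
alter table table_name set tblproperties('EXTERNAL'='TRUE');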

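【Importing and Exporting Data in Hive】

Load from a file: fast, because it is essentially a file move. A source file in HDFS is moved into the table directory; with local, the file is copied from the local filesystem instead
load data [local] inpath 'path' [overwrite] into table database.table_name;
Load data with insert ... select; this form launches a MapReduce job
insert [overwrite|into] table tablename1 [partition (...) [if not exists]] select name from name_table;
Export data with insert overwrite
insert overwrite [local] directory 'path' select * from table_name;
Specify the field delimiter of the exported files
insert overwrite [local] directory 'path' row format delimited fields terminated by '\t' select * from table_name;
Export a Hive table from the shell
bin/hive -e 'select * from table_name' > test.txt
bin/hive -f export.sql > export.txt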

【Partitioned Tables】

Create a single-level partitioned table, partitioned by month
create table test (...) partitioned by (month string) row format delimited fields terminated by '\t';
Load data into a partition
load data [local] inpath 'path' [overwrite] into table database.table_name partition(month='202005');
Create a multi-level partitioned table, partitioned by year, month, and day
create table test (...) partitioned by (year string, month string, day string) row format delimited fields terminated by '\t';
load data [local] inpath 'path' [overwrite] into table database.table_name partition(year='2022', month='05', day='10');
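Each partition column=value pair becomes a nested directory under the table directory, which is why a partition filter can skip whole directories. The sketch below only builds that path string; the function name partition_path and the table directory are hypothetical and chosen for illustration.

```python
from posixpath import join


def partition_path(table_dir, partition_spec):
    """Build the HDFS directory for one partition.

    partition_spec is an ordered list of (column, value) pairs, in the
    same order as the partitioned by (...) clause.
    """
    parts = [f"{col}={val}" for col, val in partition_spec]
    return join(table_dir, *parts)


# Hypothetical table directory under the default warehouse location.
table_dir = "/user/hive/warehouse/test"
path = partition_path(table_dir, [("year", "2022"), ("month", "05"), ("day", "10")])
print(path)  # /user/hive/warehouse/test/year=2022/month=05/day=10
```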

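【Bucketed Tables】

Partitioning stores a table in separate directories; bucketing splits a table into a fixed number of files
Enable automatic bucketing enforcement
set hive.enforce.bucketing=true;
Create a bucketed table
create table test (...) clustered by (id) into 3 buckets row format delimited fields terminated by '\t';
A bucketed table cannot be populated with load data, only with insert ... select, so create a staging table first
load the data into the staging table, then insert it into the bucketed table
create table test_temp (id string, name string);
load data local inpath '/test' into table test_temp;
insert overwrite table test select * from test_temp cluster by (id);
Rows are assigned to buckets by hashing the bucketing column and taking the result modulo the number of buckets. load does not trigger a MapReduce job, so no hash is ever computed; it only moves files, which is why it cannot be used to populate a bucketed table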
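The hash-then-modulo rule can be sketched in a few lines. This is a simplified model, not Hive's implementation: it assumes an integer key hashes to itself (Hive's actual hash function depends on the column type and the Hive version), and the rows are made-up sample data.

```python
def bucket_for(id_value, num_buckets):
    # Assumption: the hash of an int key is the int itself. Masking with
    # 0x7FFFFFFF keeps the value non-negative before taking the modulo.
    return (id_value & 0x7FFFFFFF) % num_buckets


# Made-up rows (id, name); 3 buckets as in "into 3 buckets".
rows = [(1, "tom"), (2, "jerry"), (3, "anna"), (4, "bob"), (5, "eve"), (6, "sam")]
buckets = {b: [] for b in range(3)}
for row in rows:
    buckets[bucket_for(row[0], 3)].append(row)

for b, members in buckets.items():
    print(b, members)
```

Every row with the same id always lands in the same bucket file, and that assignment has to be computed per row, which a plain file move cannot do.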

【Table Operations】

Rename a table
alter table old_name rename to new_name;
Change a table property
alter table table_name set tblproperties('EXTERNAL'='FALSE');
Add a partition
alter table tablename add partition(month='201101');
Change a partition value
alter table tablename partition(month='200205') rename to partition(month='200305');
Drop a partition
alter table tablename drop partition(month='201105');
Add a column
alter table table_name add columns(v1 int);
Rename a column; the type is restated in the statement (here it stays int)
alter table test_change change v1 v1new int;
Empty a table
truncate table test;
truncate does not work on external tables, because Hive does not manage their data

【Complex Data Types - array】

collection items terminated by ',' sets the delimiter between array elements
create table test(name array<string>)
row format delimited fields terminated by '\t'
collection items terminated by ',';
Read an element by index
select name[0] from test;
Return only the rows whose array contains a given value
select name from test where ARRAY_CONTAINS(name, 'tom');
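To see how the two delimiters interact, the following sketch splits one line of the underlying text file the way the table definition describes: first on the field delimiter, then on the collection delimiter. The helper parse_array_row and the sample line are assumptions for illustration.

```python
def parse_array_row(line, field_sep="\t", item_sep=","):
    """Split one text line into columns, then split each column into array items."""
    return [column.split(item_sep) for column in line.split(field_sep)]


line = "tom,jerry"            # assumed file content: a single array<string> column
(name,) = parse_array_row(line)

print(name[0])                # counterpart of: select name[0] from test;
print("tom" in name)          # counterpart of: ARRAY_CONTAINS(name, 'tom')
```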

【Complex Data Types - map】

Key-value data
create table test(id int, name map<string,string>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';
id name
1 father:Tom,mother:Lucy,brother:Jim
Look up each person's father
select id, name['father'] from test;
Get all keys; the return type is an array
select map_keys(name) from test;
Get the size of each map
select size(name) from test;
Check whether a given key or value is present
select * from test where ARRAY_CONTAINS(map_keys(name), 'father');
select * from test where ARRAY_CONTAINS(map_values(name), 'Tom');
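The three delimiters of a map column nest: fields, then entries, then key and value. The sketch below parses the sample row under that assumption and mirrors the queries above with dictionary operations; parse_map_row is a hypothetical helper, not a Hive API.

```python
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Parse 'id<TAB>k1:v1,k2:v2' into (id, dict)."""
    id_text, map_text = line.split(field_sep)
    entries = (entry.split(key_sep, 1) for entry in map_text.split(item_sep))
    return int(id_text), dict(entries)


row_id, name = parse_map_row("1\tfather:Tom,mother:Lucy,brother:Jim")

print(name["father"])            # name['father']
print(list(name.keys()))         # map_keys(name)
print(len(name))                 # size(name)
print("father" in name)          # ARRAY_CONTAINS(map_keys(name), 'father')
print("Tom" in name.values())    # ARRAY_CONTAINS(map_values(name), 'Tom')
```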

【Complex Data Types - struct】

create table test(id int, info struct<name:string,age:int>)
row format delimited fields terminated by '\t'
collection items terminated by '#';
id info
1 Tom#12
2 Jerry#13
Query a struct field
select id, info.name from test;
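A struct is positional: the values between the '#' delimiters are matched to the declared fields in order. A minimal sketch of that mapping, using the two sample rows; the Info type and parse_struct_row helper are illustrative names only.

```python
from typing import NamedTuple


class Info(NamedTuple):
    # Field order follows struct<name:string,age:int>.
    name: str
    age: int


def parse_struct_row(line, field_sep="\t", item_sep="#"):
    """Parse 'id<TAB>name#age' into (id, Info)."""
    id_text, info_text = line.split(field_sep)
    name, age = info_text.split(item_sep)
    return int(id_text), Info(name, int(age))


rows = [parse_struct_row(line) for line in ["1\tTom#12", "2\tJerry#13"]]
for row_id, info in rows:
    print(row_id, info.name)     # counterpart of: select id, info.name from test;
```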

【Basic Hive SQL Queries】

Hive-specific keywords: cluster by, distribute by, sort by
Filter orders from Guangdong province
select * from test where useraddress like '%广东%';
Find the single largest order in Guangdong province
select * from test where useraddress like '%广东%' order by totalmoney desc limit 1;
Count unpaid and paid orders
select ispay, count(*) as cnt from test group by ispay;
Among paid orders, find each user's highest order amount
select userid, max(totalmoney) from test where ispay=1 group by userid;
Average order amount per user
select userid, avg(totalmoney) from test group by userid;
Keep only users whose average is greater than 10000
select userid, avg(totalmoney) as avg_money from test group by userid having avg_money>10000;
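The difference between where (filters rows before grouping) and having (filters groups after aggregation) can be checked on a toy table. The sketch uses Python's built-in sqlite3 as a stand-in engine, not Hive, and the four rows are made up; only the column names userid, totalmoney, and ispay come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (userid text, totalmoney real, ispay int)")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [("u1", 12000, 1), ("u1", 9000, 1), ("u2", 500, 0), ("u2", 700, 1)],  # made-up rows
)

# where runs first: the unpaid u2 order never reaches max().
paid_max = conn.execute(
    "select userid, max(totalmoney) from test "
    "where ispay = 1 group by userid order by userid"
).fetchall()

# having runs after aggregation: it filters on the per-user average.
big_spenders = conn.execute(
    "select userid, avg(totalmoney) as avg_money from test "
    "group by userid having avg(totalmoney) > 10000"
).fetchall()

print(paid_max)
print(big_spenders)
```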

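【Hive SQL RLIKE Regex Matching】

Hive regular-expression reference table. Source: Bilibili, 黑马程序员 (Heima Programmer). Next post: Hive, a simple hands-on case study; stay tuned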


Reposted from: https://blog.csdn.net/little_TianYe/article/details/144258163
Copyright belongs to the original author, 兔子宇航员0301. In case of infringement, please contact us for removal.
