Hive 与 Hbase表映射（内部表与外部表），Hbase常用命令

应用场景
1.将ETL操作的数据存入HBase

2.Hbase作为Hive的数据源

构建低延迟的数据仓库

Hive表映射至Hbase
Hbase上有表 -> 外部表
创建Hive表映射HBase原有的表，实现HBase表更新后，Hive能获取到更新后的结果
创建外部表：hbase中有表有数据，hive中没有，hbase会充当数据源
hbase必须有相应的表 hbase中有表无数据，hive中有数据，将分析的结果存入hbase

Hbase上没有表 -> 内部表
创建Hive表映射HBase表，可以实现将Hive ETL后的结果被HBase访问
创建内部表：hbase中自动创建表，有表无数据，目的，将hive中的分析结果存入hbase。
hbase自动创建表 drop hive数据，hbase数据消失

修改hive配置添加如下：

linux>vi /opt/install/hive/conf/hive-site.xml

<property>
        <name>hbase.zookeeper.quorum</name>
        <value>192.168.58.201:2181,192.168.58.202:2181,192.168.58.203:2181</value>
</property>

linux>bin/hiveserver2
linux>bin/beeeline
beeline>!connect jdbc:hive2://IP:10000 user pwd

hbase创建表时报错，提示已存在但list查看却没有

hbase(main):021:0> create 'test','base'

ERROR: Table already exists: test!

解决办法进入zookeeper操作界面删除hbase/table/tablename即可

[zk: localhost:2181(CONNECTED) 2] rmr /hbase/table/test  //表名
[zk: localhost:2181(CONNECTED) 3] ls /hbase/table  //检查

[root@NameNode ~]# hbase shell
hbase(main):001:0> create 'test','base' 
0 row(s) in 2.7530 seconds
hbase(main):001:0> list    //查看
TABLE                                                                                                                                                                     
test                                                                                                                                                                 
1 row(s) in 0.0190 seconds

=> ["test"]

linux准备数据

linux>vi test.csv 
emp_no,name,gender,birth_date
1,lisi,M,1997-09-15
2,zhangsan,M,1997-09-15
3,wangwu,F,1997-09-18

hive创建并加载表（外部表）

hive>create external table test(
    emp_no string,
    name string,
    gender string,
    birth_date date
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile
tblproperties("skip.header.line.count"="1");
//tblproperties("skip.header.line.count"="1") //跳过第1行

hive>load data local inpath '/root/tmp_data/test.csv' into table test;

注意上面如果创建完后，删除表 test 后重新创建数据还会存在，如果想要不存在原来的数据需要删除hdfs上的文件如下：

linux>hdfs dfs -rmr /user/hive/warehouse/test

创建表（外部表）映射至hbase中

hive>create external table hb_test(  
    emp_no string,
    name string,
    gender string,
    birth_date date
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with
serdeproperties
(
    "hbase.columns.mapping"=":key,base:name,base:gender,base:hire_date"
)
tblproperties("hbase.table.name"="default:test");
--tblproperties("hbase.table.name"="default:test") 中test要和hbase中对应
--如果报错，在hbase中建好表试试    （注意）

hbase >create 'test','base'  （注意）

#hive 表数据导入到hbase关联表
hive>from test
insert into table hb_test
select  * ;

hbase 查看数据

hbase(main):002:0> sacn 'test'    //查看数据
hbase(main):004:0> count 'test'   //统计数据

hive hb_test 外部表（hb_test）添加数据hbase内的表(test)也会添加但hive表删除不会影响hbase内的表（test）

hive>insert into table test  values(4,'lisit2','F','1997-09-15');
hbase(main):008:0> count 'test'
4 row(s) in 0.0900 seconds

=> 4
hive>drop table hb_test;
hbase(main):009:0> count 'test'
4 row(s) in 0.0180 seconds

=> 4

hive 创建（内部表）并导入数据

hive>create  table i_test(
    emp_no string,
    name string,
    gender string,
    birth_date date
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with
serdeproperties
(
   "hbase.columns.mapping"=":key,base:name,base:gender,base:hire_date"
)
tblproperties("hbase.table.name"="default:testt")

#hive表导入数据到 i_test内部表
hive>from test
insert into table i_test
select  * ;

hbase 查看数据

hbase(main):012:0> scan 'testt'
hbase(main):013:0> count 'testt'

如果删除hbase内的表(testt)或者删除hive内的表（i_test）都后关联对方，会自动也跟着删除

hive>drop table i_test;
或者
hbase>disable 'testt'
hbase>drop 'testt'  //必须先执行disable才能drop

不经过hive直接导入数据到hbase中

命令在linux下执行，file指定的文件，在所有电脑上都要存在，不然会报错，需要在hbase上提前创建好表

#需要提前在hbase上创建好表否则会报错
hbase>create 'table_name','base'
linux>scp -r 文件位置 IP:文件存放位置

linux>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator=,  \
-Dimporttsv.columns="HBASE_ROW_KEY,base:name,base:gender,base:hire_date" \
table_name  file:///root/tmp_data/test.csv
hbase> count 'table_name'

Hbase常用操命令：

HBase Shell操作连接集群
hbase shell
创建表
create 'table_name','base_info'
向表中添加数据
put 'table_name', 'rowkey_1', 'base_info:username', 'lisi'

put 'table_name','rowkey_2','base_info:username','zhangsan'
put 'table_name','rowkey_2','base_info:birthday','2020-05-17'

查询表中的所有数据
scan 'table_name'
查询某个rowkey的数据
get 'table_name', 'rowkey_1'
查询某个列簇的数据
get 'table_name', 'rowkey_1', 'base_info'
get 'table_name', 'rowkey_1', 'base_info:username'
get 'table_name', 'rowkey_1', {COLUMN => ['base_info:username', 'base_info:birthday']}
删除表中的数据
delete 'table_name', 'rowkey_1', 'base_info:username'
清空数据
truncate 'table_name'
操作列簇
alter 'table_name'， NAME => 'f2'
alter 'table_name', 'delete' => 'f2'
删除表
要先disable 表，然后才能删除表否则报错
disable 'table_name'
drop 'table_name'

11.查看所有的hbase的shell命令
help
寻求指定命令的使用方法
help 'xxx' 例 help 'put'

12.查看表结构
desc 'table_name'

13.namespace ：命名空间，理解为Java中的包
hbase中的表看成是Java中的类
换句话说，namespace就是保存表的一个逻辑上的路径

namespace的常用操作命令：

alter_namespace    修改命名空间的属性
create_namespace    创建命名空间
describe_namespace    查看命名空间的结构
drop_namespace    删除命名空间
list_namespace    查看HBase中所有的命名空间
list_namespace_tables    查看指定的命名空间中的所有表

14.其余一些Hbase命令

create    创建表
alter    修改列族模式
count    统计表中的行数量
disable    使表无效
drop    删除表
delete    删除指定对象值(可以是表、行、列对应的值)
describe    显示表的相关详细信息
deleteall    删除指定行所有的元素
exit    退出Hbase shell
enable    使表有效
exists    检查表是否存在
list    显示出Hbase中所有存在的表
incr    增加指定表、行、列的值
put    向指定的表单元添加值
get    获取行或单元的值
tools    显示Hbase所有支持的工具
version    返回Hbase版本信息
truncate    重新创建指定表
scan    扫描表获取对应的值
shutdown    关闭Hbase群集（与exit不同）
status    返回Hbase集群的状态信息

#删除默认删除光标后面的字符，如果要删除面前的字符需要ctrl+删除键

标签： hbase hive 大数据

本文转载自: https://blog.csdn.net/Mogeko1/article/details/127647544
版权归原作者 房石阳明i 所有，如有侵权，请联系我们删除。

Hive 与 Hbase表映射（内部表与外部表），Hbase常用命令

发表评论

“Hive 与 Hbase表映射（内部表与外部表），Hbase常用命令”的评论:

关于作者

overfit同步小助手

相关阅读

文章导航