文章目录
环境准备
- Hadoop 完全分布式(一主两从即可)
- MySQL环境、Hive环境
一、Hive 数据仓库的操作
验证 hadoop 集群、mysql服务均已启动:
**在
Hive
安装目录下的
bin
目录下启动:
./hive
:**
[root@server bin]# ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in[jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in[jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding isoftype[org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2and may not be available in the future versions. Consider using a different execution engine(i.e. spark, tez)orusing Hive 1.X releases.
hive>
1、创建数据仓库
// 小技巧:显示当前所使用的数据库名
hive>set hive.cli.print.current.db=true// 创建db数据库
hive (test)>createdatabaseifnotexists db;
OK
Time taken: 0.011 seconds
// 使用db数据库
hive (test)>use db;
OK
Time taken: 0.021 seconds
**在
Hive
中创建一个
db
数据仓库。在创建时,为了避免新建的库名与已有的库名重复,创建的时候添加
if not exists
如果库名重复将会报错:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db already exists
,**
2、查看 db 数据仓库的信息及路径
hive (db)>describedatabase db;
OK
db hdfs://192.168.64.183:9000/user/hive/warehouse/db.db root USER Time taken: 0.329 seconds, Fetched: 1row(s)
3、删除 db 数据仓库
// 删除数据仓库
hive (db)>dropdatabaseifexists db;
Sun Mar 0609:21:29 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set.For compliance with existing applications notusing SSL the verifyServerCertificate property issetto'false'. You need either to explicitly disable SSL by setting useSSL=false,orset useSSL=trueand provide truststore for server certificate verification.
Sun Mar 0609:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set.For compliance with existing applications notusing SSL the verifyServerCertificate property issetto'false'. You need either to explicitly disable SSL by setting useSSL=false,orset useSSL=trueand provide truststore for server certificate verification.
Sun Mar 0609:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set.For compliance with existing applications notusing SSL the verifyServerCertificate property issetto'false'. You need either to explicitly disable SSL by setting useSSL=false,orset useSSL=trueand provide truststore for server certificate verification.
Sun Mar 0609:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set.For compliance with existing applications notusing SSL the verifyServerCertificate property issetto'false'. You need either to explicitly disable SSL by setting useSSL=false,orset useSSL=trueand provide truststore for server certificate verification.
OK
Time taken: 0.438 seconds
// 再次查看所有数据仓库
hive (db)>showdatabases;
OK
default
test
Time taken: 0.025 seconds, Fetched: 2row(s)
二、Hive 数据表的操作
**Hive 的数据表分为两种:
内部表
和
外部表
。**
Hive 创建内部表的时候,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不会对数据的位置做出任何改变。
在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。这样外部表相对来说更加安全些,数据组织也更加灵活,方便共享源数据,生产中常使用外部表。
1、创建内部表
查看已存在的表(当前数据仓库中没有表):
hive (db)>showtables;
OK
Time taken: 0.026 seconds
**创建一个名为
cat
的内部表,包含有两个字段:
cat_id
、
cat_name
,字符类型为
string
:**
hive (db)>createtable cat(cat_id string,cat_name string);
OK
Time taken: 1.046 seconds
hive (db)>showtables;
OK
cat
Time taken: 0.02 seconds, Fetched: 1row(s)
**如果 cat 内部表已经存在,会报错:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table cat already exists)
**
2、创建外部表
**创建一个名为
cat1
的外部表,包含有两个字段:
cat_id
、
cat_name
,字符类型为
string
:**
// 创建外部表
hive (db)>create external tableifnotexists cat1(cat_id string,cat_name string)>row format delimited fieldsterminatedby'\t'> location '/usr/root/goods';
OK
Time taken: 0.315 seconds
// 查看表
hive (db)>showtables;
OK
cat
cat1
Time taken: 0.017 seconds, Fetched: 2row(s)
3、修改表结构
**修改
cat
表的表结构,对
cat
表添加两个字段
group_id
和
cat_code
:**
hive (db)>altertable cat addcolumns(group_id string,cat_code string);
OK
Time taken: 0.2 seconds
hive (db)>desc cat;// 查看表结构
OK
cat_id string
cat_name string
group_id string
cat_code string
Time taken: 0.042 seconds, Fetched: 4row(s)
**修改
cat1
表的表名,将其重命名为
cat2
:**
hive (db)>altertable cat1 renameto cat2;
OK
Time taken: 0.142 seconds
hive (db)>showtables;// 查看表
OK
cat
cat2
Time taken: 0.019 seconds, Fetched: 2row(s)
该命令可以修改表名,数据所在的位置和分区名并不改变!!!
4、删除表
**删除名为
cat2
的表:**
hive (db)>droptable cat2;
OK
Time taken: 0.337 seconds
hive (db)>showtables;
OK
cat
Time taken: 0.024 seconds, Fetched: 1row(s)
5、创建同结构表
**创建与已知表相同结构的表,创建一个与
cat
表相同结构的表,名为
cat3
,使用
like
关键字:**
hive (db)>createtable cat3 like cat;
OK
Time taken: 0.271 seconds
hive (db)>showtables;
OK
cat
cat3
Time taken: 0.018 seconds, Fetched: 2row(s)
hive (db)>desc cat3;
OK
cat_id string
cat_name string
group_id string
cat_code string
Time taken: 0.031 seconds, Fetched: 4row(s)
hive (db)>select*from cat3;
OK
Time taken: 1.599 seconds
通过查看表 cat3 的表结构及其表数据,可以看到 like 就是复制已有表的表结构。
版权归原作者 骑着蜗牛ひ追导弹' 所有, 如有侵权,请联系我们删除。