题目来自华为Talent官网《华为ICT大赛2023-2024实践赛云赛道真题实战课程》,如有侵权,请及时联系本人删除文章
非官方解析,仅供参考
截图只放题目要求的,命令另外写出来
3.2.5
任务一:MRS服务购买 略
任务2:数据上传
考点1:数据上传
a. 将数据文件上传到HDFS集群中的/user/chat目录下;
hdfs dfs -mkdir /user/chat
hdfs dfs -put chat_data.csv /user/chat
b. 上传完成后查看前五行数据。
hdfs dfs -cat /user/chat/chat_data.csv | head -5
任务3:数据上传至Hive
考点1:创建数据表
a. 在Hive中创建数据库,在ODS层创建表并将HDFS数据上传至Hive中。
create database ods_msg;
create database dws_msg;
create database ads_msg;
create table ods_msg.msg_source(
msg_time String,
sender_name String,
sender_account String,
sender_sex String,
sender_ip String,
sender_os String,
sender_phonetype String,
sender_network String,
sender_gps String,
receiver_name String,
receiver_ip String,
receiver_account String,
receiver_os String,
receiver_phonetype String,
receiver_network String,
receiver_gps String,
receiver_sex String,
msg_type String,
distance String)
row format delimited
fields terminated by ','
stored as textfile;
load data inpath '/user/chat/chat_data.csv' into table ods_msg.msg_source;
#截图:
a. show databases;
b. desc ods_msg.msg_source;
c. select * from ods_msg.msg_source limit 5;
任务4:数据ETL清洗操作
考点1:数据ETL清洗
a. 对GPS定位字段为空的数据进行过滤,然后通过时间字段构建天(格式为yyyy-MM-dd)和小时字段,并从GPS的经纬度中提取经度和纬度数据,最后将ETL以后的结果保存到DWS层中一张新的Hive表中。
create table dws_msg.msg_etl as
select
msg_time,
from_unixtime(unix_timestamp(msg_time), 'yyyy-MM-dd') as msg_day,
from_unixtime(unix_timestamp(msg_time), 'HH') as msg_hour,
sender_name,
sender_account,
sender_sex,
sender_ip,
sender_os,
sender_phonetype,
sender_network,
sender_gps,
split(sender_gps,' ')[0] as sender_latitude,
split(sender_gps,' ')[1] as sender_longitude,
receiver_name,
receiver_ip,
receiver_account,
receiver_os,
receiver_phonetype,
receiver_network,
receiver_gps,
split(receiver_gps,' ')[0] as receiver_latitude,
split(receiver_gps,' ')[1] as receiver_longitude,
receiver_sex,
msg_type,
distance
from ods_msg.msg_source
where sender_gps != '';
# 截图
a. select * from dws_msg.msg_etl limit 5;
任务5:指标计算
考点1:数据指标计算
a. 按日期统计每天总消息量,表名称:msg_cnt。
create table ads_msg.msg_cnt as
select
msg_day,
count(*) as msg_day_cnt
from dws_msg.msg_etl
group by msg_day;
b. 每小时消息量、发送量和接收用户数,表名称:msg_hour_cnt。
create table ads_msg.msg_hour_cnt as
select
msg_hour,
count(*) as msg_hour_cnt,
count(sender_account) as sender_msg_cnt,
count(receiver_account) as receiver_usr_cnt
from dws_msg.msg_etl
group by msg_hour;
c. 2023年1月1日发送和接收用户人数,表名称:msg_usr_cnt。
create table ads_msg.msg_usr_cnt as
select
msg_day,
count(sender_account) as sender_usr_cnt,
count(receiver_account) as receiver_usr_cnt
from dws_msg.msg_etl
where msg_day='2023-01-01'
group by msg_day;
d. 发送消息条数 top10 用户,表名称:msg_usr_top10。
create table ads_msg.msg_usr_top10 as
select
sender_account,
count(*) as sender_msg_cnt
from dws_msg.msg_etl
group by sender_account
order by sender_msg_cnt desc
limit 10;
e. 发送人手机型号分布,表名称:msg_sender_phone。
create table ads_msg.msg_sender_phone as
select
sender_phonetype,
count(*) as sender_phonetype_cnt
from dws_msg.msg_etl
group by sender_phonetype;
任务6:Hue可视化展示
登录MRS Manager,在服务管理中找到Hue服务,进入后点击Hue(Master)进入Hue的界面。在主界面上方Query Editor中选择 Hive。在空白处编写HQL语句,要求查看 ADS 层各表内容,并将指标数据通过图标展示。
a. msg_hour_cnt:柱状图
b. msg_usr_cnt:饼状图
这个查询应该是没错的,感觉是Hue的问题。
c. msg_usr_top10:柱状图
d. msg_sender_phone:饼状图
再次重申,本文非官方解析,仅供参考
如有侵权,请及时联系本人删除文章。
版权归原作者 Arknights Expert 所有, 如有侵权,请联系我们删除。