0


华为ICT大赛2023-2024全国总决赛-实践赛-云赛道实验试题-大数据部分个人解析

题目来自华为Talent官网《华为ICT大赛2023-2024实践赛云赛道真题实战课程》,如有侵权,请及时联系本人删除文章

非官方解析,仅供参考

截图只放题目要求的,命令另外写出来

3.2.5

任务一:MRS服务购买 略

任务2:数据上传

考点1:数据上传

a. 将数据文件上传到HDFS集群中的/user/chat目录下;
hdfs dfs -mkdir /user/chat
hdfs dfs -put chat_data.csv /user/chat
b. 上传完成后查看前五行数据。
hdfs dfs -cat /user/chat/chat_data.csv | head -5

任务3:数据上传至Hive

考点1:创建数据表

a. 在Hive中创建数据库,在ODS层创建表并将HDFS数据上传至Hive中。
create database ods_msg;
create database dws_msg;
create database ads_msg;

create table ods_msg.msg_source(
msg_time String,
sender_name String,
sender_account String,
sender_sex String,
sender_ip String,
sender_os String,
sender_phonetype String,
sender_network String,
sender_gps String,
receiver_name String,
receiver_ip String,
receiver_account String,
receiver_os String,
receiver_phonetype String,
receiver_network String,
receiver_gps String,
receiver_sex String,
msg_type String,
distance String)
row format delimited
fields terminated by ','
stored as textfile;

load data inpath '/user/chat/chat_data.csv' into table ods_msg.msg_source;

#截图:
a. show databases;
b. desc ods_msg.msg_source;
c. select * from ods_msg.msg_source limit 5;

任务4:数据ETL清洗操作

考点1:数据ETL清洗

a. 对GPS定位字段为空的数据进行过滤,然后通过时间字段构建天(格式为yyyy-MM-dd)和小时字段,并从GPS的经纬度中提取经度和纬度数据,最后将ETL以后的结果保存到DWS层中一张新的Hive表中。
create table dws_msg.msg_etl as
select
msg_time,
from_unixtime(unix_timestamp(msg_time), 'yyyy-MM-dd') as msg_day,
from_unixtime(unix_timestamp(msg_time), 'HH') as msg_hour,
sender_name,
sender_account,
sender_sex,
sender_ip,
sender_os,
sender_phonetype,
sender_network,
sender_gps,
split(sender_gps,' ')[0] as sender_latitude,
split(sender_gps,' ')[1] as sender_longitude,
receiver_name,
receiver_ip,
receiver_account,
receiver_os,
receiver_phonetype,
receiver_network,
receiver_gps,
split(receiver_gps,' ')[0] as receiver_latitude,
split(receiver_gps,' ')[1] as receiver_longitude,
receiver_sex,
msg_type,
distance
from ods_msg.msg_source
where sender_gps != '';

# 截图
a. select * from dws_msg.msg_etl limit 5;

任务5:指标计算

考点1:数据指标计算

a. 按日期统计每天总消息量,表名称:msg_cnt。
create table ads_msg.msg_cnt as
select 
msg_day,
count(*) as msg_day_cnt 
from dws_msg.msg_etl 
group by msg_day;
b. 每小时消息量、发送量和接收用户数,表名称:msg_hour_cnt。
create table ads_msg.msg_hour_cnt as
select
msg_hour,
count(*) as msg_hour_cnt,
count(sender_account) as sender_msg_cnt,
count(receiver_account) as receiver_usr_cnt
from dws_msg.msg_etl 
group by msg_hour;
c. 2023年1月1日发送和接收用户人数,表名称:msg_usr_cnt。
create table ads_msg.msg_usr_cnt as
select
msg_day,
count(sender_account) as sender_usr_cnt,
count(receiver_account) as receiver_usr_cnt
from dws_msg.msg_etl
where msg_day='2023-01-01'
group by msg_day;
d. 发送消息条数 top10 用户,表名称:msg_usr_top10。
create table ads_msg.msg_usr_top10 as
select
sender_account,
count(*) as sender_msg_cnt
from dws_msg.msg_etl
group by sender_account
order by sender_msg_cnt desc
limit 10;
e. 发送人手机型号分布,表名称:msg_sender_phone。
create table ads_msg.msg_sender_phone as
select 
sender_phonetype,
count(*) as sender_phonetype_cnt 
from dws_msg.msg_etl
group by sender_phonetype;

任务6:Hue可视化展示

登录MRS Manager,在服务管理中找到Hue服务,进入后点击Hue(Master)进入Hue的界面。在主界面上方Query Editor中选择 Hive。在空白处编写HQL语句,要求查看 ADS 层各表内容,并将指标数据通过图标展示。

a. msg_hour_cnt:柱状图

b. msg_usr_cnt:饼状图

这个查询应该是没错的,感觉是Hue的问题。

c. msg_usr_top10:柱状图

d. msg_sender_phone:饼状图

再次重申,本文非官方解析,仅供参考

如有侵权,请及时联系本人删除文章。

标签: 华为 大数据 linux

本文转载自: https://blog.csdn.net/Quin22/article/details/140633144
版权归原作者 Arknights Expert 所有, 如有侵权,请联系我们删除。

“华为ICT大赛2023-2024全国总决赛-实践赛-云赛道实验试题-大数据部分个人解析”的评论:

还没有评论