1、背景
W餐饮外卖平台向广大用户提供网上订餐服务,其市场占有量在近年不断增加。当用户在W平台订餐完成后,平台会引导用户对于品尝过的菜品进行评价打分,最高为5分,最低为1分。
通过用户的评分数据,可以分析外卖平台的受欢迎度、客户的体验度。
数据说明
- 用户评分数据(mealrating.txt)
属性名称
属性说明
UserID
用户ID
MealID
菜品ID
Rating
评分
ReviewTime
评分的时间戳
Review
评价内容
- 菜品数据集(meal_list.txt)
2、任务
- 将用户评分数据和菜品数据导入Hive
- 根据用户评分数据统计日销量和日用户量
select count(1) from mealrating where ReviewTime between 1496100000 and 1496200000;
2017年五月三十一日日销量
select count(DISTINCT UserID) from mealrating where ReviewTime between 1497100000 and 1497200000;
2017年六月十一号日用户量
- 统计同时有评分和评分内容的记录
select count(1) from mealrating where Rating is not null and Review is not null;
- 分析用户的评分分布情况
select
*,
cast(Rating/(sum(Rating) over () ) as DECIMAL (8,2)) as rat_percent
from (
select
Rating ,
count(1) rat_num ,
CAST(sum(Rating)/count(1) as DECIMAL(8,2)) avg_rat
from mealrating group by Rating
) as p ORDER BY rat_percent desc;
可以看到5分占比33%、4分27%、3分20%、2分13%、1分7%
- 统计10大热销菜品
select name,count(name) as frequency from mealrating join meal_list on mealrating.MealID=meal_list.MealID
group by name order by frequency desc limit 10;
- 统计10大评分为5的热销菜品
select name,Rating,count(name) as frequency from mealrating join meal_list on mealrating.MealID=meal_list.MealID
where Rating=5 group by name,Rating order by frequency desc limit 10;
- 统计单日评分超过2次的用户数
select count() from (select ReviewTime,UserID,count() from mealrating group by ReviewTime,UserID having count(*)>2) as tmp;
- 找出评分次数超过两次的用户中,每个用户评分最高的记录
select UserID,max(Rating) from mealrating group by UserID having UserID in (select UserID from mealrating group by UserID HAVING count(MealID)>2);
版权归原作者 听书人123 所有, 如有侵权,请联系我们删除。