0 需求
现有一个用户活跃表user_active(user_id,active_date)、用户注册表user_regist(user_id,regist_date),
表中分区字段都为dt(yyyy-MM-dd),用户字段均为user_id;
设计一张1-180天注册活跃留存表;
1 分析
需求中要求设计留存周期为1-180天的注册、活跃留存表,即目标如下:
注册日期
留存周期
活跃数
注册数
留存率
2023-01-10
1
100
200
活跃数/注册数
2023-01-10
2
50
200
2023-01-10
3
10
200
.....
...
...
主要考察点:笛卡尔集(一对多关联)
观察表结构:我们可以看出,针对每一天来讲分母固定,分子随留存周期变动
第一步:求出注册表中,每日注册数,注册数作为分母,针对每一天来讲是个固定值,因我们用开窗求解该指标。
select user_id
,to_date(regist_date) as regist_date
,count(user_id) over(partition by to_date(regist_date)) as regist_count
from user_regist
where dt >= date_sub(current_date(), 180)
第二步:用户注册表作为主表,关联活跃表,关联健为user_id,由于一对多的关系,产生笛卡尔集
注意:活跃用户表,每天用户会有多次活跃注意去重
select regist_date
,t1.user_id
,t1.regist_count
,t2.user_id
,t2.active_date
,datediff(t2.active_date, t1.regist_date) as date_diff
from (
select user_id
,to_date(regist_date) as regist_date
,count(user_id) over(partition by to_date(regist_date)) as regist_count
from user_regist
where dt >= date_sub(current_date(), 180)
) t1
left join (
select user_id
,to_date(active_date) as active_date
from user_active
where dt >= date_sub(current_date(), 180)
group by user_id, to_date(active_date)
) t2
on t1.user_id = t2.user_id
regist_date
t1.user_id
t1.regist_count
t2.user_id
t2.active_date
date_diff
2023-01-10
A
200
A
2023-01-11
1
2023-01-10
A
200
A
2023-01-12
2
2023-01-10
A
200
A
2023-01-13
3
2023-01-10
A
200
A
2023-01-14
4
2023-01-10
B
200
B
2023-01-13
3
2023-01-10
B
200
B
2023-01-14
4
2023-01-10
B
200
B
2023-01-15
5
2023-01-10
B
200
B
2023-01-16
6
第三步:按照注册日期和留存周期分组,计算该留存周期下、该时间点下的活跃用户数
select t1.regist_date
,max(t1.regist_count) as regist_cnt --每天是固定值,用max()函数取出该值
,datediff(t2.active_date, t1.regist_date) as date_diff
,count(t1.user_id) as active_user_cnt
from (
select user_id
,to_date(regist_date) as regist_date
,count(user_id) over(partition by to_date(regist_date)) as regist_count
from user_regist
where dt >= date_sub(current_date(), 180)
) t1
left join (
select user_id
,to_date(active_date) as active_date
from user_active
where dt >= date_sub(current_date(), 180)
group by user_id, to_date(active_date)
) t2 on t1.user_id = t2.user_id
where datediff(t2.active_date, t1.regist_date) >=1
and datediff(t2.active_date, t1.regist_date) <= 180
group by t1.regist_date, datediff(t2.active_date, t1.regist_date)
第四步:计算率留存
select regist_date
, date_diff
, active_user_cnt
, case when nvl(regist_cnt,0)!=0
then active_user_cnt/regist_cnt end as retention_rate
from
(select t1.regist_date
,max(t1.regist_count) as regist_cnt --每天是固定值,用max()函数取出该值
,datediff(t2.active_date, t1.regist_date) as date_diff
,count(t1.user_id) as active_user_cnt
from (
select user_id
,to_date(regist_date) as regist_date
,count(user_id) over(partition by to_date(regist_date)) as regist_count
from user_regist
where dt >= date_sub(current_date(), 180)
) t1
left join (
select user_id
,to_date(active_date) as active_date
from user_active
where dt >= date_sub(current_date(), 180)
group by user_id, to_date(active_date)
) t2 on t1.user_id = t2.user_id
where datediff(t2.active_date, t1.regist_date) >=1
and datediff(t2.active_date, t1.regist_date) <= 180
group by t1.regist_date, datediff(t2.active_date, t1.regist_date)
) t
2 小结
本文给出了一种1-180天注册活跃留存表计算模型,主要采用笛卡尔集的形式求解,这也是数据报表经常采用的方法,需要掌握。
版权归原作者 莫叫石榴姐 所有, 如有侵权,请联系我们删除。