flink sql实现分组聚合有多种方式。
第一种,GROUP BY
将需要分组的字段放在group by子句即可,把时间字段放在group by下可实现开窗的功能。
SELECT COUNT(*)
FROM Orders
GROUP BY order_id
第二种,GROUP BY GROUPING SETS
GROUP BY GROUPING SETS ((supplier_id, rating), (supplier_id), ())
第三种,GROUP BY ROLLUP
GROUP BY ROLLUP (supplier_id, rating)效果同上。
第四种,GROUP BY CUBE
GROUP BY CUBE (supplier_id, rating, product_id)
会对cube的字段的各种组合进行分组聚合
分组开窗操作
SQL中只支持基于时间的窗口
第一种:将窗口放在groupby子句下
这种方式在1.13后被标记为过时。
滚动窗口:TUMBLE(time_attr, interval)
滑动窗口:HOP(time_attr, interval, interval)
会话窗口:SESSION(time_attr, interval)
例子:
SELECT
user,
TUMBLE_START(order_time, INTERVAL ‘1’ DAY) AS wStart,
SUM(amount) FROM Orders
GROUP BY
TUMBLE(order_time, INTERVAL ‘1’ DAY),
user
第二种:Window TVF Aggregation(更强大)
支持GroupingSets语法,可以在window聚合中使用TopN,提供累积窗口
滚动:SELECT window_start, window_end, SUM(price)
FROM TABLE(
TUMBLE(TABLE 表名, DESCRIPTOR(时间字段), INTERVAL ‘10’ MINUTES))
GROUP BY window_start, window_end;
滑动:SELECT window_start, window_end, SUM(price)
FROM TABLE(
HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL ‘5’ MINUTES, INTERVAL ‘10’ MINUTES))
GROUP BY window_start, window_end;
累积窗口:相当于在滚动窗口内周期性触发计算。
SELECT window_start, window_end, SUM(price)
FROM TABLE(
CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL ‘2’ MINUTES, INTERVAL ‘10’ MINUTES))
GROUP BY window_start, window_end;
OVER聚合
OVER聚合为每个输入行生成一个聚合值
下面的查询为每个订单计算在当前订单之前一小时内收到的同一产品的所有订单的金额总和:
SELECT order_id, order_time, amount,
SUM(amount) OVER (
PARTITION BY product
ORDER BY order_time
RANGE BETWEEN INTERVAL ‘1’ HOUR PRECEDING AND CURRENT ROW
) AS one_hour_prod_amount_sum
FROM Orders
TOPN聚合
SELECT [column_list]
FROM (
SELECT [column_list],
ROW_NUMBER() OVER ([PARTITION BY col1[, col2…]]
ORDER BY col1 [asc|desc][, col2 [asc|desc]…]) AS rownum
FROM table_name)
WHERE rownum <= N [AND conditions]
Deduplication:特殊的TOPN,排序字段一定是时间属性列,相比TOPN性能更好。
SELECT [column_list]
FROM (
SELECT [column_list],
ROW_NUMBER() OVER ([PARTITION BY col1[, col2…]]
ORDER BY time_attr [asc|desc]) AS rownum
FROM table_name)
WHERE rownum = 1
版权归原作者 cheng_2022 所有, 如有侵权,请联系我们删除。