窗口函数
hive文档链接
spark文档链接
1. OVER支持的函数
- 自然序编号 Syntax:
ROW_NUMBER
- 按等级编号 Syntax:
RANK
|DENSE_RANK
|PERCENT_RANK
- 分组内分桶,并返回对应桶的序号 Syntax:
NTILE(n)
- Analytic Functions(分析函数) Syntax:
CUME_DIST
|LAG
|LEAD
|NTH_VALUE
|FIRST_VALUE
|LAST_VALUE
- Aggregate Functions(聚合函数) Syntax:
MAX
|MIN
|COUNT
|SUM
|AVG
| …
1.1. 准备工作
创建测试表并插入测试数据
CREATETABLE employees (name STRING, dept STRING, salary INT, age INT);INSERTINTO employees VALUES("Lisa","Sales",10000,35),("Evan","Sales",32000,38),("Fred","Engineering",21000,28),("Alex","Sales",30000,33),("Tom","Engineering",23000,33),("Jane","Marketing",29000,28),("Jeff","Marketing",35000,38),("Paul","Engineering",29000,23),("Chloe","Engineering",23000,25);
1.2. row_number()
row_number() over()
row_number可能是窗口函数中使用最频繁的函数,作用是在分组内按自然序进行编号,结果值为:1 2 3 4 5
select*,row_number()over(partitionby dept orderby salary)as rn from employees;+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|2|| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|2|| Chloe | Engineering |23000|25|3|| Paul | Engineering |29000|23|4|| Lisa | Sales |10000|35|1|| Alex | Sales |30000|33|2|| Evan | Sales |32000|38|3|+-----------------+-----------------+-------------------+----------------+-----+
1.3. rank函数
rank有等级的含义,函数作用是在分组内按照order by的结果得到有等级编号。
rank() over ()
并列有间隔,结果如:1 2 2 4 5
dense_rank() over()
dense有密集的含义,并列无间隔,结果如:1 2 2 3 4
percent_rank() over()
百分比的rank,含义是(当前行-1)/(当前组总行数-1),当前行从1开始
注意,如果order by的结果相同,则rank得到的结果都相同,在这里的语义是排序结果相同,因此等级编号也相同。
select*,rank()over(partitionby dept orderby salary)as rn from employees;-- 执行结果中,engineering中salary相同的编号相同,paul的值为4+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|2|| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|2|| Chloe | Engineering |23000|25|2|| Paul | Engineering |29000|23|4|| Lisa | Sales |10000|35|1|| Alex | Sales |30000|33|2|| Evan | Sales |32000|38|3|+-----------------+-----------------+-------------------+----------------+-----+select*,dense_rank()over(partitionby dept orderby salary)as rn from employees;-- 执行结果中,engineering中salary相同的编号相同,paul的值为3+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|2|| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|2|| Chloe | Engineering |23000|25|2|| Paul | Engineering |29000|23|3|| Lisa | Sales |10000|35|1|| Alex | Sales |30000|33|2|| Evan | Sales |32000|38|3|+-----------------+-----------------+-------------------+----------------+-----+select*,percent_rank()over(partitionby dept orderby salary)as rn from employees;-- rn的结果是每行在当前分组中的百分比,注意order by中相同值结果相同+-----------------+-----------------+-------------------+----------------+---------------------+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+---------------------+| Jane | Marketing |29000|28|0.0|| Jeff | Marketing |35000|38|1.0|| Fred | Engineering |21000|28|0.0|| Tom | Engineering |23000|33|0.3333333333333333|| Chloe | Engineering |23000|25|0.3333333333333333|| Paul | Engineering |29000|23|1.0|| Lisa | Sales |10000|35|0.0|| Alex | Sales |30000|33|0.5|| Evan | Sales |32000|38|1.0|+-----------------+-----------------+-------------------+----------------+---------------------+-- order by结果相同的情况,order by deptselect*,rank()over(partitionby dept orderby dept)as rn from employees;select*,dense_rank()over(partitionby dept orderby dept)as rn from employees;-- 二者结果相同+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|1|| Paul | Engineering |29000|23|1|| Chloe | Engineering |23000|25|1|| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|1|| Lisa | Sales |10000|35|1|| Evan | Sales |32000|38|1|| Alex | Sales |30000|33|1|+-----------------+-----------------+-------------------+----------------+-----+select*,percent_rank()over(partitionby dept orderby dept)as rn from employees;+-----------------+-----------------+-------------------+----------------+------+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+------+| Fred | Engineering |21000|28|0.0|| Tom | Engineering |23000|33|0.0|| Paul | Engineering |29000|23|0.0|| Chloe | Engineering |23000|25|0.0|| Jane | Marketing |29000|28|0.0|| Jeff | Marketing |35000|38|0.0|| Lisa | Sales |10000|35|0.0|| Evan | Sales |32000|38|0.0|| Alex | Sales |30000|33|0.0|+-----------------+-----------------+-------------------+----------------+------+
1.4. ntile(n)
ntile(n) over
tile有瓷砖、瓦片的含义,作用是在分组内对数据进行分桶,然后返回桶的序号。
按照order by结果将数据平均的放入到分好的桶中,如果数据无法按照桶个数均分,则将多余的数据放在第一个桶内。
使用场景:例如获取每个部门薪资前三分之一的员工数据,则按照salary降序排序,然后分成3个桶,最后取第一个桶中的数据。
select*,NTILE(2)over(partitionby dept orderby salary)as rn from employees;-- ntile(2)表示分成2桶,+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|2|| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|1|| Chloe | Engineering |23000|25|2|| Paul | Engineering |29000|23|2|| Lisa | Sales |10000|35|1|| Alex | Sales |30000|33|1|| Evan | Sales |32000|38|2|+-----------------+-----------------+-------------------+----------------+-----+select*,NTILE(3)over(partitionby dept orderby salary)as rn from employees;-- 分成3桶+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|2|| Fred | Engineering |21000|28|1|| Tom | Engineering |23000|33|1|| Chloe | Engineering |23000|25|2|| Paul | Engineering |29000|23|3|| Lisa | Sales |10000|35|1|| Alex | Sales |30000|33|2|| Evan | Sales |32000|38|3|+-----------------+-----------------+-------------------+----------------+-----+
1.5. cume_dist()
cume表示累计,dist表示距离
用于求累计分布,即分组中按照order by结果的分位数
select*,cume_dist()over(partitionby dept orderby salary)from employees;-- enginnering分组中,tom和chloe中salary相同,二者的分位数结果相同+-----------------+-----------------+-------------------+----------------+---------------------+| employees.name | employees.dept | employees.salary | employees.age | cume_dist_window_0 |+-----------------+-----------------+-------------------+----------------+---------------------+| Jane | Marketing |29000|28|0.5|| Jeff | Marketing |35000|38|1.0|| Fred | Engineering |21000|28|0.25|| Tom | Engineering |23000|33|0.75|| Chloe | Engineering |23000|25|0.75|| Paul | Engineering |29000|23|1.0|| Lisa | Sales |10000|35|0.3333333333333333|| Alex | Sales |30000|33|0.6666666666666666|| Evan | Sales |32000|38|1.0|+-----------------+-----------------+-------------------+----------------+---------------------+select*,cume_dist()over(partitionby dept orderby age)from employees;-- 按照age排序,tom和chloe中age不同,因此分位数不同+-----------------+-----------------+-------------------+----------------+---------------------+| employees.name | employees.dept | employees.salary | employees.age | cume_dist_window_0 |+-----------------+-----------------+-------------------+----------------+---------------------+| Jane | Marketing |29000|28|0.5|| Jeff | Marketing |35000|38|1.0|| Paul | Engineering |29000|23|0.25|| Chloe | Engineering |23000|25|0.5|| Fred | Engineering |21000|28|0.75|| Tom | Engineering |23000|33|1.0|| Alex | Sales |30000|33|0.3333333333333333|| Lisa | Sales |10000|35|0.6666666666666666|| Evan | Sales |32000|38|1.0|+-----------------+-----------------+-------------------+----------------+---------------------+
1.6. lag和lead
lag 表示落后的含义,在使用场景中是小于的含义。
lead 表示领先的含义,在使用场景中是大于的含义。
可以实现不自连接的前提下,按照order by结果得到当前行指定列 前/后移动num行 的列值。
函数参数
LAG/LEAD(col,num,default_value)
:col表示指定的列名;num表示指定的位移行数,默认为1;default_value表示末尾或开头返回的值,默认值null。
SELECT*,
LAG(salary)OVER(PARTITIONBY dept ORDERBY salary)AS lag,
LEAD(salary)OVER(PARTITIONBY dept ORDERBY salary)AS lead
FROM employees;-- 从结果看,按照order by排序结果,lag取的值是当前行前1行的值,lead取的值是当前行后1行的值,对于第1行或者最后1行,取值默认为null+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age | lag | lead |+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|NULL|35000|| Jeff | Marketing |35000|38|29000|NULL|| Fred | Engineering |21000|28|NULL|23000|| Tom | Engineering |23000|33|21000|23000|| Chloe | Engineering |23000|25|23000|29000|| Paul | Engineering |29000|23|23000|NULL|| Lisa | Sales |10000|35|NULL|30000|| Alex | Sales |30000|33|10000|32000|| Evan | Sales |32000|38|30000|NULL|+-----------------+-----------------+-------------------+----------------+--------+--------+SELECT*,
LAG(salary,2,0)OVER(PARTITIONBY dept ORDERBY salary)AS lag,
LEAD(salary,1,0)OVER(PARTITIONBY dept ORDERBY salary)AS lead
FROM employees;+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age | lag | lead |+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|0|35000|| Jeff | Marketing |35000|38|0|0|| Fred | Engineering |21000|28|0|23000|| Tom | Engineering |23000|33|0|23000|| Chloe | Engineering |23000|25|21000|29000|| Paul | Engineering |29000|23|23000|0|| Lisa | Sales |10000|35|0|30000|| Alex | Sales |30000|33|0|32000|| Evan | Sales |32000|38|10000|0|+-----------------+-----------------+-------------------+----------------+--------+--------+-- 排序结果和取值列不同的情况SELECT*,
LAG(salary)OVER(PARTITIONBY dept ORDERBY age)AS lag,
LEAD(salary)OVER(PARTITIONBY dept ORDERBY age)AS lead
FROM employees;+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age | lag | lead |+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|NULL|35000|| Jeff | Marketing |35000|38|29000|NULL|| Paul | Engineering |29000|23|NULL|23000|| Chloe | Engineering |23000|25|29000|21000|| Fred | Engineering |21000|28|23000|23000|| Tom | Engineering |23000|33|21000|NULL|| Alex | Sales |30000|33|NULL|10000|| Lisa | Sales |10000|35|30000|32000|| Evan | Sales |32000|38|10000|NULL|+-----------------+-----------------+-------------------+----------------+--------+--------+
1.7. first_value和last_value
在分组中按照order by的结果,获取指定列的第一个或最后一个值。
注意,默认情况下last_value取的是第一行截止到当前行的最后一个值(当前行的值),并不是整个分区中排序后的最后一个值。
SELECT*,
first_value(salary)OVER(PARTITIONBY dept ORDERBY age)ASfirst,
last_value(salary)OVER(PARTITIONBY dept ORDERBY age)ASlastFROM employees;-- 注意last的结果+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age |first|last|+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|29000|29000|| Jeff | Marketing |35000|38|29000|35000|| Paul | Engineering |29000|23|29000|29000|| Chloe | Engineering |23000|25|29000|23000|| Fred | Engineering |21000|28|29000|21000|| Tom | Engineering |23000|33|29000|23000|| Alex | Sales |30000|33|30000|30000|| Lisa | Sales |10000|35|30000|10000|| Evan | Sales |32000|38|30000|32000|+-----------------+-----------------+-------------------+----------------+--------+--------+SELECT*,
first_value(salary)OVER(PARTITIONBY dept ORDERBY salary)ASfirst,
last_value(salary)OVER(PARTITIONBY dept ORDERBY salary)ASlastFROM employees;+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age |first|last|+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|29000|29000|| Jeff | Marketing |35000|38|29000|35000|| Fred | Engineering |21000|28|21000|21000|| Tom | Engineering |23000|33|21000|23000|| Chloe | Engineering |23000|25|21000|23000|| Paul | Engineering |29000|23|21000|29000|| Lisa | Sales |10000|35|10000|10000|| Alex | Sales |30000|33|10000|30000|| Evan | Sales |32000|38|10000|32000|+-----------------+-----------------+-------------------+----------------+--------+--------+
1.8. nth_value
nth表示第几个的含义
作用,在分组中返回order by结果中指定列的第N行值。
注意,hive中无此函数
SELECT*,nth_value(salary,2)OVER(PARTITIONBY dept ORDERBY salary)AS nth FROM employees;
2. OVER从句
- 在hive中
over
语句支持仅有partition by
语句,没有order by
语句。当没有order by
时,order by
默认使用partition by
指定的字段序列, - 在spark中如果未指定
order by
,将会报错Error in query: Window function row_number() requires window to be ordered, please add ORDER BY clause
select*, row_number()over(partitionby name)as rn from employees;-- 执行结果如下,从结果中得知按照name的字典序排序+-----------------+-----------------+-------------------+----------------+-----+| employees.name | employees.dept | employees.salary | employees.age | rn |+-----------------+-----------------+-------------------+----------------+-----+| Alex | Sales |30000|33|1|| Evan | Sales |32000|38|1|| Fred | Engineering |21000|28|1|| Lisa | Sales |10000|35|1|| Tom | Engineering |23000|33|1|| Chloe | Engineering |23000|25|1|| Jane | Marketing |29000|28|1|| Jeff | Marketing |35000|38|1|| Paul | Engineering |29000|23|1|+-----------------+-----------------+-------------------+----------------+-----+
2.1. window specification
在OVER语句中可以带有一个window specification,支持以下格式:
格式:
(ROWS| RANGE)BETWEEN xxx AND xxx
具体支持以下3种格式
(ROWS| RANGE)BETWEEN(UNBOUNDED|[num])PRECEDINGAND([num]PRECEDING|CURRENTROW|(UNBOUNDED|[num])FOLLOWING)(ROWS| RANGE)BETWEENCURRENTROWAND(CURRENTROW|(UNBOUNDED|[num])FOLLOWING)(ROWS| RANGE)BETWEEN[num]FOLLOWINGAND(UNBOUNDED|[num])FOLLOWING
ROWS
表示在同分区中,按照order by结果按行进行逐行处理RANGE
表示在同分区中按照order by结果的范围处理,如1,2,3,3,4,4,5排序中,3,3和4,4的排序结果相同,则3,3和4,4将分别当成整体处理,每行对应开窗函数的结果相同。UNBOUNDED PRECEDING
表示未绑定当前行之前的行。整个分区中,从第一行开始处理num PRECEDING
表示限制当前行的前num行。如在sum时,每行sum的结果是从前num行到当前行的累加值CURRENT ROW
表示处理过程中的当前行(处理数据过程中的游标指针)UNBOUNDED FOLLOWING
表示未绑定当前行之后的行。整个分区,处理到最后一行num FOLLOWING
表示限制当前行的后num行。如在sum时,每行值的sum的结果将处理到后num行。
注意:在hive中函数不支持使用window specification
rank,
、
dense_rank
、
percent_rank()
cume_dist()
、
ntile
lead
、
lag
2.1.1. 显式order by下的默认值
**当指定了order by语句而未指定window specification语句时,默认的window specification语句是
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
**。
即按照order by结果的range处理,并且从分区中的第一行处理到当前行。
select*,sum(salary)over(partitionby dept orderby salary)as sum_salary from employees;select*,sum(salary)over(partitionby dept orderby salary range betweenunboundedprecedingandcurrentrow)as sum_salary from employees;-- 二者的执行结果相同+-----------------+-----------------+-------------------+----------------+-------------+| employees.name | employees.dept | employees.salary | employees.age | sum_salary |+-----------------+-----------------+-------------------+----------------+-------------+| Jane | Marketing |29000|28|29000|| Jeff | Marketing |35000|38|64000|| Fred | Engineering |21000|28|21000|| Tom | Engineering |23000|33|67000|| Chloe | Engineering |23000|25|67000|| Paul | Engineering |29000|23|96000|| Lisa | Sales |10000|35|10000|| Alex | Sales |30000|33|40000|| Evan | Sales |32000|38|72000|+-----------------+-----------------+-------------------+----------------+-------------+
注意:Engineering部门的tom和chloe员工的sum_salary值相同。这是因为在Engineering分区下,二者order by salary结果相同。在range策略下,tom和chloe将当成整体处理,即46000,因此tom和chloe的累计值都为67000(21000+460000),而不是44000和67000。
在SQL中显示指定如下语句,实现分区中的按行累加效果,即
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Engineering部门的tom和chloe员工的sum_salary结果将分别为44000和67000。
select*,sum(salary)over(partitionby dept orderby salary rowsbetweenunboundedprecedingandcurrentrow)as sum_salary from employees;-- 结果如下+-----------------+-----------------+-------------------+----------------+-------------+| employees.name | employees.dept | employees.salary | employees.age | sum_salary |+-----------------+-----------------+-------------------+----------------+-------------+| Jane | Marketing |29000|28|29000|| Jeff | Marketing |35000|38|64000|| Fred | Engineering |21000|28|21000|| Tom | Engineering |23000|33|44000|| Chloe | Engineering |23000|25|67000|| Paul | Engineering |29000|23|96000|| Lisa | Sales |10000|35|10000|| Alex | Sales |30000|33|40000|| Evan | Sales |32000|38|72000|+-----------------+-----------------+-------------------+----------------+-------------+
在SQL中显示指定如下语句,实现相同分组内进行全部值求和效果。
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
select*,sum(salary)over(partitionby dept orderby salary range betweenunboundedprecedingandunboundedfollowing)as sum_salary from employees;-- 结果如下+-----------------+-----------------+-------------------+----------------+-------------+| employees.name | employees.dept | employees.salary | employees.age | sum_salary |+-----------------+-----------------+-------------------+----------------+-------------+| Jane | Marketing |29000|28|64000|| Jeff | Marketing |35000|38|64000|| Fred | Engineering |21000|28|96000|| Tom | Engineering |23000|33|96000|| Chloe | Engineering |23000|25|96000|| Paul | Engineering |29000|23|96000|| Lisa | Sales |10000|35|72000|| Alex | Sales |30000|33|72000|| Evan | Sales |32000|38|72000|+-----------------+-----------------+-------------------+----------------+-------------+
结论:
- 情形一:在partition by和order by同时存在的情况下,对于MAX | MIN | COUNT | SUM | AVG 等函数 如果想要对相同分组中的数整体进行计算,则要显示指定
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
; - 情形二:如果按在分组中实现按行处理,则要显示指定
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
; - 情形三:默认情况下(即未显示指定window specification时)对于order by列相同的值处理结果相同。
2.1.2. 无order by下的默认值
**当over语句中order by和window specification都缺失时,window specification的默认值是
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
**
结论
- 对MAX | MIN | COUNT | SUM | AVG 等函数,在order by缺失或order by和partition by相同时,效果同上述情形一
select*,sum(salary)over(partitionby dept)as sum_salary from employees;-- 结果如下+-----------------+-----------------+-------------------+----------------+-------------+| employees.name | employees.dept | employees.salary | employees.age | sum_salary |+-----------------+-----------------+-------------------+----------------+-------------+| Fred | Engineering |21000|28|96000|| Tom | Engineering |23000|33|96000|| Paul | Engineering |29000|23|96000|| Chloe | Engineering |23000|25|96000|| Jane | Marketing |29000|28|64000|| Jeff | Marketing |35000|38|64000|| Lisa | Sales |10000|35|72000|| Evan | Sales |32000|38|72000|| Alex | Sales |30000|33|72000|+-----------------+-----------------+-------------------+----------------+-------------+
结果中,对于相同dept中不同员工,效果并不是按行并逐行处理,而是对相同dept下的员工进行了统一处理。即UNBOUNDED FOLLOWING表示不跟随当前处理的行,直接对中整个分区中进行计算。
疑问:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
中rows的作用是什么了?
3. last_value函数的注意事项
前面在提到last_value时,特意强调了该函数的结果并不是分区中的最后一个值,结合上述介绍的window specification再来看下该函数的结果值。
SELECT*,
first_value(salary)OVER(PARTITIONBY dept ORDERBY salary)ASfirst,
last_value(salary)OVER(PARTITIONBY dept ORDERBY salary)ASlastFROM employees;+-----------------+-----------------+-------------------+----------------+--------+--------+| employees.name | employees.dept | employees.salary | employees.age |first|last|+-----------------+-----------------+-------------------+----------------+--------+--------+| Jane | Marketing |29000|28|29000|29000|| Jeff | Marketing |35000|38|29000|35000|| Fred | Engineering |21000|28|21000|21000|| Tom | Engineering |23000|33|21000|23000|| Chloe | Engineering |23000|25|21000|23000|| Paul | Engineering |29000|23|21000|29000|| Lisa | Sales |10000|35|10000|10000|| Alex | Sales |30000|33|10000|30000|| Evan | Sales |32000|38|10000|32000|+-----------------+-----------------+-------------------+----------------+--------+--------+
结果中,对于每一行的last_value的结果都是当前值,并不分区中按salary升序的最后一个值。造成这个结果的原因正是由于默认的window specification导致的(
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)。
如想要实现得到整个分区中按salary升序的最大值,则需要显示设置window specification为
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
select*,first_value(salary)over(partitionby dept orderby salary)as first_salary
,last_value(salary)over(partitionby dept orderby salary range betweenunboundedprecedingandunboundedfollowing)as last_salary from employees;-- 结果如下+-----------------+-----------------+-------------------+----------------+---------------+--------------+| employees.name | employees.dept | employees.salary | employees.age | first_salary | last_salary |+-----------------+-----------------+-------------------+----------------+---------------+--------------+| Jane | Marketing |29000|28|29000|35000|| Jeff | Marketing |35000|38|29000|35000|| Fred | Engineering |21000|28|21000|29000|| Tom | Engineering |23000|33|21000|29000|| Chloe | Engineering |23000|25|21000|29000|| Paul | Engineering |29000|23|21000|29000|| Lisa | Sales |10000|35|10000|32000|| Alex | Sales |30000|33|10000|32000|| Evan | Sales |32000|38|10000|32000|+-----------------+-----------------+-------------------+----------------+---------------+--------------+
版权归原作者 有数的编程笔记 所有, 如有侵权,请联系我们删除。