

Hive and Spark SQL execution plans for over and lateral view

Create table statement

create table test_over 
(
    user_id string,
    login_date string
)  COMMENT 'for testing functions; safe to drop'
row format delimited
fields terminated by '\t';

Execution plan for over

Spark

spark-sql> explain select
         >   user_id
         >   ,login_date
         >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
         > from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#33, 200)
      +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
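The plan shows why a window function forces a shuffle: Spark inserts an Exchange (hash partitioning on user_id) and a Sort before the Window operator, and the lag frame is a RowFrame covering exactly the one preceding row (-1, -1). The row-level semantics of this lag can be sketched in plain Python (the sample rows below are hypothetical):

```python
from itertools import groupby

def lag_over(rows, default="0001-01-01"):
    # Mirrors: lag(login_date, 1, default) over (partition by user_id order by login_date).
    # sorted() orders tuples by user_id then login_date, matching the
    # partition and sort keys in the physical plan.
    out = []
    for user_id, grp in groupby(sorted(rows), key=lambda r: r[0]):
        dates = [d for _, d in grp]  # one partition, already sorted
        for i, d in enumerate(dates):
            prev = dates[i - 1] if i > 0 else default  # one row back, or the default
            out.append((user_id, d, prev))
    return out

rows = [("u1", "2022-03-01"), ("u1", "2022-03-02"), ("u2", "2022-03-05")]  # hypothetical
print(lag_over(rows))
```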
spark-sql> 
         > explain 
         > select
         >   user_id
         >   ,login_date
         >   ,first_value(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#38, 200)
      +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)
spark-sql> 
         > 
         > explain select
         >   user_id
         >   ,login_date
         >   ,max(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
+- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#44, 200)
      +- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
Time taken: 0.081 seconds, Fetched 1 row(s)
22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)
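first_value and max produce the same plan shape as lag (Exchange on user_id, then Sort, then Window); only the frame differs. With no order by, the frame is unbounded preceding to unbounded following, so every row in a partition receives the same aggregate value. A minimal Python sketch of that semantics (hypothetical sample rows):

```python
def max_over_partition(rows):
    # Mirrors: max(login_date) over (partition by user_id) — the frame is
    # unbounded, so each row gets the partition-wide maximum.
    part_max = {}
    for user_id, d in rows:
        part_max[user_id] = max(part_max.get(user_id, d), d)
    return [(u, d, part_max[u]) for u, d in rows]

rows = [("u1", "2022-03-01"), ("u1", "2022-03-02"), ("u2", "2022-03-05")]  # hypothetical
print(max_over_partition(rows))
```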

Hive

hive> explain select
    >   user_id
    >   ,login_date
    >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string), login_date (type: string)
              sort order: ++
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col1 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: lag_window_0
                        arguments: _col1, 1, '0001-01-01'
                        name: lag
                        window function: GenericUDAFLagEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Time taken: 0.211 seconds, Fetched: 61 row(s)
hive> explain select
    >   user_id
    >   ,login_date
    >   ,max(login_date) over(partition by user_id ) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string)
              sort order: +
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              value expressions: login_date (type: string)
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col0 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: max_window_0
                        arguments: _col1
                        name: max
                        window function: GenericUDAFMaxEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 3.278 seconds, Fetched: 61 row(s)

Execution plan for lateral view

Spark

spark-sql> 
         > explain select
         >   user_id
         >   ,login_date
         >   ,single_num
         > from test_over
         > lateral view explode(split(login_date,'-')) tmp as single_num;
== Physical Plan ==
Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
+- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
Time taken: 0.103 seconds, Fetched 1 row(s)
22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)
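Unlike the window queries, this plan has no Exchange or Sort: Generate is a per-row operator that emits one output row per element of the exploded array, so no shuffle is needed. Its semantics can be sketched in Python (hypothetical sample row):

```python
def lateral_view_explode(rows):
    # Mirrors: lateral view explode(split(login_date, '-')) tmp as single_num —
    # each input row fans out into one row per split element, with the
    # original columns repeated.
    return [(u, d, part) for u, d in rows for part in d.split("-")]

rows = [("u1", "2022-03-01")]  # hypothetical
print(lateral_view_explode(rows))
```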

Hive

hive> 
    > explain select
    >   user_id
    >   ,login_date
    >   ,single_num
    > from test_over
    > lateral view explode(split(login_date,'-')) tmp as single_num;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_over
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Lateral View Forward
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: user_id (type: string), login_date (type: string)
              outputColumnNames: user_id, login_date
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Lateral View Join Operator
                outputColumnNames: _col0, _col1, _col5
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  ListSink
            Select Operator
              expressions: split(login_date, '-') (type: array<string>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              UDTF Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                function name: explode
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col5
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink

Time taken: 0.081 seconds, Fetched: 41 row(s)
Tags: hive, spark sql

Reposted from: https://blog.csdn.net/houminZhang/article/details/123400951
Copyright belongs to the original author, houminZhang. In case of infringement, please contact us for removal.
