Hive and Spark SQL execution plans for over and lateral view

Table creation statement

  create table test_over
  (
      user_id string,
      login_date string
  ) COMMENT 'test table for window functions; safe to drop'
  row format delimited
  fields terminated by '\t';
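
To follow along, a couple of rows are enough. A minimal, hypothetical sample (INSERT ... VALUES needs Hive 0.14+; the same statement also works in Spark SQL):

  insert into test_over values
  ('u1', '2022-03-01'),
  ('u1', '2022-03-02'),
  ('u2', '2022-03-01');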

Execution plans for over

Spark

  spark-sql> explain select
           > user_id
           > ,login_date
           > ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
           > from test_over;
  22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
  == Physical Plan ==
  Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
  +- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
     +- Exchange hashpartitioning(user_id#33, 200)
        +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
  Time taken: 0.098 seconds, Fetched 1 row(s)
  22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
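
Reading the plan bottom-up: the scan is shuffled by user_id (Exchange hashpartitioning), sorted within each partition on (user_id, login_date), and the Window operator then evaluates lag over a ROWS frame of (-1, -1), i.e. exactly the previous row. The 200 in hashpartitioning is the shuffle partition count; as a quick check you can change spark.sql.shuffle.partitions and re-run explain (a sketch, output abbreviated):

  spark-sql> set spark.sql.shuffle.partitions=50;
  spark-sql> explain select
           > user_id
           > ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
           > from test_over;
  -- the Exchange line should now read: Exchange hashpartitioning(user_id#..., 50)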

  spark-sql>
           > explain
           > select
           > user_id
           > ,login_date
           > ,first_value(login_date) over(partition by user_id ) prev_date
           > from test_over;
  == Physical Plan ==
  Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
  +- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
     +- Exchange hashpartitioning(user_id#38, 200)
        +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
  Time taken: 0.077 seconds, Fetched 1 row(s)
  22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)

  spark-sql>
           >
           > explain select
           > user_id
           > ,login_date
           > ,max(login_date) over(partition by user_id ) prev_date
           > from test_over;
  == Physical Plan ==
  Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
  +- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
     +- Exchange hashpartitioning(user_id#44, 200)
        +- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
  Time taken: 0.081 seconds, Fetched 1 row(s)
  22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)
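
Note that with PARTITION BY but no ORDER BY, Spark defaults the frame to the whole partition, which is why both first_value and max show specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()). Adding an ORDER BY changes the default frame to a running one; a sketch you can verify with explain:

  spark-sql> explain select
           > user_id
           > ,max(login_date) over(partition by user_id order by login_date) running_max
           > from test_over;
  -- the Window line should now show the standard default frame:
  -- specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())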

Hive

  hive> explain select
      > user_id
      > ,login_date
      > ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
      > from test_over;
  OK
  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1

  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
            TableScan
              alias: test_over
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: user_id (type: string), login_date (type: string)
                sort order: ++
                Map-reduce partition columns: user_id (type: string)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
        Execution mode: vectorized
        Reduce Operator Tree:
          Select Operator
            expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            PTF Operator
              Function definitions:
                  Input definition
                    input alias: ptf_0
                    output shape: _col0: string, _col1: string
                    type: WINDOWING
                  Windowing table definition
                    input alias: ptf_1
                    name: windowingtablefunction
                    order by: _col1 ASC NULLS FIRST
                    partition by: _col0
                    raw input shape:
                    window functions:
                        window function definition
                          alias: lag_window_0
                          arguments: _col1, 1, '0001-01-01'
                          name: lag
                          window function: GenericUDAFLagEvaluator
                          window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                          isPivotResult: true
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          ListSink

  Time taken: 0.211 seconds, Fetched: 61 row(s)

  hive> explain select
      > user_id
      > ,login_date
      > ,max(login_date) over(partition by user_id ) prev_date
      > from test_over;
  OK
  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1

  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
            TableScan
              alias: test_over
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: user_id (type: string)
                sort order: +
                Map-reduce partition columns: user_id (type: string)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                value expressions: login_date (type: string)
        Execution mode: vectorized
        Reduce Operator Tree:
          Select Operator
            expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            PTF Operator
              Function definitions:
                  Input definition
                    input alias: ptf_0
                    output shape: _col0: string, _col1: string
                    type: WINDOWING
                  Windowing table definition
                    input alias: ptf_1
                    name: windowingtablefunction
                    order by: _col0 ASC NULLS FIRST
                    partition by: _col0
                    raw input shape:
                    window functions:
                        window function definition
                          alias: max_window_0
                          arguments: _col1
                          name: max
                          window function: GenericUDAFMaxEvaluator
                          window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          ListSink

  Time taken: 3.278 seconds, Fetched: 61 row(s)
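
In both Hive plans the rows take a single shuffle (the Reduce Output Operator partitioned by user_id) into one PTF Operator. When several window functions share the same partitioning and ordering, Hive can typically evaluate them in the same PTF stage; a named window makes this easy to check with explain (a sketch, assuming your Hive version supports the WINDOW clause):

  hive> explain select
      > user_id
      > ,lag(login_date,1,'0001-01-01') over w prev_date
      > ,max(login_date) over w max_date
      > from test_over
      > window w as (partition by user_id order by login_date);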

Execution plans for lateral view

Spark

  spark-sql>
           > explain select
           > user_id
           > ,login_date
           > ,single_num
           > from test_over
           > lateral view explode(split(login_date,'-')) tmp as single_num;
  == Physical Plan ==
  Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
  +- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
  Time taken: 0.103 seconds, Fetched 1 row(s)
  22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)
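
Unlike the window queries, there is no Exchange here: Generate expands each row in place, so lateral view requires no shuffle. One practical variant: a plain lateral view drops rows whose array is empty or NULL, while lateral view outer keeps them with a NULL output column (a sketch; the boolean in the Generate line is the outer flag and should flip to true):

  spark-sql> explain select
           > user_id
           > ,single_num
           > from test_over
           > lateral view outer explode(split(login_date,'-')) tmp as single_num;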

Hive

  hive>
      > explain select
      > user_id
      > ,login_date
      > ,single_num
      > from test_over
      > lateral view explode(split(login_date,'-')) tmp as single_num;
  OK
  STAGE DEPENDENCIES:
    Stage-0 is a root stage

  STAGE PLANS:
    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Lateral View Forward
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: user_id (type: string), login_date (type: string)
                outputColumnNames: user_id, login_date
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col5
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink
              Select Operator
                expressions: split(login_date, '-') (type: array<string>)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                UDTF Operator
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  function name: explode
                  Lateral View Join Operator
                    outputColumnNames: _col0, _col1, _col5
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    Select Operator
                      expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      ListSink
  Time taken: 0.081 seconds, Fetched: 41 row(s)
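
In Spark SQL the same expansion can also be written without lateral view, by calling the generator directly in the select list (a minimal equivalent; note that a select list may contain only one generator):

  spark-sql> explain select
           > user_id
           > ,login_date
           > ,explode(split(login_date,'-')) as single_num
           > from test_over;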

Tags: hive spark sql

Reprinted from: https://blog.csdn.net/houminZhang/article/details/123400951
Copyright belongs to the original author, houminZhang. In case of infringement, please contact us for removal.
