Contents
Overview
(Step-by-step) Spark Weather Monitoring Data Analysis: Overview
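Foreword
First of all, this blog post is entirely my own original work. If you run into any programming problem, leave a comment and I will reply when I see it.
Prerequisites
This post assumes you have already installed Hadoop and Spark and set up the IntelliJ IDEA plugins. If you have trouble installing any of these big-data tools, search CSDN for an installation guide that matches the version numbers below.
Software versions used
Hadoop 2.7.7; Java 1.8.0; sbt 1.4.0; Spark 2.4.0; Hive 2.1.1; ZooKeeper 3.5.10; Python 3.7.9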
Dataset
You can also download it from the link below:
Link: https://pan.baidu.com/s/13T8IHjAjvbsvQtQ01Ro__Q?pwd=494j
Extraction code: 494j
How the code works
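The six major pollutants are SO2, NO2, PM10, PM2.5, O3 and CO. For each pollutant (SO2 is used as the example), this part of the analysis consists of:
(1) Read res.csv, create a temporary view, and select the fields 监测时间 (monitoring time) and SO2监测浓度(μg/m³) (SO2 concentration) from the view;
(2) Sort the selected rows by concentration in descending order and keep the top 20;
(3) Sort these 20 rows by monitoring time in ascending order and save them to a file.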
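Steps (2) and (3) can be checked on a few rows without Spark. The plain-Scala sketch below is only an illustration (the object and method names are hypothetical, and the rows in the test are made up in the dataset's time format): it keeps positive readings, takes the n highest, and then re-sorts them by the parsed date-time. Parsing matters because times such as "2019-9-30 10:00" are not zero-padded, so comparing them as strings does not give chronological order.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TopNSketch {
  // Dataset times look like "2019-9-30 10:00": month, day and hour are not zero-padded
  private val timeFormat = DateTimeFormatter.ofPattern("yyyy-M-d H:mm")

  def parseTime(s: String): LocalDateTime = LocalDateTime.parse(s.trim, timeFormat)

  /** Step (2): positive values, top n by concentration; step (3): those n rows in time order. */
  def topNChronological(rows: Seq[(String, Double)], n: Int): Seq[(String, Double)] =
    rows
      .filter(_._2 > 0)                                               // same filter as `> 0` in the SQL
      .sortWith(_._2 > _._2)                                          // descending concentration
      .take(n)                                                        // `limit n`
      .sortWith((a, b) => parseTime(a._1).isBefore(parseTime(b._1))) // ascending real date-time
}
```

Sorting the parsed `LocalDateTime` values rather than the raw strings is the whole point of step (3); a string sort would place "2019-11-1 10:00" before "2019-6-26 4:00".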
The columns are selected with Spark SQL statements, and the results are exported to the /root/work/Task1 directory.
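Note that `.csv("file:///root/work/Task1/SO2_20.csv")` creates a directory named SO2_20.csv that contains one or more part-*.csv files and a _SUCCESS marker, not a single CSV file. Calling `coalesce(1)` before `write` is one common way to get a single part file. Alternatively, the plain-Scala helper sketched below (a hypothetical name, not a Spark API) merges the part files afterwards and keeps a single header. It assumes all part files are on the same local disk, which holds in local[2] mode; with the spark:// master and a file:// path, the part files end up on the workers' local disks instead.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

object MergeSparkCsv {
  /** Concatenates the part-*.csv files in a Spark output directory into one CSV file,
    * keeping only the first header line. Returns the number of lines written. */
  def mergeCsvParts(sparkOutputDir: Path, target: Path, hasHeader: Boolean = true): Int = {
    val dirStream = Files.newDirectoryStream(sparkOutputDir, "part-*.csv")
    // part-00000, part-00001, ...: sorting by file name keeps the partition order
    val parts = try dirStream.asScala.toList.sortBy(_.getFileName.toString) finally dirStream.close()

    val merged = ArrayBuffer.empty[String]
    var headerWritten = false
    for (part <- parts) {
      val lines = Files.readAllLines(part).asScala.toList
      if (lines.nonEmpty) {
        if (hasHeader) {
          if (!headerWritten) { merged += lines.head; headerWritten = true }
          merged ++= lines.tail // drop the header repeated in every later part
        } else merged ++= lines
      }
    }
    Files.write(target, merged.asJava) // UTF-8, matching what readAllLines reads
    merged.size
  }
}
```

For example, `MergeSparkCsv.mergeCsvParts(Paths.get("/root/work/Task1/SO2_20.csv"), Paths.get("/root/work/Task1/SO2_20_merged.csv"))` (target file name hypothetical, `Paths` from java.nio.file) would produce one file per pollutant.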
Code excerpts
Start Spark
[root@master ~]# ./spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh
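I have split the code into several parts.
Imports
I will cover installing the package dependencies in a few days (step 0.1). If you have read this far and I still have not updated it, please remind me!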
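Until that step is published, here is a minimal build.sbt sketch. The versions are an assumption inferred from the jars on the IDE classpath of the run shown later (spark-*_2.12-2.4.0.jar and scala-library-2.12.2.jar); this is not the author's actual build file.

```scala
// build.sbt (sketch): versions inferred from the run's classpath, not the original build file
scalaVersion := "2.12.2"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.0",
  "org.apache.spark" %% "spark-sql"   % "2.4.0",
  // provides org.apache.spark.mllib.stat.Statistics, imported in the code below
  "org.apache.spark" %% "spark-mllib" % "2.4.0"
)
```

`%%` appends the Scala binary version, so these resolve to the spark-*_2.12 artifacts seen on the classpath.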
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._          // col, to_timestamp, ...
import org.apache.spark.SparkConf
import org.apache.log4j.{Level, Logger}
// Not needed by Task1 (presumably used by the later tasks)
import org.apache.spark.mllib.stat.Statistics
import scala.collection.mutable.ArrayBuffer
Spark session settings and schema definitions
// Schema of res.csv. With an explicit schema and header=true, Spark skips the header line
// and uses these names, so SQL queries must use exactly these names (units written as μg/m³)
val schema = StructType(Array(
  StructField("", FloatType),                   // unnamed first column (presumably a row index)
  StructField("监测时间", StringType),            // monitoring time, e.g. "2019-9-30 10:00"
  StructField("SO2监测浓度(μg/m³)", FloatType),
  StructField("NO2监测浓度(μg/m³)", FloatType),
  StructField("PM10监测浓度(μg/m³)", FloatType),
  StructField("PM2.5监测浓度(μg/m³)", FloatType),
  StructField("O3监测浓度(μg/m³)", FloatType),
  StructField("CO监测浓度(mg/m³)", FloatType),   // CO is in mg/m³, unlike the other pollutants
  StructField("温度(℃)", FloatType),
  StructField("湿度(%)", FloatType),
  StructField("气压(MBar)", FloatType),
  StructField("风速(m/s)", FloatType),
  StructField("风向(°)", FloatType),
  StructField("云量", FloatType),
  StructField("长波辐射(W/m²)", FloatType)
))
// Schema of data2.csv; note that these names spell the unit as μg/m3, not μg/m³
val schema_data2 = StructType(Array(
  StructField("监测日期", StringType),
  StructField("SO2监测浓度(μg/m3)", FloatType),
  StructField("NO2监测浓度(μg/m3)", FloatType),
  StructField("PM10监测浓度(μg/m3)", FloatType),
  StructField("PM2.5监测浓度(μg/m3)", FloatType),
  StructField("O3最大八小时滑动平均监测浓度(μg/m3)", FloatType),
  StructField("CO监测浓度(mg/m3)", FloatType)
))
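val spark = SparkSession
  .builder()
  .appName("WeatherAnalysis")             // arbitrary name; without it Spark uses a random UUID (see "Submitted application" in the log)
  .master("spark://192.168.244.130:7077") // standalone master of the author's cluster
  .getOrCreate()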
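If connecting to Spark fails
If connecting to the Spark cluster fails, you can replace the session above with the code below. Use it only for testing: it does not connect to the cluster but runs Spark in local mode inside the program's own JVM. I will also write up how to debug the connection error in a few days; if you need it and I forget, remind me.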
val spark = SparkSession
  .builder()
  .master("local[2]") // driver and executor in this JVM with 2 worker threads; no cluster needed
  .getOrCreate()
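Instead of editing the builder to switch between the two versions, the master URL can be chosen at launch time. A minimal sketch, assuming you pass the JVM option -Dspark.master=local[2] (for example in IDEA's VM options) when testing locally; the helper name is hypothetical. Spark's SparkConf also picks up spark.* system properties by itself, but an explicit .master(...) in the builder overrides them, which is why the helper reads the property first.

```scala
object MasterChoice {
  /** Uses the `spark.master` JVM system property when it is set and non-blank, else `fallback`. */
  def chooseMaster(props: collection.Map[String, String], fallback: String): String =
    props.get("spark.master").map(_.trim).filter(_.nonEmpty).getOrElse(fallback)
}
```

In the builder this would become `.master(MasterChoice.chooseMaster(sys.props, "spark://192.168.244.130:7077"))`, so the same code runs on the cluster by default and locally when the property is set.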
Main function
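def main(args: Array[String]): Unit = {
  // Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
  // Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
  // Only ERROR messages from org.* loggers are shown from here on. `spark` is an object-level
  // val and has already been created at this point, so Spark's start-up INFO lines still appear.
  Logger.getLogger("org").setLevel(Level.ERROR)
  println("Test Begin")
  // println(SparkSession.getClass)
  // With a spark:// master, a file:// path must exist at the same location on every worker node
  val df = spark.read
    .schema(schema)
    .option("header", "true")
    .csv("file:///root/res.csv")
  // df.show()
  Task1(df)
  // Task2(df)
  val df_data2 = spark.read
    .schema(schema_data2)
    .option("header", "true")
    .csv("file:///root/data2.csv")
  // df_data2.show()
  // Task3(df_data2)
  // Task4(df)
}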
Task1 function (the core part)
def Task1(df: DataFrame): Unit = {
  // 监测时间 is a string such as "2019-9-30 10:00" (no zero padding), so ordering by the raw
  // string is lexicographic ("2019-11-1 ..." sorts before "2019-6-26 ..."). Parse it for step (3).
  val byTime = to_timestamp(col("监测时间"), "yyyy-M-d H:mm")

  df.createOrReplaceTempView("SO2")
  val SO2_1 = spark.sql("select `监测时间`, `SO2监测浓度(μg/m³)` from SO2 where `SO2监测浓度(μg/m³)` > 0 " + "order by `SO2监测浓度(μg/m³)` desc limit 20")
  val SO2_2 = SO2_1.orderBy(byTime)
  SO2_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/SO2_20.csv")
  println("筛选污染物SO2浓度排名前20的时段:") // "Top 20 time slots by SO2 concentration:"
  SO2_2.show()

  df.createOrReplaceTempView("NO2")
  val NO2_1 = spark.sql("select `监测时间`, `NO2监测浓度(μg/m³)` from NO2 where `NO2监测浓度(μg/m³)` > 0 " + "order by `NO2监测浓度(μg/m³)` desc limit 20")
  val NO2_2 = NO2_1.orderBy(byTime)
  NO2_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/NO2_20.csv")
  println("筛选污染物NO2浓度排名前20的时段:")
  NO2_2.show()

  df.createOrReplaceTempView("PM10")
  val PM10_1 = spark.sql("select `监测时间`, `PM10监测浓度(μg/m³)` from PM10 where `PM10监测浓度(μg/m³)` > 0 " + "order by `PM10监测浓度(μg/m³)` desc limit 20")
  val PM10_2 = PM10_1.orderBy(byTime)
  PM10_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/PM10_20.csv")
  println("筛选污染物PM10浓度排名前20的时段:")
  PM10_2.show()

  df.createOrReplaceTempView("PM25")
  val PM25_1 = spark.sql("select `监测时间`, `PM2.5监测浓度(μg/m³)` from PM25 where `PM2.5监测浓度(μg/m³)` > 0 " + "order by `PM2.5监测浓度(μg/m³)` desc limit 20")
  val PM25_2 = PM25_1.orderBy(byTime)
  PM25_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/PM25_20.csv")
  println("筛选污染物PM2.5浓度排名前20的时段:")
  PM25_2.show()

  df.createOrReplaceTempView("O3")
  val O3_1 = spark.sql("select `监测时间`, `O3监测浓度(μg/m³)` from O3 where `O3监测浓度(μg/m³)` > 0 " + "order by `O3监测浓度(μg/m³)` desc limit 20")
  val O3_2 = O3_1.orderBy(byTime)
  O3_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/O3_20.csv")
  println("筛选污染物O3浓度排名前20的时段:")
  O3_2.show()

  // CO is recorded in mg/m³ (see schema), so every reference must use `CO监测浓度(mg/m³)`
  df.createOrReplaceTempView("CO")
  val CO_1 = spark.sql("select `监测时间`, `CO监测浓度(mg/m³)` from CO where `CO监测浓度(mg/m³)` > 0 " + "order by `CO监测浓度(mg/m³)` desc limit 20")
  val CO_2 = CO_1.orderBy(byTime)
  CO_2.write.option("header", true).mode("overwrite").csv("file:///root/work/Task1/CO_20.csv")
  println("筛选污染物CO浓度排名前20的时段:")
  CO_2.show()
}
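The per-pollutant blocks in Task1 differ only in the column name and the file name, so they could be generated from a table. The plain-Scala sketch below (hypothetical object name `Task1Queries`) only builds the SQL strings and output paths, so it can be checked without a cluster; the column names are copied from `schema`.

```scala
object Task1Queries {
  // (label used in the printed message, column name as defined in `schema`, output file prefix)
  val pollutants: Seq[(String, String, String)] = Seq(
    ("SO2",   "SO2监测浓度(μg/m³)",   "SO2"),
    ("NO2",   "NO2监测浓度(μg/m³)",   "NO2"),
    ("PM10",  "PM10监测浓度(μg/m³)",  "PM10"),
    ("PM2.5", "PM2.5监测浓度(μg/m³)", "PM25"),
    ("O3",    "O3监测浓度(μg/m³)",    "O3"),
    ("CO",    "CO监测浓度(mg/m³)",    "CO") // CO is recorded in mg/m³, unlike the others
  )

  /** Steps (1) and (2): monitoring time plus one pollutant column, positive values only,
    * top n rows by concentration. Backticks are needed because the names contain ( ) and . */
  def topNQuery(view: String, column: String, n: Int = 20): String =
    s"select `监测时间`, `${column}` from ${view} " +
      s"where `${column}` > 0 order by `${column}` desc limit ${n}"

  /** Output path used by Task1 for a given file prefix. */
  def outputPath(prefix: String): String = s"file:///root/work/Task1/${prefix}_20.csv"
}
```

Inside Task1, a loop such as `for ((label, column, prefix) <- Task1Queries.pollutants)` would pass `topNQuery("air", column)` to `spark.sql`, then apply the same orderBy, write to `outputPath(prefix)`, and show. Registering one view with `df.createOrReplaceTempView("air")` would be enough, because every query reads the same DataFrame ("air" is a hypothetical view name).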
Results
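The run is slow, mostly because start-up takes a long time; the processing itself is fairly fast.
The console first shows Spark's INFO start-up messages and only then the show() output, because Logger.getLogger("org").setLevel(Level.ERROR) runs inside main, after the object-level spark session has already been created. The "Starting executor ID driver on host localhost" line suggests that this run used the local[2] session. In the tables, 监测时间 is ordered as a plain string, which is why, for example, 2019-11-1 10:00 is listed before 2019-6-26 4:00.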
/root/jdk1.8.0_191/bin/java -javaagent:/root/idea-IC-221.6008.13/lib/idea_rt.jar=42516:/root/idea-IC-221.6008.13/bin -Dfile.encoding=UTF-8 -classpath /root/IdeaProjects/SBTTest/target/scala-2.12/classes:(JDK jars and the sbt-resolved dependencies such as spark-core_2.12-2.4.0.jar, spark-sql_2.12-2.4.0.jar, spark-mllib_2.12-2.4.0.jar and scala-library-2.12.2.jar; full classpath omitted) Main
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/09/17 01:29:37 INFO SparkContext: Running Spark version 2.4.0
23/09/17 01:29:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/09/17 01:29:37 INFO SparkContext: Submitted application: f20991d7-62b9-4796-a392-49f01fca56d4
23/09/17 01:29:38 INFO SecurityManager: Changing view acls to: root,hdfs
23/09/17 01:29:38 INFO SecurityManager: Changing modify acls to: root,hdfs
23/09/17 01:29:38 INFO SecurityManager: Changing view acls groups to:
23/09/17 01:29:38 INFO SecurityManager: Changing modify acls groups to:
23/09/17 01:29:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, hdfs); groups with view permissions: Set(); users with modify permissions: Set(root, hdfs); groups with modify permissions: Set()
23/09/17 01:29:39 INFO Utils: Successfully started service 'sparkDriver' on port 36921.
23/09/17 01:29:39 INFO SparkEnv: Registering MapOutputTracker
23/09/17 01:29:39 INFO SparkEnv: Registering BlockManagerMaster
23/09/17 01:29:39 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/09/17 01:29:39 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/09/17 01:29:39 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a888a9d8-fece-4f64-80f0-4def2abaeaad
23/09/17 01:29:39 INFO MemoryStore: MemoryStore started with capacity 230.7 MB
23/09/17 01:29:39 INFO SparkEnv: Registering OutputCommitCoordinator
23/09/17 01:29:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/09/17 01:29:40 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
23/09/17 01:29:40 INFO Executor: Starting executor ID driver on host localhost
23/09/17 01:29:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38783.
23/09/17 01:29:41 INFO NettyBlockTransferService: Server created on master:38783
23/09/17 01:29:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/09/17 01:29:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 38783, None)
23/09/17 01:29:41 INFO BlockManagerMasterEndpoint: Registering block manager master:38783 with 230.7 MB RAM, BlockManagerId(driver, master, 38783, None)
23/09/17 01:29:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 38783, None)
23/09/17 01:29:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 38783, None)
Test Begin
筛选污染物SO2浓度排名前20的时段:
+---------------+------------------+
| 监测时间|SO2监测浓度(μg/m³)|
+---------------+------------------+
|2019-11-1 10:00| 32.0|
|2019-11-27 8:00| 37.0|
|2019-11-27 9:00| 29.0|
| 2019-6-26 4:00| 32.0|
|2019-8-12 16:00| 35.0|
|2019-8-15 21:00| 29.0|
| 2019-9-29 7:00| 30.0|
| 2019-9-29 8:00| 40.0|
| 2019-9-29 9:00| 35.0|
|2019-9-30 10:00| 32.0|
| 2019-9-30 2:00| 30.0|
| 2019-9-30 3:00| 40.0|
| 2019-9-30 4:00| 38.0|
| 2019-9-30 5:00| 32.0|
| 2019-9-30 6:00| 30.0|
| 2019-9-30 7:00| 31.0|
| 2019-9-30 8:00| 30.0|
| 2019-9-30 9:00| 34.0|
|2020-4-14 11:00| 47.0|
|2020-4-14 12:00| 33.0|
+---------------+------------------+
筛选污染物NO2浓度排名前20的时段:
+----------------+------------------+
| 监测时间|NO2监测浓度(μg/m³)|
+----------------+------------------+
|2019-12-11 22:00| 177.0|
|2019-12-11 23:00| 177.0|
|2019-12-30 19:00| 176.0|
|2020-11-25 19:00| 180.0|
| 2021-1-14 11:00| 180.0|
| 2021-1-14 12:00| 177.0|
| 2021-1-14 18:00| 183.0|
| 2021-1-14 19:00| 178.0|
| 2021-1-14 20:00| 192.0|
| 2021-1-14 21:00| 168.0|
| 2021-1-14 22:00| 170.0|
| 2021-1-14 23:00| 171.0|
| 2021-1-16 10:00| 211.0|
| 2021-1-16 11:00| 181.0|
| 2021-1-20 19:00| 197.0|
| 2021-1-20 20:00| 202.0|
| 2021-1-20 21:00| 192.0|
| 2021-1-20 22:00| 184.0|
| 2021-1-20 23:00| 187.0|
| 2021-1-21 0:00| 185.0|
+----------------+------------------+
筛选污染物PM10浓度排名前20的时段:
+----------------+-------------------+
| 监测时间|PM10监测浓度(μg/m³)|
+----------------+-------------------+
| 2019-11-2 22:00| 184.0|
| 2019-11-25 8:00| 189.0|
| 2019-12-12 0:00| 188.0|
|2019-12-13 23:00| 192.0|
| 2019-12-14 0:00| 184.0|
|2019-12-29 20:00| 197.0|
|2019-12-29 21:00| 187.0|
| 2019-12-29 9:00| 193.0|
|2019-12-30 18:00| 185.0|
|2019-12-30 19:00| 200.0|
| 2019-12-30 1:00| 184.0|
|2020-12-13 21:00| 184.0|
| 2020-4-12 4:00| 208.0|
| 2021-1-14 10:00| 191.0|
| 2021-1-14 11:00| 187.0|
| 2021-1-16 10:00| 217.0|
| 2021-1-20 20:00| 212.0|
| 2021-1-20 22:00| 197.0|
| 2021-1-21 0:00| 195.0|
| 2021-1-21 1:00| 211.0|
+----------------+-------------------+
筛选污染物PM2.5浓度排名前20的时段:
+----------------+--------------------+
| 监测时间|PM2.5监测浓度(μg/m³)|
+----------------+--------------------+
| 2019-11-25 8:00| 130.0|
| 2019-11-25 9:00| 126.0|
|2019-12-13 20:00| 124.0|
|2019-12-29 21:00| 127.0|
|2019-12-30 19:00| 129.0|
|2019-12-30 20:00| 145.0|
| 2021-1-14 19:00| 124.0|
| 2021-1-14 21:00| 128.0|
| 2021-1-14 22:00| 129.0|
| 2021-1-14 23:00| 124.0|
| 2021-1-15 0:00| 140.0|
| 2021-1-15 2:00| 125.0|
| 2021-1-16 10:00| 163.0|
| 2021-1-16 11:00| 156.0|
| 2021-1-16 12:00| 130.0|
| 2021-1-16 13:00| 128.0|
| 2021-1-20 21:00| 126.0|
| 2021-1-20 23:00| 124.0|
| 2021-1-21 1:00| 129.0|
| 2021-1-21 2:00| 125.0|
+----------------+--------------------+
筛选六大污染物浓度排名前20的时段:
+---------------+-----------------+
| 监测时间|O3监测浓度(μg/m³)|
+---------------+-----------------+
|2019-11-2 14:00| 294.0|
|2019-11-2 15:00| 296.0|
|2019-11-2 17:00| 287.0|
|2019-8-21 16:00| 304.0|
|2019-8-21 17:00| 300.0|
| 2019-8-5 15:00| 286.0|
|2019-9-26 15:00| 298.0|
|2019-9-26 16:00| 304.0|
| 2019-9-7 15:00| 295.0|
|2020-4-15 16:00| 324.0|
|2020-4-28 17:00| 293.0|
|2021-4-30 13:00| 305.0|
|2021-4-30 14:00| 368.0|
|2021-4-30 15:00| 405.0|
|2021-4-30 16:00| 387.0|
|2021-4-30 17:00| 291.0|
| 2021-4-6 15:00| 290.0|
| 2021-4-6 16:00| 306.0|
| 2021-4-6 17:00| 302.0|
| 2021-4-6 18:00| 302.0|
+---------------+-----------------+
Process finished with exit code 0
Copyright belongs to the original author 无骨鱼虎'. In case of infringement, please contact us for removal.