SpringData、SparkStreaming和Flink集成Elasticsearch

本文代码链接：https://download.csdn.net/download/shangjg03/88522188

1Spring Data框架集成

1.1 Spring Data框架介绍

Spring Data是一个用于简化数据库、非关系型数据库、索引库访问，并支持云服务的开源框架。其主要目标是使得对数据的访问变得方便快捷，并支持map-reduce框架和云计算数据服务。 Spring Data可以极大的简化JPA（Elasticsearch…）的写法，可以在几乎不用写实现的情况下，实现对数据的访问和操作。除了CRUD外，还包括如分页、排序等一些常用的功能。

Spring Data的官网：Spring Data

Spring Data常用的功能模块如下：

1.2Spring Data Elasticsearch介绍

Spring Data Elasticsearch 基于 spring data API 简化 Elasticsearch操作，将原始操作Elasticsearch的客户端API 进行封装。Spring Data为Elasticsearch项目提供集成搜索引擎。Spring Data Elasticsearch POJO的关键功能区域为中心的模型与Elastichsearch交互文档和轻松地编写一个存储索引库数据访问层。

官方网站: https://spring.io/projects/spring-data-elasticsearch

1.3Spring Data Elasticsearch版本对比

目前最新springboot对应Elasticsearch7.6.2，Spring boot2.3.x一般可以兼容Elasticsearch7.x

1.4框架集成

创建Maven项目

修改pom文件，增加依赖关系

<?xmlversion="1.0"encoding="UTF-8"?><projectxmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><parent><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-parent</artifactId><version>2.3.6.RELEASE</version><relativePath/></parent><groupId>com.shangjack.es</groupId><artifactId>springdata-elasticsearch</artifactId><version>1.0</version><properties><maven.compiler.source>8</maven.compiler.source><maven.compiler.target>8</maven.compiler.target></properties><dependencies><dependency><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-data-elasticsearch</artifactId></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-devtools</artifactId><scope>runtime</scope><optional>true</optional></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-test</artifactId><scope>test</scope></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-test</artifactId></dependency><dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency><dependency><groupId>org.springframework</groupId><artifactId>spring-test</artifactId></dependency></dependencies></project>

增加配置文件

在resources目录中增加application.properties文件

# es服务地址elasticsearch.host=127.0.0.1# es服务端口elasticsearch.port=9200# 配置日志级别,开启debug日志logging.level.com.shangjack.es=debug

SpringBoot主程序

package com.shangjack.es;

importorg.springframework.boot.SpringApplication;importorg.springframework.boot.autoconfigure.SpringBootApplication;@SpringBootApplicationpublicclassSpringDataElasticSearchMainApplication{publicstaticvoidmain(String[]args){SpringApplication.run(SpringDataElasticSearchMainApplication.class,args);}}数据实体类packagecom.shangjack.es;importlombok.AllArgsConstructor;importlombok.Data;importlombok.NoArgsConstructor;importlombok.ToString;@Data@NoArgsConstructor@AllArgsConstructor@ToStringpublicclassProduct{privateLongid;//商品唯一标识privateStringtitle;//商品名称privateStringcategory;//分类名称privateDoubleprice;//商品价格privateStringimages;//图片地址}

配置类

ElasticsearchRestTemplate是spring-data-elasticsearch项目中的一个类，和其他spring项目中的template类似。
在新版的spring-data-elasticsearch中，ElasticsearchRestTemplate代替了原来的ElasticsearchTemplate。
原因是ElasticsearchTemplate基于TransportClient，TransportClient即将在8.x以后的版本中移除。所以，我们推荐使用ElasticsearchRestTemplate。
ElasticsearchRestTemplate基于RestHighLevelClient客户端的。需要自定义配置类，继承AbstractElasticsearchConfiguration，并实现elasticsearchClient()抽象方法，创建RestHighLevelClient对象。

packagecom.shangjack.es;importlombok.Data;importorg.apache.http.HttpHost;importorg.elasticsearch.client.RestClient;importorg.elasticsearch.client.RestClientBuilder;importorg.elasticsearch.client.RestHighLevelClient;importorg.springframework.boot.context.properties.ConfigurationProperties;importorg.springframework.context.annotation.Configuration;importorg.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;@ConfigurationProperties(prefix="elasticsearch")@Configuration@DatapublicclassElasticsearchConfigextendsAbstractElasticsearchConfiguration{privateStringhost;privateIntegerport;//重写父类方法    @OverridepublicRestHighLevelClientelasticsearchClient(){RestClientBuilderbuilder=RestClient.builder(newHttpHost(host,port));RestHighLevelClientrestHighLevelClient=newRestHighLevelClient(builder);returnrestHighLevelClient;}}

DAO数据访问对象

packagecom.shangjack.es;importorg.springframework.data.elasticsearch.repository.ElasticsearchRepository;importorg.springframework.stereotype.Repository;@RepositorypublicinterfaceProductDaoextendsElasticsearchRepository<Product,Long>{}

实体类映射操作

packagecom.shangjack.es;importlombok.AllArgsConstructor;importlombok.Data;importlombok.NoArgsConstructor;importlombok.ToString;importorg.springframework.data.annotation.Id;importorg.springframework.data.elasticsearch.annotations.Document;importorg.springframework.data.elasticsearch.annotations.Field;importorg.springframework.data.elasticsearch.annotations.FieldType;@Data@NoArgsConstructor@AllArgsConstructor@ToString@Document(indexName="shopping",shards=3,replicas=1)publicclassProduct{//必须有id,这里的id是全局唯一的标识，等同于es中的"_id"    @IdprivateLongid;//商品唯一标识/***type: 字段数据类型*analyzer: 分词器类型*index: 是否索引(默认:true)*Keyword: 短语,不进行分词*/    @Field(type=FieldType.Text,analyzer="ik_max_word")privateStringtitle;//商品名称    @Field(type=FieldType.Keyword)privateStringcategory;//分类名称    @Field(type=FieldType.Double)privateDoubleprice;//商品价格    @Field(type=FieldType.Keyword,index=false)privateStringimages;//图片地址}

索引操作

packagecom.shangjack.es;importorg.junit.Test;importorg.junit.runner.RunWith;importorg.springframework.beans.factory.annotation.Autowired;importorg.springframework.boot.test.context.SpringBootTest;importorg.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;importorg.springframework.test.context.junit4.SpringRunner;@RunWith(SpringRunner.class)@SpringBootTestpublicclassSpringDataESIndexTest{//注入ElasticsearchRestTemplate    @AutowiredprivateElasticsearchRestTemplateelasticsearchRestTemplate;//创建索引并增加映射配置    @TestpublicvoidcreateIndex(){//创建索引，系统初始化会自动创建索引System.out.println("创建索引");}    @TestpublicvoiddeleteIndex(){//创建索引，系统初始化会自动创建索引booleanflg=elasticsearchRestTemplate.deleteIndex(Product.class);System.out.println("删除索引 = "+flg);}}

文档操作

packagecom.shangjack.es;importorg.junit.Test;importorg.junit.runner.RunWith;importorg.springframework.beans.factory.annotation.Autowired;importorg.springframework.boot.test.context.SpringBootTest;importorg.springframework.data.domain.Page;importorg.springframework.data.domain.PageRequest;importorg.springframework.data.domain.Sort;importorg.springframework.test.context.junit4.SpringRunner;importjava.util.ArrayList;importjava.util.List;@RunWith(SpringRunner.class)@SpringBootTestpublicclassSpringDataESProductDaoTest{    @AutowiredprivateProductDaoproductDao;/*** 新增*/    @Testpublicvoidsave(){Productproduct=newProduct();product.setId(2L);product.setTitle("华为手机");product.setCategory("手机");product.setPrice(2999.0);product.setImages("http://www.shangjack/hw.jpg");productDao.save(product);}//修改    @Testpublicvoidupdate(){Productproduct=newProduct();product.setId(1L);product.setTitle("小米2手机");product.setCategory("手机");product.setPrice(9999.0);product.setImages("http://www.shangjack/xm.jpg");productDao.save(product);}//根据id查询    @TestpublicvoidfindById(){Productproduct=productDao.findById(1L).get();System.out.println(product);}//查询所有    @TestpublicvoidfindAll(){Iterable<Product>products=productDao.findAll();for(Productproduct:products){System.out.println(product);}}//删除    @Testpublicvoiddelete(){Productproduct=newProduct();product.setId(1L);productDao.delete(product);}//批量新增    @TestpublicvoidsaveAll(){List<Product>productList=newArrayList<>();for(inti=0;i<10;i++){Productproduct=newProduct();product.setId(Long.valueOf(i));product.setTitle("["+i+"]小米手机");product.setCategory("手机");product.setPrice(1999.0+i);product.setImages("http://www.shangjack/xm.jpg");productList.add(product);}productDao.saveAll(productList);}//分页查询    @TestpublicvoidfindByPageable(){//设置排序(排序方式，正序还是倒序，排序的id)Sortsort=Sort.by(Sort.Direction.DESC,"id");intcurrentPage=0;//当前页，第一页从0开始，1表示第二页intpageSize=5;//每页显示多少条//设置查询分页PageRequestpageRequest=PageRequest.of(currentPage,pageSize,sort);//分页查询Page<Product>productPage=productDao.findAll(pageRequest);for(ProductProduct:productPage.getContent()){System.out.println(Product);}}}

文档搜索

packagecom.shangjack.es;importorg.elasticsearch.index.query.QueryBuilders;importorg.elasticsearch.index.query.TermQueryBuilder;importorg.junit.Test;importorg.junit.runner.RunWith;importorg.springframework.beans.factory.annotation.Autowired;importorg.springframework.boot.test.context.SpringBootTest;importorg.springframework.data.domain.PageRequest;importorg.springframework.test.context.junit4.SpringRunner;@RunWith(SpringRunner.class)@SpringBootTestpublicclassSpringDataESSearchTest{    @AutowiredprivateProductDaoproductDao;/***term查询*search(termQueryBuilder) 调用搜索方法，参数查询构建器对象*/    @TestpublicvoidtermQuery(){TermQueryBuildertermQueryBuilder=QueryBuilders.termQuery("title","小米");Iterable<Product>products=productDao.search(termQueryBuilder);for(Productproduct:products){System.out.println(product);}}/***term查询加分页*/    @TestpublicvoidtermQueryByPage(){intcurrentPage=0;intpageSize=5;//设置查询分页PageRequestpageRequest=PageRequest.of(currentPage,pageSize);TermQueryBuildertermQueryBuilder=QueryBuilders.termQuery("title","小米");Iterable<Product>products=productDao.search(termQueryBuilder,pageRequest);for(Productproduct:products){System.out.println(product);}}}

2Spark Streaming框架集成

2.1 Spark Streaming框架介绍

Spark Streaming是Spark core API的扩展，支持实时数据流的处理，并且具有可扩展，高吞吐量，容错的特点。数据可以从许多来源获取，如Kafka，Flume，Kinesis或TCP sockets，并且可以使用复杂的算法进行处理，这些算法使用诸如map，reduce，join和window等高级函数表示。最后，处理后的数据可以推送到文件系统，数据库等。实际上，您可以将Spark的机器学习和图形处理算法应用于数据流。

2.2框架集成

创建Maven项目

修改pom文件，增加依赖关系

<?xmlversion="1.0"encoding="UTF-8"?><projectxmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><groupId>com.shangjack.es</groupId><artifactId>sparkstreaming-elasticsearch</artifactId><version>1.0</version><properties><maven.compiler.source>8</maven.compiler.source><maven.compiler.target>8</maven.compiler.target></properties><dependencies><dependency><groupId>org.apache.spark</groupId><artifactId>spark-core_2.12</artifactId><version>3.0.0</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-streaming_2.12</artifactId><version>3.0.0</version></dependency><dependency><groupId>org.elasticsearch</groupId><artifactId>elasticsearch</artifactId><version>7.8.0</version></dependency><!--elasticsearch的客户端 --><dependency><groupId>org.elasticsearch.client</groupId><artifactId>elasticsearch-rest-high-level-client</artifactId><version>7.8.0</version></dependency><!--elasticsearch依赖2.x的log4j--><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-api</artifactId><version>2.8.2</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-core</artifactId><version>2.8.2</version></dependency><!--<dependency>--><!--<groupId>com.fasterxml.jackson.core</groupId>--><!--<artifactId>jackson-databind</artifactId>--><!--<version>2.11.1</version>--><!--</dependency>--><!--&lt;!&ndash;junit单元测试 &ndash;&gt;--><!--<dependency>--><!--<groupId>junit</groupId>--><!--<artifactId>junit</artifactId>--><!--<version>4.12</version>--><!--</dependency>--></dependencies></project>

功能实现

packagecom.shangjack.esimportorg.apache.http.HttpHostimportorg.apache.spark.SparkConfimportorg.apache.spark.streaming.dstream.ReceiverInputDStreamimportorg.apache.spark.streaming.{Seconds,StreamingContext}importorg.elasticsearch.action.index.IndexRequestimportorg.elasticsearch.client.indices.CreateIndexRequestimportorg.elasticsearch.client.{RequestOptions,RestClient,RestHighLevelClient}importorg.elasticsearch.common.xcontent.XContentTypeimportjava.util.DateobjectSparkStreamingESTest{defmain(args:Array[String]):Unit={valsparkConf=newSparkConf().setMaster("local[*]").setAppName("ESTest")valssc=newStreamingContext(sparkConf,Seconds(3))valds:ReceiverInputDStream[String]=ssc.socketTextStream("localhost",9999)ds.foreachRDD(rdd=>{println("*************** "+newDate())rdd.foreach(data=>{valclient=newRestHighLevelClient(RestClient.builder(newHttpHost("localhost",9200,"http")));// 新增文档 - 请求对象valrequest=newIndexRequest();// 设置索引及唯一性标识valss=data.split(" ")println("ss = "+ss.mkString(","))request.index("sparkstreaming").id(ss(0));valproductJson=s"""|{"data":"${ss(1)}"}|""".stripMargin;// 添加文档数据，数据格式为JSON格式request.source(productJson,XContentType.JSON);// 客户端发送请求，获取响应对象valresponse=client.index(request,RequestOptions.DEFAULT);System.out.println("_index:"+response.getIndex());System.out.println("_id:"+response.getId());System.out.println("_result:"+response.getResult());client.close()})})ssc.start()ssc.awaitTermination()}}

3Flink框架集成

3.1 Flink框架介绍

Apache Spark是一种基于内存的快速、通用、可扩展的大数据分析计算引擎。

Apache Spark掀开了内存计算的先河，以内存作为赌注，赢得了内存计算的飞速发展。但是在其火热的同时，开发人员发现，在Spark中，计算框架普遍存在的缺点和不足依然没有完全解决，而这些问题随着5G时代的来临以及决策者对实时数据分析结果的迫切需要而凸显的更加明显：

数据精准一次性处理（Exactly-Once）
乱序数据，迟到数据
低延迟，高吞吐，准确性
容错性

Apache Flink是一个框架和分布式处理引擎，用于对无界和有界数据流进行有状态计算。在Spark火热的同时，也默默地发展自己，并尝试着解决其他计算框架的问题。

慢慢地，随着这些问题的解决，Flink慢慢被绝大数程序员所熟知并进行大力推广，阿里公司在2015年改进Flink，并创建了内部分支Blink，目前服务于阿里集团内部搜索、推荐、广告和蚂蚁等大量核心实时业务。

3.2框架集成

创建Maven项目

修改pom文件，增加相关依赖类库

<?xmlversion="1.0"encoding="UTF-8"?><projectxmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><groupId>com.shangjack.es</groupId><artifactId>flink-elasticsearch</artifactId><version>1.0</version><properties><maven.compiler.source>8</maven.compiler.source><maven.compiler.target>8</maven.compiler.target></properties><dependencies><dependency><groupId>org.apache.flink</groupId><artifactId>flink-scala_2.12</artifactId><version>1.12.0</version></dependency><dependency><groupId>org.apache.flink</groupId><artifactId>flink-streaming-scala_2.12</artifactId><version>1.12.0</version></dependency><dependency><groupId>org.apache.flink</groupId><artifactId>flink-clients_2.12</artifactId><version>1.12.0</version></dependency><dependency><groupId>org.apache.flink</groupId><artifactId>flink-connector-elasticsearch7_2.11</artifactId><version>1.12.0</version></dependency><!--jackson--><dependency><groupId>com.fasterxml.jackson.core</groupId><artifactId>jackson-core</artifactId><version>2.11.1</version></dependency></dependencies></project>

功能实现

packagecom.shangjack.es;importorg.apache.flink.api.common.functions.RuntimeContext;importorg.apache.flink.streaming.api.datastream.DataStreamSource;importorg.apache.flink.streaming.api.environment.StreamExecutionEnvironment;importorg.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;importorg.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;importorg.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;importorg.apache.http.HttpHost;importorg.elasticsearch.action.index.IndexRequest;importorg.elasticsearch.client.Requests;importjava.util.ArrayList;importjava.util.HashMap;importjava.util.List;importjava.util.Map;publicclassFlinkElasticsearchSinkTest{publicstaticvoidmain(String[]args)throwsException{StreamExecutionEnvironmentenv=StreamExecutionEnvironment.getExecutionEnvironment();DataStreamSource<String>source=env.socketTextStream("localhost",9999);List<HttpHost>httpHosts=newArrayList<>();httpHosts.add(newHttpHost("127.0.0.1",9200,"http"));//httpHosts.add(newHttpHost("10.2.3.1",9200,"http"));//useaElasticsearchSink.BuildertocreateanElasticsearchSinkElasticsearchSink.Builder<String>esSinkBuilder=newElasticsearchSink.Builder<>(httpHosts,newElasticsearchSinkFunction<String>(){publicIndexRequestcreateIndexRequest(Stringelement){Map<String,String>json=newHashMap<>();json.put("data",element);returnRequests.indexRequest().index("my-index")//.type("my-type").source(json);}                    @Overridepublicvoidprocess(Stringelement,RuntimeContextctx,RequestIndexerindexer){indexer.add(createIndexRequest(element));}});//configurationforthebulkrequests;thisinstructsthesinktoemitaftereveryelement,otherwisetheywouldbebufferedesSinkBuilder.setBulkFlushMaxActions(1);//provideaRestClientFactoryforcustomconfigurationontheinternallycreatedRESTclient//esSinkBuilder.setRestClientFactory(//restClientBuilder->{//restClientBuilder.setDefaultHeaders(...)//restClientBuilder.setMaxRetryTimeoutMillis(...)//restClientBuilder.setPathPrefix(...)//restClientBuilder.setHttpClientConfigCallback(...)//}//);source.addSink(esSinkBuilder.build());env.execute("flink-es");}}

标签： flink elasticsearch spark

本文转载自: https://blog.csdn.net/shangjg03/article/details/134336259
版权归原作者 shangjg3 所有，如有侵权，请联系我们删除。