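1. from+size shallow pagination
- Shallow pagination works in a simple way: to serve the second page of 10, ES fetches the first 20 hits, discards the first 10, and returns only hits 11 to 20. The work spent on the first 10 hits is wasted.
- from+size is the default pagination mode in ES. It is very inefficient for deep pagination, but it does allow jumping to an arbitrary page.
- To protect performance, ES limits pagination depth: max_result_window defaults to 10000, so from + size must not exceed 10000.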
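The window limit described above can be sketched as a small guard in plain Java, with no Elasticsearch dependency. The class `PageWindow`, its method names, and the hard-coded default of 10000 are illustrative assumptions; a real index may be configured with a different `index.max_result_window`.

```java
/** Illustrative helper (hypothetical, not part of any Elasticsearch client). */
final class PageWindow {
    /** Assumed default of index.max_result_window; an index may override it. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private PageWindow() {}

    /** Converts a 1-based page number into the zero-based "from" offset. */
    static int fromOffset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return Math.multiplyExact(pageNumber - 1, pageSize);
    }

    /** Returns true when from + size stays within the result window. */
    static boolean fitsWindow(int from, int size, int maxResultWindow) {
        return from >= 0 && size >= 0 && (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        int size = 10;
        int from = fromOffset(1000, size); // page 1000 -> from = 9990
        System.out.println(fitsWindow(from, size, DEFAULT_MAX_RESULT_WINDOW)); // true: 9990 + 10 = 10000
        from = fromOffset(1001, size);     // page 1001 -> from = 10000
        System.out.println(fitsWindow(from, size, DEFAULT_MAX_RESULT_WINDOW)); // false: 10010 > 10000
    }
}
```

With a page size of 10, page 1000 is the last page such a guard would let through under the assumed default.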
DSL query
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
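Note: ES is shard-based. With 3 shards, from=100 and size=10, each shard returns its top 110 (from + size) hits according to the sort order. The coordinating node merges these 330 candidates, skips the first 100, and returns the next 10.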
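The cost of from+size across shards can be illustrated with an in-memory simulation. This is a sketch under the assumption that every shard hands its top from + size sort keys to the coordinating node, which then merges them; the class `ShardMergeDemo`, its methods, and the use of plain integers as sort keys are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** In-memory simulation of from+size over several shards (hypothetical names). */
final class ShardMergeDemo {
    private ShardMergeDemo() {}

    /** Each shard returns its own top (from + size) sort keys, in sorted order. */
    static List<Integer> shardTop(List<Integer> shardDocs, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardDocs);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(from + size, sorted.size())));
    }

    /** The coordinating node merges all candidates, skips "from", keeps "size". */
    static List<Integer> coordinate(List<List<Integer>> shards, int from, int size) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> shard : shards) {
            candidates.addAll(shardTop(shard, from, size));
        }
        Collections.sort(candidates);
        int start = Math.min(from, candidates.size());
        int end = Math.min(from + size, candidates.size());
        return new ArrayList<>(candidates.subList(start, end));
    }

    /** Upper bound on the number of candidates the coordinating node must sort. */
    static int candidateCount(int shardCount, int from, int size) {
        return shardCount * (from + size);
    }

    public static void main(String[] args) {
        // Three shards; document i lives on shard i % 3.
        List<List<Integer>> shards = new ArrayList<>();
        for (int s = 0; s < 3; s++) shards.add(new ArrayList<>());
        for (int id = 0; id < 600; id++) shards.get(id % 3).add(id);

        System.out.println(candidateCount(3, 100, 10));  // 330
        System.out.println(coordinate(shards, 100, 10)); // the ten keys 100 through 109
    }
}
```

The candidate count grows with both the shard count and the page depth, which is why deep pages get expensive even though only 10 hits are returned.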
RestHighLevelClient query
/**
 * @Description from+size shallow pagination query
 * @create by meng
 */
private List<SearchHit> docSearch(Date time, String title) {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchAllQuery())
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits.length > 0) {
            return Arrays.asList(hits);
        } else {
            return null;
        }
    } catch (IOException e) {
        // Assuming an SLF4J-style logger: pass the exception last, without a {} placeholder
        log.error("doc pagination query failed", e);
    }
    return null;
}
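2. scroll deep pagination
- from+size is still acceptable up to roughly 10,000 to 50,000 documents (1,000 to 5,000 pages of 10; anything past 10,000 requires raising max_result_window), but with more data it runs into the deep-pagination problem. To address this, ES provides the scroll query.
- A scroll search saves a snapshot of the data at the time of the first search, and every later request is served from that snapshot. Changes made to the data in the meantime are not visible to the user. It is recommended for non-real-time processing of large volumes of data.
- Not suitable for scenarios that require jumping between pages.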
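The snapshot behaviour described above can be modelled with a toy cursor in plain Java. This is only a sketch of the semantics (frozen view, forward-only batches), not of how Elasticsearch implements scroll internally; the class `SnapshotScroll` and its method `nextBatch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a scroll cursor over a frozen snapshot (hypothetical class). */
final class SnapshotScroll {
    private final List<String> snapshot; // copy taken when the scroll is opened
    private final int batchSize;
    private int position = 0;

    SnapshotScroll(List<String> liveData, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.snapshot = new ArrayList<>(liveData); // later writes are not visible
        this.batchSize = batchSize;
    }

    /** Returns the next batch; an empty list means the scroll is exhausted. */
    List<String> nextBatch() {
        int end = Math.min(position + batchSize, snapshot.size());
        List<String> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end; // the cursor only moves forward, so pages cannot be skipped
        return batch;
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>(Arrays.asList("a", "b", "c"));
        SnapshotScroll scroll = new SnapshotScroll(live, 2);
        live.add("d"); // written after the snapshot, so the scroll never sees it

        List<String> batch;
        while (!(batch = scroll.nextBatch()).isEmpty()) {
            System.out.println(batch); // [a, b] then [c]
        }
    }
}
```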
DSL query
GET demo_index/_search?scroll=3m
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
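- scroll=3m keeps the scroll_id usable for 3 minutes
- When using scroll, from must be 0 (or omitted)
- size determines how many hits each subsequent scroll call returns
Use the _scroll_id from the response to read the next page. Each request reads the next 10 hits, until all data has been read or the scroll_id retention time expires: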
GET _search/scroll
{
  "scroll_id": "mengliulUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRRO==",
  "scroll": "3m"
}
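Note: each scroll request has to set the cursor expiry to 3 minutes again. Both GET and POST work. Scroll is very resource-intensive, so once the scroll data is no longer needed, delete the scroll_id explicitly as soon as possible.
Clear a specific scroll_id: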
DELETE _search/scroll/mengliulUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRRO==
Clear all scrolls:
DELETE _search/scroll/_all
RestHighLevelClient query
/**
 * @Description scroll deep pagination
 * @create by meng
 */
private List<SearchHit> docSearch(Date time, String title) {
    List<SearchHit> searchHits = new ArrayList<>();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchAllQuery())
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    String scrollId = null;
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        // Keep the scroll context alive for 3 minutes
        Scroll scroll = new Scroll(TimeValue.timeValueMinutes(3));
        // Take the snapshot
        searchRequest.scroll(scroll);
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        scrollId = response.getScrollId();
        SearchHit[] hits = response.getHits().getHits();
        // The first response already carries page one; keep scrolling until a page comes back empty
        while (hits != null && hits.length > 0) {
            searchHits.addAll(Arrays.asList(hits));
            SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
            searchScrollRequest.scroll(scroll);
            response = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
            // Always continue with the scroll_id from the latest response
            scrollId = response.getScrollId();
            hits = response.getHits().getHits();
        }
    } catch (IOException e) {
        // Assuming an SLF4J-style logger: pass the exception last, without a {} placeholder
        log.error("doc pagination query failed", e);
    } finally {
        // Release the scroll context explicitly instead of waiting for the timeout
        if (scrollId != null) {
            try {
                ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
                clearScrollRequest.addScrollId(scrollId);
                restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                log.error("failed to clear scroll", e);
            }
        }
    }
    return searchHits;
}
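3. search_after deep pagination
- Supports deep pagination over live (real-time) data
- To identify the last hit of each page unambiguously, every document must have a globally unique value to sort on
- Not suitable for scenarios that require jumping between pages
DSL query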
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
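- When using search_after, from must be 0 (or omitted)
- In the DSL query above, id is a unique, non-repeating field, while publish_time may repeat
Note: each hit in the response carries a sort field. Take the sort values of the last hit in the result set and pass them as search_after in the next query, in the same order as the fields in the sort clause: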
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 10,
  "search_after": [
    "mengliu20211202",
    1638374400000
  ],
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
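The cursor logic behind search_after can be modelled in memory: the next page is the first `size` documents whose sort key is strictly greater than the sort key of the previous page's last hit. The sketch below assumes the same sort order as the query (id ascending, then publish_time ascending); the class `SearchAfterDemo`, the nested `Doc` type, and the sample ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory model of search_after paging (hypothetical names throughout). */
final class SearchAfterDemo {
    /** A document reduced to its two sort fields. */
    static final class Doc {
        final String id;
        final long publishTime;

        Doc(String id, long publishTime) {
            this.id = id;
            this.publishTime = publishTime;
        }
    }

    // Same order as the "sort" clause: id ascending, then publish_time ascending.
    static final Comparator<Doc> SORT =
            Comparator.comparing((Doc d) -> d.id).thenComparingLong(d -> d.publishTime);

    private SearchAfterDemo() {}

    /** Returns up to "size" docs strictly after the cursor; a null cursor means the first page. */
    static List<Doc> page(List<Doc> index, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(SORT);
        List<Doc> hits = new ArrayList<>();
        for (Doc d : sorted) {
            if (hits.size() == size) break;
            if (after == null || SORT.compare(d, after) > 0) hits.add(d);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Every document shares one publish_time; only the unique id separates them.
        List<Doc> index = new ArrayList<>();
        for (int i = 1; i <= 5; i++) index.add(new Doc("doc" + i, 1638374400000L));

        Doc cursor = null;
        List<Doc> hits;
        while (!(hits = page(index, cursor, 2)).isEmpty()) {
            StringBuilder line = new StringBuilder();
            for (Doc d : hits) line.append(d.id).append(' ');
            System.out.println(line.toString().trim()); // "doc1 doc2", then "doc3 doc4", then "doc5"
            cursor = hits.get(hits.size() - 1);          // last hit's sort values become the cursor
        }
    }
}
```

Because the comparison is strict, two documents with identical sort values would make the second one unreachable, which is why the sort needs a unique field such as id.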
RestHighLevelClient query
/**
 * @Description search_after deep pagination
 * @create by meng
 */
private List<SearchHit> docSearch(Date time, String title) {
    List<SearchHit> searchHits = new ArrayList<>();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchAllQuery())
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        // Nothing matched, so there is no last hit to continue from
        if (hits.length == 0) {
            return searchHits;
        }
        searchHits.addAll(Arrays.asList(hits));
        // Take the last hit of the first page
        SearchHit result = hits[hits.length - 1];
        // Query the next page
        SearchSourceBuilder searchSourceBuilder2 = new SearchSourceBuilder();
        searchSourceBuilder2.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .size(10);
        // Pass the sort values of the previous page's last hit
        searchSourceBuilder2.searchAfter(result.getSortValues());
        SearchRequest searchRequest2 = new SearchRequest("demo_index").source(searchSourceBuilder2);
        SearchResponse searchResponse2 = restHighLevelClient.search(searchRequest2, RequestOptions.DEFAULT);
        SearchHit[] hits2 = searchResponse2.getHits().getHits();
        searchHits.addAll(Arrays.asList(hits2));
    } catch (IOException e) {
        // Assuming an SLF4J-style logger: pass the exception last, without a {} placeholder
        log.error("doc pagination query failed", e);
    }
    return searchHits;
}
Copyright belongs to the original author, W_Meng_H. If there is any infringement, please contact us for removal.