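1. Introducing the problem
Our project has the following requirement: Elasticsearch already holds a large amount of historical data, and a new field was later added to the index. We need to query the historical data by condition, but the new field does not exist in those older documents. How can we find them?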
1. Index two documents
PUT /user/_doc/1
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
PUT /user/_doc/2
{"first_name":"zhangsan","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
2. Add a new field to the index
PUT /user/_mapping
{"properties":{"height":{"type":"long"}}}
3. Index one more document
This document includes a value for the new height field.
PUT /user/_doc/3
{"first_name":"lisi","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"],"height":175}
4. View the documents in the index
GET /user/_search
{
  "took": 817,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "user", "_type": "_doc", "_id": "1", "_score": 1.0, "_source": {"first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"]}},
      {"_index": "user", "_type": "_doc", "_id": "2", "_score": 1.0, "_source": {"first_name": "zhangsan", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"]}},
      {"_index": "user", "_type": "_doc", "_id": "3", "_score": 1.0, "_source": {"first_name": "lisi", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"], "height": 175}}
    ]
  }
}
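The result above shows that adding a new field to an existing index does not retroactively add that field to older documents, and so no default value is assigned to them either. That is why the first two documents have no height field.
In Elasticsearch, a field that is missing from a document, or whose value is null, is not indexed for that document (unless the mapping defines a null_value for explicit nulls). Queries on the field therefore skip such documents, and there is no direct way to search for the empty value.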
PUT /my_index/_doc/1
{"first_name":"zhangsan"}
PUT /my_index/_doc/2
{"first_name":"wangwu","height":null}
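For example, neither of the two documents above can be found by searching on the height field. So how do we query the historical data that was written before the field was added?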
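To make the "missing or null" rule concrete, here is a minimal Python sketch that approximates, on the client side, whether a document's `_source` has a usable value for a field. The helper name `field_has_value` and the third sample document are illustrative assumptions, not part of Elasticsearch; the sketch only looks at top-level fields and ignores mapping settings such as `null_value`.

```python
from typing import Any, Mapping


def field_has_value(source: Mapping[str, Any], field: str) -> bool:
    """Approximate whether `field` holds at least one non-null value in `source`."""
    value = source.get(field)  # a missing key and an explicit null both give None
    if value is None:
        return False
    if isinstance(value, list):
        # An array only counts if at least one element is non-null.
        return any(item is not None for item in value)
    return True  # note: 0 and "" are still real values


# Sample documents; "3" is a hypothetical document that does carry a height.
docs = {
    "1": {"first_name": "zhangsan"},                # height is absent
    "2": {"first_name": "wangwu", "height": None},  # height is explicitly null
    "3": {"first_name": "lisi", "height": 175},
}

for doc_id, source in docs.items():
    print(doc_id, field_has_value(source, "height"))
```

Documents 1 and 2 are treated the same way (no value for height), which is why a plain query on height cannot tell them apart or match either of them.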
2. must_not & exists
POST /user/_search
{"query":{"bool":{"must_not":[{"exists":{"field":"height"}}]}}}
{
  "took": 7,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 0.0,
    "hits": [
      {"_index": "user", "_type": "_doc", "_id": "1", "_score": 0.0, "_source": {"first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"]}},
      {"_index": "user", "_type": "_doc", "_id": "2", "_score": 0.0, "_source": {"first_name": "zhangsan", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"]}}
    ]
  }
}
The exists query returns documents that have at least one non-null value in the original field:
GET /user/_search
{"query":{"exists":{"field":"height"}}}
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "user", "_type": "_doc", "_id": "3", "_score": 1.0, "_source": {"first_name": "lisi", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": ["sports", "music"], "height": 175}}
    ]
  }
}
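3. Assigning an initial value to historical data
Adding a field to an existing index does not affect historical data, so we can update the historical documents ourselves, give them a default value, and then search by that default.
Use a script with _update_by_query to update the documents in bulk: if the field's value is null (in Painless this is also the case when the field is missing from _source), set it to 0.
POST /user/_update_by_query
{"script":{"lang":"painless","source":"if (ctx._source.height == null) {ctx._source.height = 0}"}}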
View the documents in the index again:
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "user", "_type": "_doc", "_id": "1", "_score": 1.0, "_source": {"about": "I love to go rock climbing", "last_name": "Smith", "interests": ["sports", "music"], "first_name": "John", "age": 25, "height": 0}},
      {"_index": "user", "_type": "_doc", "_id": "2", "_score": 1.0, "_source": {"about": "I love to go rock climbing", "last_name": "Smith", "interests": ["sports", "music"], "first_name": "zhangsan", "age": 25, "height": 0}},
      {"_index": "user", "_type": "_doc", "_id": "3", "_score": 1.0, "_source": {"about": "I love to go rock climbing", "last_name": "Smith", "interests": ["sports", "music"], "first_name": "lisi", "age": 25, "height": 175}}
    ]
  }
}
All historical documents now have the default value 0 for the height field.
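A request without a query runs the script against every document in the index, including those that already have a height. One possible refinement, sketched below in Python, is to reuse the must_not/exists query so that only documents lacking the field are selected, and to pass the default through script params. The function names `backfill_request` and `simulate_backfill` and the parameter name `default_height` are assumptions for illustration; the simulation is a local stand-in for what the server would do, not a call to Elasticsearch.

```python
import json
from typing import Any, Dict, List


def backfill_request(default_height: int) -> Dict[str, Any]:
    """Body for POST /user/_update_by_query that only selects documents without height."""
    return {
        "query": {"bool": {"must_not": [{"exists": {"field": "height"}}]}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.height = params.default_height",
            "params": {"default_height": default_height},
        },
    }


def simulate_backfill(sources: List[Dict[str, Any]], default_height: int) -> List[Dict[str, Any]]:
    """Local approximation: set height on documents where it is missing or null."""
    return [
        src if src.get("height") is not None else {**src, "height": default_height}
        for src in sources
    ]


# Trimmed copies of the three sample documents from the user index.
sources = [
    {"first_name": "John", "age": 25},
    {"first_name": "zhangsan", "age": 25},
    {"first_name": "lisi", "age": 25, "height": 175},
]

print(json.dumps(backfill_request(0)))
print([src["height"] for src in simulate_backfill(sources, 0)])
```

When picking the default, it helps to choose a value that cannot be confused with real data; with 0, a search for height 0 returns exactly the backfilled historical documents only as long as no real document stores 0.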
Copyright belongs to the original author, 我一直在流浪. In case of infringement, please contact us for removal.