Elasticsearch (ES) queries: searching an index


source filtering
We can use _source to define which fields we want returned:

GET twitter/_search
{
"_source": ["user", "city"],
"query": {
"match_all": {
}
}
}
The result:

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"city" : "北京",
"user" : "张三"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"city" : "北京",
"user" : "老刘"
}
},
...
]
We can also write it as follows:

GET twitter/_search
{
"_source": {
"includes": ["user", "city"]
},
"query": {
"match_all": {
}
}
}
This returns the same result as before.

As we can see, only the user and city fields are returned in _source. We can also set _source to false so that no _source information is returned at all:

GET twitter/_search
{
"_source": false,
"query": {
"match": {
"user": "张三"
}
}
}
The response:

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0808902
}
]
We can see that only metadata such as _index, _id, and _score is returned; none of the _source fields come back. _source also accepts wildcard patterns, for example:

GET twitter/_search
{
"_source": {
"includes": [
"user*",
"location*"
],
"excludes": [
"*.lat"
]
},
"query": {
"match_all": {}
}
}
The result is:

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"location" : {
"lon" : "116.325747"
},
"user" : "张三"
}
},
...
]
If we set _source to [], all fields are shown, rather than no fields at all:

GET twitter/_search
{
"_source": [],
"query": {
"match_all": {
}
}
}
The command above shows all the fields of _source.
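To make the includes/excludes semantics above concrete, here is a minimal Python sketch that mimics source filtering on a plain dictionary. It is only an illustration under simplifying assumptions: the helper names flatten and filter_source are made up for this example, patterns are matched with Python's fnmatch rather than Elasticsearch's own matcher, and arrays of objects are not handled.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(doc, includes=(), excludes=()):
    """Keep the leaves selected by includes and not rejected by excludes.

    An empty includes list keeps everything, like "_source": [].
    """
    kept = {}
    for path, value in flatten(doc):
        parts = path.split(".")
        # A pattern may match the leaf itself or any of its parent objects.
        prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
        included = not includes or any(
            fnmatchcase(p, pattern) for p in prefixes for pattern in includes
        )
        excluded = any(
            fnmatchcase(p, pattern) for p in prefixes for pattern in excludes
        )
        if included and not excluded:
            node = kept
            for part in parts[:-1]:
                node = node.setdefault(part, {})
            node[parts[-1]] = value
    return kept


sample = {
    "user": "张三",
    "city": "北京",
    "location": {"lat": "39.970718", "lon": "116.325747"},
}
print(filter_source(sample, ["user*", "location*"], ["*.lat"]))
```

With the sample document, the excludes pattern removes location.lat while location.lon survives, which mirrors the response shown earlier.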

Script fields

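Sometimes the field we want does not exist in _source at all. In that case we can use script fields to generate it. A script field returns the result of a script evaluation (which can be based on different fields) for each hit, for example: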

GET twitter/_search
{
"query": {
"match_all": {}
},
"script_fields": {
"years_to_100": {
"script": {
"lang": "painless",
"source": "100-doc['age'].value"
}
},
"year_of_birth":{
"script": "2019 - doc['age'].value"
}
}
}
The result is:

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"years_to_100" : [
80
],
"year_of_birth" : [
1999
]
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"years_to_100" : [
70
],
"year_of_birth" : [
1989
]
}
},
...
]
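Note that generating results with scripts in this way can consume a lot of resources when there are many documents. It is also important to remember that doc here refers to doc values. For a field that has no doc values, we need to read the value from params._source instead. Using that approach, the command above can be rewritten as: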

GET twitter/_search
{
"query": {
"match_all": {}
},
"script_fields": {
"years_to_100": {
"script": {
"lang": "painless",
"source": "100-params._source['age']"
}
},
"year_of_birth":{
"script": "2019 - params._source['age']"
}
}
}
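Because age is of the long data type, it has doc values, so we can access it through doc['age'], and this access is comparatively fast. Reading from params._source is slower, because the whole _source has to be loaded and parsed for each hit.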
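What the two script fields compute can be sketched in plain Python. This is not Painless and does not talk to Elasticsearch; the function name script_fields and the two tiny input documents are assumptions made for the illustration.

```python
def script_fields(doc):
    """Compute the two derived fields from the example for one document."""
    age = doc["age"]
    # Elasticsearch returns every script field value wrapped in a list.
    return {
        "years_to_100": [100 - age],
        "year_of_birth": [2019 - age],
    }


docs = [{"_id": "1", "age": 20}, {"_id": "2", "age": 30}]
for d in docs:
    print(d["_id"], script_fields(d))
```

The values are computed per hit at query time, which is why this approach gets expensive on large result sets.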

Count API
We often want to know how many documents an index contains. We can use the _count endpoint for that:

GET twitter/_count
If we want to know the number of documents that satisfy a condition, we can use the following form:

GET twitter/_count
{
"query": {
"match": {
"city": "北京"
}
}
}
Here we get the number of all documents whose city is "北京":

{
"count" : 5,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
Modifying settings
We can get the settings of an index through the following API:

GET twitter/_settings

From this we can see how many shards and how many replicas our twitter index has. We can also set these values when we create the index:

PUT twitter
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
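Once number_of_shards has been set, we cannot change it in place; we would have to delete the index and index the data again. This is because the shard a document is stored on depends on the value of number_of_shards. If that value changed, looking up the shard that holds a given document would no longer be accurate.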
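The reason can be sketched with a toy routing function. Conceptually, a document is routed to shard hash(routing value) % number_of_shards. The sketch below is an assumption-laden illustration: shard_for is a made-up helper, and CRC32 stands in for the hash function that Elasticsearch actually uses.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard for a document id (the default routing value)."""
    # Stand-in hash: Elasticsearch uses its own hash of the routing value;
    # CRC32 is used here only to keep the sketch dependency-free.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


ids = [str(i) for i in range(1, 101)]
# Documents whose computed shard changes if we go from 5 to 6 shards.
moved = [i for i in ids if shard_for(i, 5) != shard_for(i, 6)]
print(f"{len(moved)} of {len(ids)} documents would be looked up on a different shard")
```

Any document in moved would be searched for on a shard that does not hold it, which is why the shard count is fixed when the index is created.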

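Modifying the mapping of an index
Elasticsearch is known for being schemaless, yet in practice every index has a corresponding mapping. This mapping is generated when we create the first document: Elasticsearch automatically inspects each incoming field and infers its data type. We can understand schemaless as follows:

- We do not need to define a mapping in advance before creating documents. Field types are detected dynamically, which is different from traditional databases.
- If new fields are added dynamically, the mapping is adjusted automatically to recognize them.

Automatic detection has one drawback: some fields may be identified inaccurately, such as the location information in our example. In that case we need to fix the mapping for that field.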

We can query the current mapping of the index with the following command:

GET twitter/_mapping
It shows the following:

{
"twitter" : {
"mappings" : {
"properties" : {
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"location" : {
"properties" : {
"lat" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lon" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"province" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"uid" : {
"type" : "long"
},
"user" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
From the output above we can see that the latitude and longitude inside location were each mapped as a text field with a keyword multi-field:

"location" : {
"properties" : {
"lat" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lon" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
This is clearly not what we need. The correct type is geo_point, so let's fix our mapping.

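Note: we cannot change the type of an existing field in the mapping of an index that has already been created, because the data indexed under the old mapping would no longer be searchable correctly. One approach is to reindex in order to rebuild the index. If we only add new fields to the existing mapping, we do not need to rebuild the index.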

To create our mapping correctly, we must first delete the previous twitter index and then create the index again with its settings. The steps are as follows:

DELETE twitter
PUT twitter
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}

PUT twitter/_mapping
{
"properties": {
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"type": "geo_point"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"province": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"uid": {
"type": "long"
},
"user": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Check our mapping again:

GET twitter/_mapping
We can see that the new mapping has been created. Now we run our earlier bulk request again to import the data we need into the twitter index:

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
At this point, we have fully built the index we need. Next, we start using the query DSL (Domain Specific Language) to run our queries.

Querying data
In this section, we show how to query the data we want from our ES index.

match query
GET twitter/_search
{
"query": {
"match": {
"city": "北京"
}
}
}

From the results, we can see that 5 users are from 北京 (Beijing), and that the results are sorted by relevance.

In many cases, we could also use a script query to do the same thing:

GET twitter/_search
{
"query": {
"script": {
"script": {
"source": "doc['city.keyword'].contains(params.name)",
"lang": "painless",
"params": {
"name": "北京"
}
}
}
}
}
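This script query returns the same result as the match query above, but we do not recommend it, because a script query is comparatively inefficient. If we have millions of documents or petabytes of data, the script has to be evaluated for every candidate document, which can require an enormous amount of computation. In such cases we should consider doing the computation at ingest time. See my other article "避免不必要的脚本 - scripting" (Avoiding unnecessary scripting).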

The search above can also be written like this:

GET twitter/_search?q=city:"北京"
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.48232412,
"_source" : {
"user" : "双榆树-张三",
"message" : "今儿天气不错啊，出去转转去",
"uid" : 2,
"age" : 20,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
}
}
}
...
]
If you want to learn more, you can read "Elasticsearch: 使用URI Search" (Using URI Search).

If we do not need the score, we can use a filter instead:

GET twitter/_search
{
"query": {
"bool": {
"filter": {
"term": {
"city.keyword": "北京"
}
}
}
}
}
Here we used a filter to narrow our search. The result looks like this:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"user" : "双榆树-张三",
"message" : "今儿天气不错啊，出去转转去",
"uid" : 2,
"age" : 20,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
}
}
},

...
}
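From the results, the _score is 0. For this kind of search the answer is simply yes or no; we do not care about relevance. Here we used city.keyword, which may be unfamiliar to people who are new to Elasticsearch. The correct way to understand it is that city is a multi-field in our mapping: it is indexed both as text and as keyword. For a keyword field, the whole value is treated as a single string; it is not analyzed (tokenized) at index time. keyword fields are used for exact matching, aggregations, and sorting.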

That is why we use a term query inside our filter.

We can also achieve the same effect with the following:

GET twitter/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"city.keyword": {
"value": "北京"
}
}
}
}
}
}
When we use a match query, the default operator is OR. We can run the following query:

GET twitter/_search
{
"query": {
"match": {
"user": {
"query": "朝阳区-老贾",
"operator": "or"
}
}
}
}
The query above is equivalent to the following one:

GET twitter/_search
{
"query": {
"match": {
"user": "朝阳区-老贾"
}
}
}
This is because the default operator is or. The query returns any document that matches any one of the five characters "朝", "阳", "区", "老", and "贾":

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 4.4209847,
"_source" : {
"user" : "朝阳区-老贾",
"message" : "123,gogogo",
"uid" : 5,
"age" : 35,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 2.9019678,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.8713734,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.4753614,
"_source" : {
"user" : "虹桥-老吴",
"message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
"uid" : 7,
"age" : 90,
"city" : "上海",
"province" : "上海",
"country" : "中国",
"address" : "中国上海市闵行区",
"location" : {
"lat" : "31.175927",
"lon" : "121.383328"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.4356867,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
}
}
]
We can also use the minimum_should_match parameter to set the minimum number of terms that must match. For example:

GET twitter/_search
{
"query": {
"match": {
"user": {
"query": "朝阳区-老贾",
"operator": "or",
"minimum_should_match": 3
}
}
}
}
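The query above requires that at least 3 of the 5 characters "朝", "阳", "区", "老", and "贾" match. We can also change the operator to and, so that every term must match: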

GET twitter/_search
{
"query": {
"match": {
"user": {
"query": "朝阳区-老贾",
"operator": "and"
}
}
}
}
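With the and operator, a document is returned only if its user field contains all five characters; in our sample data that should be just the document with _id 4.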
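The effect of operator and minimum_should_match on the query "朝阳区-老贾" can be sketched in Python. This is a simplified model, not how Elasticsearch is implemented: analyze is a rough stand-in for the standard analyzer (one token per CJK character, lowercase ASCII words, punctuation dropped), and the names analyze, match, and search are assumptions of this example. Scoring is ignored.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase ASCII words and
    one token per CJK character; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def match(field_value, query, operator="or", minimum_should_match=1):
    """Return True if the field value matches the query terms."""
    query_terms = set(analyze(query))
    doc_terms = set(analyze(field_value))
    matched = len(query_terms & doc_terms)
    if operator == "and":
        return matched == len(query_terms)
    return matched >= minimum_should_match


# The user field of the six sample documents, keyed by _id.
users = {
    "1": "双榆树-张三", "2": "东城区-老刘", "3": "东城区-李四",
    "4": "朝阳区-老贾", "5": "朝阳区-老王", "6": "虹桥-老吴",
}


def search(query, **options):
    return sorted(i for i, user in users.items() if match(user, query, **options))


print(search("朝阳区-老贾"))
print(search("朝阳区-老贾", minimum_should_match=3))
print(search("朝阳区-老贾", operator="and"))
```

Raising minimum_should_match or switching to and narrows the set of hits step by step, because fewer documents share enough characters with the query.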

Ids query
We can query by id, for example:

GET twitter/_search
{
"query": {
"ids": {
"values": ["1", "2"]
}
}
}
The query above returns the documents whose id is "1" or "2".

multi_match
In the searches above, we explicitly named the field to search. In many cases, however, we do not know which field contains the keyword. In that case, we can use multi_match:

GET twitter/_search
{
"query": {
"multi_match": {
"query": "朝阳",
"fields": [
"user",
"address^3",
"message"
],
"type": "best_fields"
}
}
}
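Above, the type of this multi_match is best_fields: it searches the three fields, and the final _score is taken from the highest-scoring field. See the Elasticsearch documentation for the other types. Here we search three fields at once (user, address, and message), but boost the score of documents whose address contains "朝阳" by a factor of 3. The result: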

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 6.1777167,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy good BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 5.9349246,
"_source" : {
"user" : "朝阳区-老贾",
"message" : "123,gogogo",
"uid" : 5,
"age" : 35,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
}
}
]
Prefix query
Returns documents that contain a specific prefix in the provided field.

GET twitter/_search
{
"query": {
"prefix": {
"user": {
"value": "朝"
}
}
}
}
This finds all documents whose user field has a term starting with "朝":

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"user" : "朝阳区-老贾",
"message" : "123,gogogo",
"uid" : 5,
"age" : 35,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
}
}
]

Term query
A term query performs an exact term match in the given field. Therefore, you need to provide the exact term to get the correct results.

GET twitter/_search
{
"query": {
"term": {
"user.keyword": {
"value": "朝阳区-老贾"
}
}
}
}
Here we use user.keyword to find the documents that exactly match "朝阳区-老贾":

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.5404451,
"_source" : {
"user" : "朝阳区-老贾",
"message" : "123,gogogo",
"uid" : 5,
"age" : 35,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
}
}
]
Terms query
If we want to query multiple terms, we can do it as follows:

GET twitter/_search
{
"query": {
"terms": {
"user.keyword": [
"双榆树-张三",
"东城区-老刘"
]
}
}
}
This finds all documents whose user.keyword is "双榆树-张三" or "东城区-老刘".

Terms_set query
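Returns documents that contain a minimum number of exact terms in the provided field. The terms_set query is the same as the terms query, except that you can define the number of matching terms required to return a document. For example: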

PUT /job-candidates
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"programming_languages": {
"type": "keyword"
},
"required_matches": {
"type": "long"
}
}
}
}

PUT /job-candidates/_doc/1?refresh
{
"name": "Jane Smith",
"programming_languages": [ "c++", "java" ],
"required_matches": 2
}

PUT /job-candidates/_doc/2?refresh
{
"name": "Jason Response",
"programming_languages": [ "java", "php" ],
"required_matches": 2
}

GET /job-candidates/_search
{
"query": {
"terms_set": {
"programming_languages": {
"terms": [ "c++", "java", "php" ],
"minimum_should_match_field": "required_matches"
}
}
}
}
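Above, we created two documents in the job-candidates index. We want to find the documents whose programming_languages contains at least two of the terms c++, java, and php. Here we use required_matches, a field defined in each document, to specify the minimum number of terms that must match. If there is no dedicated field for this, another option is to use minimum_should_match_script: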

GET /job-candidates/_search
{
"query": {
"terms_set": {
"programming_languages": {
"terms": [ "c++", "java", "php" ],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
}
This states that at least 2 of the terms must match. The result of the search is:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.1005894,
"hits" : [
{
"_index" : "job-candidates",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.1005894,
"_source" : {
"name" : "Jane Smith",
"programming_languages" : [
"c++",
"java"
],
"required_matches" : 2
}
},
{
"_index" : "job-candidates",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1005894,
"_source" : {
"name" : "Jason Response",
"programming_languages" : [
"java",
"php"
],
"required_matches" : 2
}
}
]
}
}
In other words, both of the documents satisfy the condition. If we instead search as follows:

GET /job-candidates/_search
{
"query": {
"terms_set": {
"programming_languages": {
"terms": [ "c++", "java", "nodejs" ],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
}
we will see that only one document satisfies the condition.
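The counting rule behind terms_set can be sketched in a few lines of Python. This is a simplified model for illustration only: the function terms_set and the idea of passing the threshold as a callable are assumptions of this sketch, used to cover both minimum_should_match_field and a constant script.

```python
candidates = {
    "1": {"name": "Jane Smith", "programming_languages": ["c++", "java"], "required_matches": 2},
    "2": {"name": "Jason Response", "programming_languages": ["java", "php"], "required_matches": 2},
}


def terms_set(docs, field, terms, minimum_should_match):
    """Return the ids of documents that contain enough of the given terms.

    minimum_should_match is a callable taking the document, so it can model
    either a per-document field or a constant script.
    """
    hits = []
    for doc_id, doc in docs.items():
        matched = len(set(terms) & set(doc[field]))
        if matched >= minimum_should_match(doc):
            hits.append(doc_id)
    return hits


# Threshold read from each document, like minimum_should_match_field.
print(terms_set(candidates, "programming_languages",
                ["c++", "java", "php"], lambda doc: doc["required_matches"]))
# Constant threshold, like the script with source "2".
print(terms_set(candidates, "programming_languages",
                ["c++", "java", "nodejs"], lambda doc: 2))
```

In the second call only Jane Smith still has two matching terms, since nobody lists nodejs.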

Compound queries
Above, we used many leaf queries, for example:

"query": {
"match": {
"city": "北京"
}
}
So what is a compound query? If the queries above are leaf queries, a compound query combines several leaf queries to form a more complex query. Its general form is:

POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tag" : "wow" } },
{ "term" : { "tag" : "elasticsearch" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
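As we can see, it is composed of the must, must_not, should, and filter clauses under bool. You can use the minimum_should_match parameter to specify the number or percentage of should clauses that a returned document must match. If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.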

For our example:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"city": "北京"
}
},
{
"match": {
"age": "30"
}
}
]
}
}
}
This query requires that the city is 北京 and that the age is exactly 30:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.4823241,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.4823241,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.4823241,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
}
}
]
}
}
If we want to know why we got these results, we can add "explain": true to the search request:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"city": "北京"
}
},
{
"match": {
"age": "30"
}
}
]
}
},
"explain": true
}
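Each hit in the response then carries an _explanation section that describes how its score was computed.

The query returns 2 results. Similarly, we can exclude documents that satisfy certain conditions by using must_not: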

GET twitter/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"city": "北京"
}
}
]
}
}
}
Here we look for all documents that are not in 北京:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.0,
"_source" : {
"user" : "虹桥-老吴",
"message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
"uid" : 7,
"age" : 90,
"city" : "上海",
"province" : "上海",
"country" : "中国",
"address" : "中国上海市闵行区",
"location" : {
"lat" : "31.175927",
"lon" : "121.383328"
}
}
}
]
}
}
Only one document is shown. It comes from 上海 (Shanghai); all the others are from 北京.

Next, let's try should. It expresses "or": a match is nice to have, but it is not required. For example:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"age": "30"
}
}
],
"should": [
{
"match_phrase": {
"message": "Happy birthday"
}
}
]
}
}
}
This search means that age must be 30, but if a document also contains "Happy birthday", its relevance is higher and it is ranked earlier in the results:

{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.641438,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.641438,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
}
}
]
}
}
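In the results above, both documents have an age of 30, but the first one contains "Happy birthday" in its message, so it ranks first with higher relevance, as its _score shows. The second document has an age of 30 but no "Happy birthday" in its message; it is still returned, just with a lower score.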

When using the compound query above, a bool request is usually made up of one or more of must, must_not, should, and filter. We must note the following:

How clause types affect hits and _score
Clause      Affects #hits   Affects _score
must        Yes             Yes
must_not    Yes             No
should      No*             Yes
filter      Yes             No
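As the table shows, should affects the hits only in a special case; normally it does not change the number of documents found. So when does it affect the results? When the bool query contains should clauses but no must or filter clause: then at least one of the should clauses must match for a document to be returned. For example: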

GET twitter/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"city": "北京"
}
},
{
"match": {
"city": "武汉"
}
}
]
}
}
}
The result of the query above is:

"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 0.48232412,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.48232412,
"_source" : {
"user" : "双榆树-张三",
"message" : "今儿天气不错啊，出去转转去",
"uid" : 2,
"age" : 20,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.48232412,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
}
},
...
}
In this case, should does affect the query result. If we set minimum_should_match to 2, that is:

GET twitter/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"city": "北京"
}
},
{
"match": {
"city": "武汉"
}
}
],
"minimum_should_match": 2
}
}
}
then both should clauses must be satisfied for a document to be found. The result of this query is empty, because no document has a city that is both "北京" and "武汉".
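How the four clause types decide whether a document is a hit, including the default for minimum_should_match, can be modeled in a short Python sketch. It is a simplification under stated assumptions: each clause is a plain predicate, the match on city is modeled as string equality, scoring is ignored, and the names bool_query, city_is, and hits are made up for this example.

```python
def bool_query(doc, must=(), filter=(), must_not=(), should=(), minimum_should_match=None):
    """Each clause is a predicate doc -> bool. Returns True if doc is a hit.

    The parameter named filter mirrors the DSL and shadows the builtin here.
    """
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter, else 0.
        minimum_should_match = 1 if should and not (must or filter) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


def city_is(name):
    return lambda doc: doc.get("city") == name


# Five documents from 北京 and one from 上海, like the sample data.
docs = [{"_id": str(i), "city": "北京"} for i in range(1, 6)] + [{"_id": "6", "city": "上海"}]


def hits(**clauses):
    return [d["_id"] for d in docs if bool_query(d, **clauses)]


print(hits(should=[city_is("北京"), city_is("武汉")]))
print(hits(should=[city_is("北京"), city_is("武汉")], minimum_should_match=2))
print(hits(must_not=[city_is("北京")]))
```

When a must clause is present, the should clauses no longer restrict the hits; in Elasticsearch they would then only contribute to the score.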

Geo queries
One of the great strengths of Elasticsearch is location-based search, which many relational databases lack. Here is a simple example:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "北京"
}
}
]
}
},
"post_filter": {
"geo_distance": {
"distance": "3km",
"location": {
"lat": 39.920086,
"lon": 116.454182
}
}
}
}
Here we look for all documents whose address contains "北京" and that lie within 3 km of the point (lat 39.920086, lon 116.454182):

{
"took" : 58,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.48232412,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.48232412,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
}
}
]
}
}
Only one document in the result satisfies the requirements.

Next, we find everything within 5 km and sort the results by distance:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "北京"
}
}
]
}
},
"post_filter": {
"geo_distance": {
"distance": "5km",
"location": {
"lat": 39.920086,
"lon": 116.454182
}
}
},
"sort": [
{
"_geo_distance": {
"location": "39.920086,116.454182",
"order": "asc",
"unit": "km"
}
}
]
}
Here we use sort to order the search results by distance, in ascending order:

{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
},
"sort" : [
1.1882901656104885
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
},
"sort" : [
3.9447355972239952
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
},
"sort" : [
4.837769064666224
]
}
]
}
}
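We can see three results, and the sort values show the distance growing from one hit to the next. We can also see that when _score is not one of the sort fields, the _score of every result becomes null after sorting. The search above can also be written directly as: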

GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"address": "北京"
}
},
"filter": {
"geo_distance": {
"distance": "5km",
"location": {
"lat": 39.920086,
"lon": 116.454182
}
}
}
}
},
"sort": [
{
"_geo_distance": {
"location": "39.920086,116.454182",
"order": "asc",
"unit": "km"
}
}
]
}
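The distances in the sort values can be approximated with the haversine formula. The sketch below is an independent illustration, not Elasticsearch's own implementation: the Earth radius constant and the helper names distance_km and within are assumptions, and only the geo part of the query is modeled (the match on address is left out).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


CENTER = (39.920086, 116.454182)  # (lat, lon) used in the queries above

# location of each sample document as (lat, lon) strings, keyed by _id
locations = {
    "1": ("39.970718", "116.325747"),
    "2": ("39.904313", "116.412754"),
    "3": ("39.893801", "116.408986"),
    "4": ("39.718256", "116.367910"),
    "5": ("39.918256", "116.467910"),
    "6": ("31.175927", "121.383328"),
}


def within(radius_km):
    """Return (doc_id, distance) pairs inside the radius, nearest first."""
    found = []
    for doc_id, (lat, lon) in locations.items():
        d = distance_km(CENTER[0], CENTER[1], float(lat), float(lon))
        if d <= radius_km:
            found.append((doc_id, d))
    return sorted(found, key=lambda pair: pair[1])


for doc_id, d in within(5):
    print(doc_id, round(d, 3))
```

With a 3 km radius only document 5 remains, and with 5 km the order is 5, 2, 3, in line with the responses shown above.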
Range queries
In ES we can also run range queries, selecting data according to a given range:

GET twitter/_search
{
"query": {
"range": {
"age": {
"gte": 30,
"lte": 40
}
}
}
}
Here we query for documents whose age is between 30 and 40:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"user" : "朝阳区-老贾",
"message" : "123,gogogo",
"uid" : 5,
"age" : 35,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
}
}
}
]
}
}
As shown above, we found 3 matching documents. We can also sort them:

GET twitter/_search
{
"query": {
"range": {
"age": {
"gte": 30,
"lte": 40
}
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
We sort the whole result set by age in descending order.
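The inclusive bounds gte and lte, followed by a descending sort, can be sketched in Python as follows. The helper range_query and the ages dictionary (the age of each sample document, keyed by _id) are assumptions made for this illustration.

```python
# age of each sample document, keyed by _id
ages = {"1": 20, "2": 30, "3": 30, "4": 35, "5": 50, "6": 90}


def range_query(values, gte=None, lte=None):
    """Return the ids whose value lies within the inclusive bounds."""
    return [doc_id for doc_id, v in values.items()
            if (gte is None or v >= gte) and (lte is None or v <= lte)]


hits = range_query(ages, gte=30, lte=40)
# Sort by age, highest first; ties keep their original order.
by_age_desc = sorted(hits, key=lambda doc_id: ages[doc_id], reverse=True)
print(hits, by_age_desc)
```

Because both bounds are inclusive, the two documents with age 30 are part of the result.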

Exists query
We can use exists to check whether a field exists. For example, let's add another document:

PUT twitter/_doc/20
{
"user" : "王二",
"message" : "今儿天气不错啊，出去转转去",
"uid" : 20,
"age" : 40,
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
}
}
In this document, the city field does not exist, so the following search will not return it:

GET twitter/_search
{
"query": {
"exists": {
"field": "city"
}
}
}
A document is returned as long as its city field is not empty. Conversely, a document whose city field is empty is not returned.

To find all documents that do not have the city field, we can query like this:

GET twitter/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "city"
}
}
}
}
}
Suppose we create another index, twitter10, by entering the following command:

PUT twitter10/_doc/1
{
"locale": null
}
Then we query it with the following command:

GET twitter10/_search
{
"query": {
"exists": {
"field": "locale"
}
}
}
The result of this query is:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
In other words, nothing was found: a field whose value is null is treated as if it did not exist.

If you want to find documents in which a field is missing, you can use the following:

GET twitter10/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "locale"
}
}
]
}
}
}
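This returns the documents in which locale is missing or null; in twitter10, that should be the document with _id 1 created above.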
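The rule that exists applies can be sketched as a small Python predicate. It is a simplified model over plain dictionaries; the function name exists and the tags field in the usage lines are assumptions of this example.

```python
def exists(doc, field):
    """True if the field has at least one non-null value."""
    value = doc.get(field)
    if value is None:
        # Covers both a missing field and an explicit null.
        return False
    if isinstance(value, list):
        # [] and [None] do not count as existing values.
        return any(item is not None for item in value)
    # Any other value counts, including an empty string.
    return True


print(exists({"locale": None}, "locale"))
print(exists({"user": "王二"}, "city"))
print(exists({"city": "北京"}, "city"))
```

Wrapping the predicate in a negation gives the behavior of the must_not query used to find missing fields.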

Matching phrases
We can search for happy birthday as follows:

GET twitter/_search
{
"query": {
"match": {
"message": "happy birthday"
}
}
}
The result:

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.9936417,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.9936417,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.733287,
"_source" : {
"user" : "朝阳区-老王",
"message" : "Happy BirthDay My Friend!",
"uid" : 6,
"age" : 50,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
}
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.84768087,
"_source" : {
"user" : "虹桥-老吴",
"message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
"uid" : 7,
"age" : 90,
"city" : "上海",
"province" : "上海",
"country" : "中国",
"address" : "中国上海市闵行区",
"location" : {
"lat" : "31.175927",
"lon" : "121.383328"
}
}
}
]
}
}
By default this match uses "or" semantics: it finds documents that contain "happy" or "birthday". Suppose we add a new document:

PUT twitter/_doc/8
{
"user": "朝阳区-老王",
"message": "Happy",
"uid": 6,
"age": 50,
"city": "北京",
"province": "北京",
"country": "中国",
"address": "中国北京市朝阳区国贸",
"location": {
"lat": "39.918256",
"lon": "116.467910"
}
}
If we search again, we can see that the newly added document with id 8 is also among the results, even though its message contains only "Happy".

If we want "and" semantics, we can do the following:

GET twitter/_search
{
"query": {
"match": {
"message": {
"query": "happy birthday",
"operator": "and"
}
}
}
}
After this change, the document with id 8 no longer appears, because message must now match both "happy" and "birthday".

There is another way to do this:

GET twitter/_search
{
"query": {
"match": {
"message": {
"query": "happy birthday",
"minimum_should_match": 2
}
}
}
}
Here we use minimum_should_match to state that at least 2 terms must match.

We can see that the search matches regardless of upper or lower case, and that the order of the words happy and birthday in message does not matter either. For example, let's change the document with id 5 to:

PUT twitter/_doc/5
{
"user": "朝阳区-老王",
"message": "BirthDay My Friend Happy !",
"uid": 6,
"age": 50,
"city": "北京",
"province": "北京",
"country": "中国",
"address": "中国北京市朝阳区国贸",
"location": {
"lat": "39.918256",
"lon": "116.467910"
}
}
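Here we deliberately put BirthDay before Happy. We can run the query above again to check whether the document with id 5 is still found.

Clearly, a match query does not take word order into account. Next, let's try match_phrase: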

GET twitter/_search
{
"query": {
"match_phrase": {
"message": "Happy birthday"
}
},
"highlight": {
"fields": {
"message": {}
}
}
}
Here we use match_phrase, which requires Happy to come directly before birthday. The search result is:

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.6363969,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.6363969,
"_source" : {
"user" : "东城区-李四",
"message" : "happy birthday!",
"uid" : 4,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
}
},
"highlight" : {
"message" : [
"<em>happy</em> <em>birthday</em>!"
]
}
}
]
}
}
Suppose we change our earlier document with id 5 to:

PUT twitter/_doc/5
{
"user": "朝阳区-老王",
"message": "Happy Good BirthDay My Friend!",
"uid": 6,
"age": 50,
"city": "北京",
"province": "北京",
"country": "中国",
"address": "中国北京市朝阳区国贸",
"location": {
"lat": "39.918256",
"lon": "116.467910"
}
}
Here we inserted the word Good between Happy and BirthDay. Our earlier match_phrase query cannot find this document. To find the modified document, we can use:

GET twitter/_search
{
"query": {
"match_phrase": {
"message": {
"query": "Happy birthday",
"slop": 1
}
}
},
"highlight": {
"fields": {
"message": {}
}
}
}
Note: here we set slop to 1, which allows Happy and birthday to be one token position apart.
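The role of slop can be illustrated with a small position-based model in Python. This is a simplified sketch, not Lucene's actual phrase matcher: analyze is a rough stand-in for the standard analyzer, and the criterion used (the spread of the query-offset-adjusted positions must not exceed slop) is an assumption that reproduces the examples in this section.

```python
import re
from itertools import product


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase ASCII words and
    one token per CJK character; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def match_phrase(field_value, query, slop=0):
    """Return True if the query terms occur as a phrase, allowing slop moves."""
    doc_tokens = analyze(field_value)
    query_terms = analyze(query)
    if not query_terms:
        return False
    # Positions at which each query term occurs in the document.
    positions = [[i for i, tok in enumerate(doc_tokens) if tok == term]
                 for term in query_terms]
    if any(not p for p in positions):
        return False  # some term is missing entirely
    for combo in product(*positions):
        # Shift each position by the term's offset in the query; for an exact
        # phrase all shifted positions coincide.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


print(match_phrase("happy birthday!", "Happy birthday"))
print(match_phrase("Happy Good BirthDay My Friend!", "Happy birthday"))
print(match_phrase("Happy Good BirthDay My Friend!", "Happy birthday", slop=1))
```

In this model, a reversed pair such as "birthday happy" needs a slop of 2, which is why a slop of 1 is not enough to match words in the opposite order.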

Named queries
We can use _name to give a filter or query a name, for example:

GET twitter/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"city": {
"query": "北京",
"_name": "城市"
}
}
},
{
"match": {
"country": {
"query": "中国",
"_name": "国家"
}
}
}
],
"should": [
{
"match": {
"_id": {
"query": "1",
"_name": "ID"
}
}
}
]
}
}
}
The result:

"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.6305401,
"_source" : {
"user" : "双榆树-张三",
"message" : "今儿天气不错啊，出去转转去",
"uid" : 2,
"age" : 20,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
}
},
"matched_queries" : [
"国家",
"ID",
"城市"
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6305401,
"_source" : {
"user" : "东城区-老刘",
"message" : "出发，下一站云南！",
"uid" : 3,
"age" : 30,
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
}
},
"matched_queries" : [
"国家",
"城市"
]
},
...
]
In the response above there is an extra field called matched_queries. It lists every named query that the hit matched. The first hit matched all three queries, while the second matched only two.
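On the client side, matched_queries makes it easy to tell which named clauses contributed to each hit. A minimal sketch, assuming the hits array has already been parsed into Python dictionaries (the matched_queries_by_id helper is hypothetical, and the sample data is trimmed from the response above):

```python
def matched_queries_by_id(hits):
    # Map each hit's _id to the set of named queries it matched.
    return {hit["_id"]: set(hit.get("matched_queries", [])) for hit in hits}


# Trimmed copy of the two hits shown in the response.
hits = [
    {"_id": "1", "matched_queries": ["国家", "ID", "城市"]},
    {"_id": "2", "matched_queries": ["国家", "城市"]},
]

matched = matched_queries_by_id(hits)
# Which documents also satisfied the optional "ID" clause in should?
boosted = [doc_id for doc_id, names in matched.items() if "ID" in names]
print(boosted)  # ['1']
```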

Wildcard query
We can use a wildcard query to match a pattern of characters in a string:

GET twitter/_search
{
"query": {
"wildcard": {
"city.keyword": {
"value": "*海"
}
}
}
}
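The query above looks for documents whose city keyword ends with 海, because the pattern *海 only has a leading wildcard. The search result is: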

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"user" : "虹桥-老吴",
"message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
"uid" : 7,
"age" : 90,
"city" : "上海",
"province" : "上海",
"country" : "中国",
"address" : "中国上海市闵行区",
"location" : {
"lat" : "31.175927",
"lon" : "121.383328"
}
}
}
]
}
}
We can see that the document whose city is “上海” was found.
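To see why *海 matches 上海 but would not match a value that merely contains 海, the pattern can be tried locally with Python's fnmatch module. This is only an approximation of the Elasticsearch wildcard query: both support * and ?, but fnmatch also understands [seq] character classes. The wildcard_filter helper and the city 海口 are hypothetical and not part of the sample index.

```python
from fnmatch import fnmatchcase


def wildcard_filter(values, pattern):
    # Keep the values whose whole string matches the wildcard pattern.
    return [v for v in values if fnmatchcase(v, pattern)]


cities = ["北京", "上海", "海口"]  # 海口 is a made-up extra value for illustration
print(wildcard_filter(cities, "*海"))   # ['上海']
print(wildcard_filter(cities, "*海*"))  # ['上海', '海口']
```

A pattern that should match 海 anywhere in the value needs a wildcard on both sides.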

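Disjunction max query
Returns documents that match one or more wrapped queries (called query clauses or clauses).

If a returned document matches multiple query clauses, the dis_max query assigns the document the highest relevance score from any matching clause, plus a tie-breaking increment for any additional matching subqueries.

You can use dis_max to search for a term in fields mapped with different boost factors. For example: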

GET twitter/_search
{
"query": {
"dis_max": {
"queries": [
{
"term": {
"city.keyword": "北京"
}
},
{
"match": {
"address": "北京"
}
}
],
"tie_breaker": 0.7
}
}
}
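The dis_max query above returns every document that matches any of the queries listed in queries. The score of each match is computed according to the following rules:

If a document matches one or more of the queries, the final score is based on the highest score among them.
By default, tie_breaker is 0. It can be any number between 0 and 1.0.
If a document matches multiple clauses, the dis_max query calculates the relevance score for the document as follows:

Take the relevance score from the matching clause with the highest score.
Multiply the score from each of the other matching clauses by the tie_breaker value.
Add the highest score to the multiplied scores.
If the tie_breaker value is greater than 0.0, all matching clauses count, but the clause with the highest score counts most.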
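The scoring steps above can be written out as a small Python function. This is a sketch of the arithmetic only; the dis_max_score name and the clause scores used in the example are made up for illustration and are not taken from a real response.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine the scores of the matching clauses the way the steps above describe."""
    if not clause_scores:
        raise ValueError("at least one clause must match")
    best = max(clause_scores)
    # Sum of all other matching clauses (everything except the best one).
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Hypothetical scores: the term clause scored 1.0, the match clause scored 0.5.
print(dis_max_score([1.0, 0.5]))                   # 1.0 (default tie_breaker of 0)
print(dis_max_score([1.0, 0.5], tie_breaker=0.7))
```

With tie_breaker set to 0 only the best clause counts; with 0.7, as in the query above, the other clause adds 0.7 times its score on top.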

SQL query

Many people who are used to working with RDBMS databases prefer to query with SQL. Elasticsearch supports SQL as well:

GET /_sql?
{
"query": """
SELECT * FROM twitter
WHERE age = 30
"""
}
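With this query we find all users whose age equals 30. In this search we used a SQL statement. With the SQL endpoint we can quickly carry our SQL knowledge over to Elasticsearch. We can get the equivalent DSL query as follows: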

GET /_sql/translate
{
"query": """
SELECT * FROM twitter
WHERE age = 30
"""
}
The result we get is:

{
"size" : 1000,
"query" : {
"term" : {
"age" : {
"value" : 30,
"boost" : 1.0
}
}
},
"_source" : {
"includes" : [
"address",
"message",
"region",
"script.source",
"user"
],
"excludes" : [ ]
},
"docvalue_fields" : [
{
"field" : "age"
},
{
"field" : "city"
},
{
"field" : "country"
},
{
"field" : "location"
},
{
"field" : "province"
},
{
"field" : "script.params.value"
},
{
"field" : "uid"
}
],
"sort" : [
{
"_doc" : {
"order" : "asc"
}
}
]
}
If you want to learn more about Elasticsearch SQL, see my other article “Elasticsearch SQL介绍及实例”.

Multi Search API
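Executes several searches with a single API request. The benefit of this API is that it reduces the number of API calls by packing multiple searches into one request.

To make the example concrete, let's add another index called twitter1 with the following content: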

POST _bulk
{"index":{"_index":"twitter1","_id":1}}
{"user":"张庆","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"重庆","province":"重庆","country":"中国","address":"中国重庆地区","location":{"lat":"39.970718","lon":"116.325747"}}
Now we have two indices in Elasticsearch. We can run the following _msearch:

GET twitter/_msearch
{"index":"twitter"}
{"query":{"match_all":{}},"from":0,"size":1}
{"index":"twitter"}
{"query":{"bool":{"filter":{"term":{"city.keyword":"北京"}}}}, "size":1}
{"index":"twitter1"}
{"query":{"match_all":{}}}
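Above, we used the _msearch endpoint to run multiple queries in a single API request, operating on several indices at once. The response contains a responses array with one result per search, in the same order as the requests.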
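The _msearch body is newline-delimited JSON: a header line followed by a body line for each search, and the last line must end with a newline. When the request is built in code rather than typed into Kibana, a helper along these lines can assemble it. This is a minimal sketch; build_msearch_body is a hypothetical name, and sending the result to the cluster is left out.

```python
import json


def build_msearch_body(searches):
    """Turn (header, body) pairs into the NDJSON text that _msearch expects."""
    lines = []
    for header, body in searches:
        # ensure_ascii=False keeps values such as 北京 readable in the payload.
        lines.append(json.dumps(header, ensure_ascii=False))
        lines.append(json.dumps(body, ensure_ascii=False))
    # The final line must be terminated by a newline as well.
    return "\n".join(lines) + "\n"


searches = [
    ({"index": "twitter"}, {"query": {"match_all": {}}, "from": 0, "size": 1}),
    ({"index": "twitter"},
     {"query": {"bool": {"filter": {"term": {"city.keyword": "北京"}}}}, "size": 1}),
    ({"index": "twitter1"}, {"query": {"match_all": {}}}),
]
print(build_msearch_body(searches), end="")
```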

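Multi-index operations
Above we introduced another index, twitter1. In practice, we can search several indices at once, either with a wildcard or by listing the indices directly: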

GET twitter*/_search
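The request above searches every index whose name starts with twitter, so the results include the documents from both twitter and twitter1. Listing the indices explicitly: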

GET /twitter,twitter1/_search
This does the same thing. When writing this query, do not put a space between the index names. For example:

GET /twitter, twitter1/_search
The query above does not return the results you want.
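When the index list comes from code or configuration, one way to avoid the stray-space problem is to build the path programmatically. A minimal sketch, where search_path is a hypothetical helper and not part of any Elasticsearch client:

```python
def search_path(indices):
    # Strip surrounding whitespace from every name, then join with bare commas.
    cleaned = [name.strip() for name in indices]
    return "/" + ",".join(cleaned) + "/_search"


print(search_path(["twitter", " twitter1"]))  # /twitter,twitter1/_search
```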

Profile API
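The Profile API is a debugging tool. It adds detailed information about the execution of each component in a search request. It gives users insight into every step of how a search request is executed and can help determine why some requests are slow.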

GET twitter/_search
{
"profile": "true",
"query": {
"match": {
"city": "北京"
}
}
}
Above, after adding "profile":"true", the response shows the profile information in addition to the search results:

"profile" : {
"shards" : [
{
"id" : "[ZXGhn-90SISq1lePV3c1sA][twitter][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "city:北 city:京",
"time_in_nanos" : 1390064,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 5,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 31728,
"match" : 3337,
"next_doc_count" : 5,
"score_count" : 5,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 22347,
"advance_count" : 1,
"score" : 16639,
"build_scorer_count" : 2,
"create_weight" : 342219,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 973775
},
"children" : [
{
"type" : "TermQuery",
"description" : "city:北",
"time_in_nanos" : 107949,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 5,
"compute_max_score_count" : 3,
"compute_max_score" : 11465,
"advance" : 3477,
"advance_count" : 6,
"score" : 5793,
"build_scorer_count" : 3,
"create_weight" : 34781,
"shallow_advance" : 18176,
"create_weight_count" : 1,
"build_scorer" : 34236
}
},
{
"type" : "TermQuery",
"description" : "city:京",
"time_in_nanos" : 49929,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 5,
"compute_max_score_count" : 3,
"compute_max_score" : 5162,
"advance" : 15645,
"advance_count" : 6,
"score" : 3795,
"build_scorer_count" : 3,
"create_weight" : 13562,
"shallow_advance" : 1087,
"create_weight_count" : 1,
"build_scorer" : 10657
}
}
]
}
],
"rewrite_time" : 17930,
"collector" : [
{
"name" : "CancellableCollector",
"reason" : "search_cancelled",
"time_in_nanos" : 204082,
"children" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 23347
}
]
}
]
}
],
"aggregations" : [ ]
}
]
}
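From the output above we can see that the search looked for “北” and “京” separately instead of treating 北京 as a whole. In later articles we can learn how to use a Chinese analyzer for tokenized search. Interested readers can change the search above to use city.keyword and see what happens. If you are interested in tokenization, see the analyzer section of my article “Elastic：菜鸟上手指南”.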
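The nested query section of the profile output is easier to read once it is flattened into one line per query node. A minimal sketch in Python, assuming the profile JSON has already been parsed into dictionaries. The flatten_query_profile helper is hypothetical, and the sample keeps only the type, description and time_in_nanos fields from the output above.

```python
def flatten_query_profile(nodes, depth=0):
    """Walk the profiled query tree and return (depth, type, description, nanos) rows."""
    rows = []
    for node in nodes:
        rows.append((depth, node["type"], node["description"], node["time_in_nanos"]))
        # Child queries, if any, are nested under "children".
        rows.extend(flatten_query_profile(node.get("children", []), depth + 1))
    return rows


# Trimmed copy of profile.shards[0].searches[0].query from the response.
query_profile = [
    {
        "type": "BooleanQuery",
        "description": "city:北 city:京",
        "time_in_nanos": 1390064,
        "children": [
            {"type": "TermQuery", "description": "city:北", "time_in_nanos": 107949},
            {"type": "TermQuery", "description": "city:京", "time_in_nanos": 49929},
        ],
    }
]

for depth, query_type, description, nanos in flatten_query_profile(query_profile):
    print("  " * depth + f"{query_type} [{description}] {nanos / 1e6:.3f} ms")
```

The indentation mirrors the query tree, so the two TermQuery children of the BooleanQuery show directly that the text was split into single characters.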

Besides profiling through the request above, we can also profile our searches in the Kibana UI. In many cases this visual tool is more intuitive.
Original source: Elastic 中国社区官方博客 (the official Elastic China community blog)
