search(14) - elastic4s - aggregation scope: global, filter and post-filter buckets

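Aggregations normally operate within the scope of the query. An aggregation request that carries no query is in effect computed over the scope of a match_all {} query: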

GET /cartxns/_search
{
  "aggs": {
    "all_colors": {
      "terms": {"field" : "color.keyword"}
    }
  }
}

GET /cartxns/_search
{
  "query": {
    "match_all": {}
  }, 
  "aggs": {
    "all_colors": {
      "terms": {"field" : "color.keyword"}
    }
  }
}

These two requests return the same result:

"aggregations": {
    "all_colors": {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets": [
        {
          "key" : "red",
          "doc_count" : 4},
        {
          "key" : "blue",
          "doc_count" : 2},
        {
          "key" : "green",
          "doc_count" : 2}
      ]
    }
  }

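Most of the time we want aggregations computed within the query scope, but sometimes we also need totals that ignore the query conditions entirely. For example, while computing the average selling price of one make, we may also want the average selling price across all makes. That overall average is a global bucket aggregation: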

GET /cartxns/_search
{
  "query": {
    "match" : {"make.keyword": "ford"}
  },
  "aggs": {
    "avg_ford": {
      "avg": {
        "field": "price"}
    },
    "avg_all": {
      "global": {},
      "aggs": {
        "avg_price": {
          "avg": {"field": "price"}
        }
      }
    }
  }
}

The search hits and the aggregation results are as follows:

"hits": {
    "total": {
      "value" : 2,
      "relation" : "eq"},
    "max_score" : 1.2809337,
    "hits": [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "NGVXAnIBSDa1Wo5UqLc3",
        "_score" : 1.2809337,
        "_source": {
          "price" : 30000,
          "color" : "green",
          "make" : "ford",
          "sold" : "2014-05-18"}
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "OWVYAnIBSDa1Wo5UTrf8",
        "_score" : 1.2809337,
        "_source": {
          "price" : 25000,
          "color" : "blue",
          "make" : "ford",
          "sold" : "2014-02-12"}
      }
    ]
  },
  "aggregations": {
    "avg_all": {
      "doc_count" : 8,
      "avg_price": {
        "value" : 26500.0}
    },
    "avg_ford": {
      "value" : 27500.0}
  }

Expressed in elastic4s:

// Assumes `client` (an ElasticClient) and `import com.sksamuel.elastic4s.ElasticDsl._` are already in scope.
val aggGlob = search("cartxns").query(
  matchQuery("make.keyword", "ford")
).aggregations(
  // query scope: average price over the ford hits only
  avgAggregation("single_avg").field("price"),
  // global bucket: ignores the query and averages over every document in the index
  globalAggregation("all_avg").subaggs(
    avgAggregation("avg_price").field("price")
  )
)
println(aggGlob.show)

val globResult = client.execute(aggGlob).await

if (globResult.isSuccess) {
  val gavg = globResult.result.aggregations.global("all_avg").avg("avg_price")
  val savg = globResult.result.aggregations.avg("single_avg")
  println(s"${savg.value},${gavg.value}")
  globResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
} else println(s"error: ${globResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"aggs":{"single_avg":{"avg":{"field":"price"}},"all_avg":{"global":{},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}},Some(application/json))
27500.0,26500.0
Map(price -> 30000, color -> green, make -> ford, sold -> 2014-05-18)
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)
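To make the difference between the two scopes concrete, here is a minimal in-memory sketch in plain Scala. It does not call Elasticsearch or elastic4s. The names `GlobalScopeSketch`, `CarTxn` and `avgPrice` are made up for illustration, and the data is limited to the five documents that are visible in the response listings of this article. The real index holds eight documents, so the global average computed here will not equal the 26500.0 shown above.

```scala
// In-memory illustration of query scope vs. global scope (no Elasticsearch involved).
object GlobalScopeSketch {
  final case class CarTxn(price: Int, color: String, make: String, sold: String)

  // Assumption: only the five documents visible in this article, not the full index.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford", "2014-05-18"),
    CarTxn(25000, "blue", "ford", "2014-02-12"),
    CarTxn(10000, "red", "honda", "2014-10-28"),
    CarTxn(20000, "red", "honda", "2014-11-05"),
    CarTxn(20000, "red", "honda", "2014-11-05")
  )

  def avgPrice(xs: List[CarTxn]): Double =
    if (xs.isEmpty) Double.NaN else xs.map(_.price.toDouble).sum / xs.size

  // The query selects the ford transactions; these are the hits.
  val hits: List[CarTxn] = docs.filter(_.make == "ford")

  // An ordinary aggregation only sees the query hits.
  val avgFord: Double = avgPrice(hits)

  // A global bucket ignores the query and sees every document.
  val avgAll: Double = avgPrice(docs)

  def main(args: Array[String]): Unit =
    println(s"avg_ford=$avgFord, avg_all=$avgAll")
}
```

The ford average comes out as 27500.0, which agrees with the response above, because both ford documents are in the sample. Only the global figure depends on the documents that are missing from this sketch.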

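A filter bucket narrows the query results further before aggregating, without changing the hits. For example: query all honda transactions, but total the honda sales for one particular month only: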

GET /cartxns/_search
{
    "query": {
      "match": {
        "make.keyword": "honda"}
    },
    "aggs": {
      "sales_this_month": {
        "filter": {
          "range" : {"sold" : { "from" : "2014-10-01", "to" : "2014-11-01"}}
        },
        "aggs": {
          "month_total": {
            "sum": {"field": "price"}
          }
        }
      }
    }
}

First, the query hits are unaffected. In addition, we get that make's sales total for the chosen month:

"hits": {
    "total": {
      "value" : 3,
      "relation" : "eq"},
    "max_score" : 0.9444616,
    "hits": [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "MmVXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source": {
          "price" : 10000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-10-28"}
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "M2VXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source": {
          "price" : 20000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-11-05"}
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "N2VXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source": {
          "price" : 20000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-11-05"}
      }
    ]
  },
  "aggregations": {
    "sales_this_month": {
      "doc_count" : 1,
      "month_total": {
        "value" : 10000.0}
    }
  }

The elastic4s version:

// Assumes `client` (an ElasticClient) and `import com.sksamuel.elastic4s.ElasticDsl._` are already in scope.
val aggfilter = search("cartxns").query(
  matchQuery("make.keyword", "honda")
).aggregations(
  // filter bucket: only hits sold within the date range reach the sub-aggregation
  filterAgg("sales_the_month", rangeQuery("sold").gte("2014-10-01").lte("2014-11-01"))
    .subaggs(sumAggregation("monthly_sales").field("price"))
)
println(aggfilter.show)

val filterResult = client.execute(aggfilter).await

if (filterResult.isSuccess) {
  val ms = filterResult.result.aggregations.filter("sales_the_month")
    .sum("monthly_sales").value
  println(s"${ms}")
  filterResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
} else println(s"error: ${filterResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?StringEntity({"query":{"match":{"make.keyword":{"query":"honda"}}},"aggs":{"sales_the_month":{"filter":{"range":{"sold":{"gte":"2014-10-01","lte":"2014-11-01"}}},"aggs":{"monthly_sales":{"sum":{"field":"price"}}}}}},Some(application/json))
10000.0
Map(price -> 10000, color -> red, make -> honda, sold -> 2014-10-28)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)
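The same behaviour can be mimicked in memory. The sketch below is plain Scala, not elastic4s, and the names `FilterBucketSketch`, `CarTxn` and `txn` are hypothetical. It uses only the five documents visible in this article and assumes both ends of the date range are inclusive, as with `gte`/`lte` in the request above.

```scala
import java.time.LocalDate

// In-memory illustration of a filter bucket (no Elasticsearch involved).
object FilterBucketSketch {
  final case class CarTxn(price: Int, color: String, make: String, sold: LocalDate)

  private def txn(price: Int, color: String, make: String, sold: String): CarTxn =
    CarTxn(price, color, make, LocalDate.parse(sold))

  // Assumption: only the five documents visible in this article.
  val docs: List[CarTxn] = List(
    txn(30000, "green", "ford", "2014-05-18"),
    txn(25000, "blue", "ford", "2014-02-12"),
    txn(10000, "red", "honda", "2014-10-28"),
    txn(20000, "red", "honda", "2014-11-05"),
    txn(20000, "red", "honda", "2014-11-05")
  )

  // The query selects all honda transactions; these are the returned hits.
  val hits: List[CarTxn] = docs.filter(_.make == "honda")

  // The filter bucket keeps only the hits sold within the range (both ends inclusive).
  private val from = LocalDate.parse("2014-10-01")
  private val to   = LocalDate.parse("2014-11-01")
  val bucket: List[CarTxn] =
    hits.filter(d => !d.sold.isBefore(from) && !d.sold.isAfter(to))

  // The sub-aggregation sums the price inside the bucket only.
  val monthTotal: Double = bucket.map(_.price.toDouble).sum

  def main(args: Array[String]): Unit =
    println(s"hits=${hits.size}, doc_count=${bucket.size}, month_total=$monthTotal")
}
```

The hit list keeps all three honda documents, while the bucket holds one document with a total of 10000.0, matching the response above.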

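The last one is post-filter. A post-filter also filters the query results, but it is applied only after the whole query has completed. In other words, if the request also contains aggregations, the aggregations are not affected by the filter: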

GET /cartxns/_search
{
  "query": {
    "match": {
      "make.keyword": "ford"}
  },
  "post_filter": {
    "match": {
      "color.keyword" : "blue"}
  },
  "aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword",
        "size": 10}
    }
  }
}

The query and aggregation results are as follows:

"hits": {
    "total": {
      "value" : 1,
      "relation" : "eq"},
    "max_score" : 1.2809337,
    "hits": [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "OWVYAnIBSDa1Wo5UTrf8",
        "_score" : 1.2809337,
        "_source": {
          "price" : 25000,
          "color" : "blue",
          "make" : "ford",
          "sold" : "2014-02-12"}
      }
    ]
  },
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets": [
        {
          "key" : "blue",
          "doc_count" : 1},
        {
          "key" : "green",
          "doc_count" : 1}
      ]
    }
  }
}

As shown, the hits reflect the post-filter (only the blue ford remains), but the aggregation is not affected by it and still counts both ford colors.

elastic4s sample code:

// Assumes `client` (an ElasticClient) and `import com.sksamuel.elastic4s.ElasticDsl._` are already in scope.
val aggPost = search("cartxns").query(
  matchQuery("make.keyword", "ford")
).postFilter(matchQuery("color.keyword", "blue")) // trims the hits only
  .aggregations(
    termsAgg("colors", "color.keyword")            // still computed over all ford matches
  )

println(aggPost.show)

val postResult = client.execute(aggPost).await

if (postResult.isSuccess) {
  postResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
  postResult.result.aggregations.terms("colors").buckets
    .foreach(b => println(s"${b.key},${b.docCount}"))
} else println(s"error: ${postResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"post_filter":{"match":{"color.keyword":{"query":"blue"}}},"aggs":{"colors":{"terms":{"field":"color.keyword"}}}},Some(application/json))
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)
blue,1
green,1
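The ordering of the steps is what matters here: the aggregation runs over the query matches, and the post-filter is applied to the hit list afterwards. A minimal in-memory sketch of that ordering, in plain Scala and without elastic4s, might look like this. `PostFilterSketch`, `CarTxn`, `matched` and `colorCounts` are illustrative names, and the data is again limited to the five documents visible in this article.

```scala
// In-memory illustration of post_filter ordering (no Elasticsearch involved).
object PostFilterSketch {
  final case class CarTxn(price: Int, color: String, make: String, sold: String)

  // Assumption: only the five documents visible in this article.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford", "2014-05-18"),
    CarTxn(25000, "blue", "ford", "2014-02-12"),
    CarTxn(10000, "red", "honda", "2014-10-28"),
    CarTxn(20000, "red", "honda", "2014-11-05"),
    CarTxn(20000, "red", "honda", "2014-11-05")
  )

  // Step 1: the query selects the ford transactions.
  val matched: List[CarTxn] = docs.filter(_.make == "ford")

  // Step 2: the terms aggregation is computed over the query matches.
  val colorCounts: Map[String, Int] =
    matched.groupBy(_.color).map { case (color, xs) => color -> xs.size }

  // Step 3: the post-filter trims the returned hits, after the aggregation.
  val hits: List[CarTxn] = matched.filter(_.color == "blue")

  def main(args: Array[String]): Unit = {
    hits.foreach(println)
    colorCounts.toList.sorted.foreach { case (color, n) => println(s"$color,$n") }
  }
}
```

Only the blue ford survives as a hit, while the color counts still contain one blue and one green entry, which is the same shape as the response above.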
