Elasticsearch Autocomplete in Detail

1. References

Suggesters

Elasticsearch Suggester 详解 (Elasticsearch Suggester Explained)

2. Basic introduction

2.1 Bing example

[Figure: Bing example]

2.2 The suggest process

[Figure: the suggest process]

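3. Suggesters in ES

3.1 How it works

The input text is split into tokens; each token is then looked up in the index's term dictionary, and similar terms are returned as suggestions.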
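The idea can be sketched in a few lines of Python. This is only a rough illustration, not how Elasticsearch or Lucene implement it: the tokenizer (lowercase, split on non-alphanumerics), the plain Levenshtein distance, the `max_edits` limit and the score formula are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def suggest(text, dictionary, max_edits=2):
    """For each input token, list dictionary terms within max_edits edits."""
    result = {}
    for token in tokenize(text):
        candidates = []
        for term in dictionary:
            d = edit_distance(token, term)
            if 0 < d <= max_edits:
                # Assumed score: 1 minus the distance relative to the longer word
                score = 1 - d / max(len(token), len(term))
                candidates.append((term, round(score, 4)))
        result[token] = sorted(candidates, key=lambda c: -c[1])
    return result

docs = [
    "The goal of Apache Lucene is to provide world class search capabilities",
    "Lucene is the search core of both Apache Solr and Elasticsearch.",
]
# The "term dictionary" here is simply the set of tokens seen in the documents
dictionary = {tok for doc in docs for tok in tokenize(doc)}
print(suggest("lucenl", dictionary))
```

With the two sample sentences as the dictionary source, the misspelled `lucenl` is one edit away from `lucene`, so that is the only candidate returned.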

3.2 The four suggester types

(1) term suggester

(2) phrase suggester

(3) completion suggester

(4) context suggester

4. term suggester

(1) Create the index and write documents

# Create the index
PUT yztest/
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      }
    }
  }
}

# Add document 1
POST yztest/_doc/1
{
  "message": "The goal of Apache Lucene is to provide world class search capabilities"
}

# Add document 2
POST yztest/_doc/2
{
  "message": "Lucene is the search core of both Apache Solr and Elasticsearch."
}

(2) Inspect the analyzed tokens

# Show the tokens the analyzer produces for the message field
GET yztest/_analyze
{
  "field": "message",
  "text": [
    "The goal of Apache Lucene is to provide world class search capabilities",
    "Lucene is the search core of both Apache Solr and Elasticsearch."
  ]
}

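(3) Different query results

a) When the input word is misspelled, a list of correctly spelled candidate terms is returned

# Query
POST yztest/_search
{
  "suggest": {
    "suggest_message": { # custom suggester name
      "text": "lucenl",  # the query string, i.e. what the user typed
      "term": { # suggester type: term suggester
        "field": "message", # field to match against
        "suggest_mode": "missing" # suggest mode; missing only suggests for input terms that are not already in the index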
      }
    }
  }
}

# Response
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "suggest_message" : [
      {
        "text" : "lucenl",
        "offset" : 0,
        "length" : 6,
        "options" : [ # options is an array holding the actual suggestions
          {
            "text" : "lucene",
            "score" : 0.8333333,
            "freq" : 2
          }
        ]
      }
    ]
  }
}
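On the client side, the suggestions have to be pulled out of the nested `suggest` section of the response. A minimal sketch of that step follows; the helper name `extract_suggestions` is made up for this example, and the `response` dict is a trimmed copy of the response shown above.

```python
def extract_suggestions(response, name):
    """Map each input token to its suggested texts, highest score first."""
    result = {}
    for entry in response.get("suggest", {}).get(name, []):
        options = sorted(entry["options"], key=lambda o: o.get("score", 0), reverse=True)
        result[entry["text"]] = [o["text"] for o in options]
    return result

# Trimmed copy of the term suggester response from this section
response = {
    "suggest": {
        "suggest_message": [
            {
                "text": "lucenl",
                "offset": 0,
                "length": 6,
                "options": [{"text": "lucene", "score": 0.8333333, "freq": 2}],
            }
        ]
    }
}
print(extract_suggestions(response, "suggest_message"))
```

An entry whose `options` array is empty simply maps to an empty list, which a caller can treat as "nothing to suggest for this token".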

b) When the input is a string made up of several words, each token gets its own entry in the response

# Query
POST yztest/_search
{
  "suggest": {
    "suggest_message": {
      "text": "lucene search",
      "term": {
        "field": "message",
        "suggest_mode": "always"
      }
    }
  }
}

# Response
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "suggest_message" : [
      {
        "text" : "lucene",
        "offset" : 0,
        "length" : 6,
        "options" : [ ]
      },
      {
        "text" : "search",
        "offset" : 7,
        "length" : 6,
        "options" : [ ]
      }
    ]
  }
}


5. phrase suggester

# Phrase query
POST yztest/_search
{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "Solr and Elasticearc", # the string typed by the user
      "phrase": { # suggester type: phrase suggester
        "field": "message", # field to match against
        "highlight": { # optional highlighting of the corrected terms
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

# Response
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "YOUR_SUGGESTION" : [
      {
        "text" : "Solr and Elasticearc",
        "offset" : 0,
        "length" : 20,
        "options" : [
          {
            "text" : "solr and elasticsearch",
            "highlighted" : "solr and <em>elasticsearch</em>", # highlighted part
            "score" : 0.017689342
          }
        ]
      }
    ]
  }
}
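The `#` annotations in the request above are explanatory only; strict JSON has no comment syntax, so a program sending the same request would normally build the body without them. One possible way to do that in Python is sketched below. The helper name `phrase_suggest_body` is hypothetical, and the sketch stops at producing the JSON string rather than assuming any particular HTTP client.

```python
import json

def phrase_suggest_body(name, text, field, pre_tag="<em>", post_tag="</em>"):
    """Build the body of a phrase suggester request as a plain dict."""
    return {
        "suggest": {
            name: {
                "text": text,
                "phrase": {
                    "field": field,
                    "highlight": {"pre_tag": pre_tag, "post_tag": post_tag},
                },
            }
        }
    }

body = phrase_suggest_body("YOUR_SUGGESTION", "Solr and Elasticearc", "message")
payload = json.dumps(body)  # strict JSON string for a POST to yztest/_search
print(payload)
```

Keeping the suggester name, input text and field as parameters makes it easy to reuse the same body for other fields or other highlight tags.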


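6. completion suggester

Provides autocomplete functionality.

6.1 Create a mapping that declares the suggest field

# Create the index. If yztest still exists from section 4, delete it first (DELETE yztest):
# the type of an existing field cannot be changed from text to completion.
PUT yztest/
{
  "mappings": {
    "properties": {
      "message": { # the field type decides whether the field can be used for completion suggestions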
        "type": "completion"
      }
    }
  }
}

6.2 Queries

(1) Index documents

POST yztest/_doc/1
{
  "message": "The goal of Apache Lucene is to provide world class search capabilities"
}

POST yztest/_doc/2
{
  "message": "Lucene is the search core of both Apache Solr and Elasticsearch."
}

POST yztest/_doc/3
{
  "message": "Lucene is the search core of Elasticsearch."
}

POST yztest/_doc/4
{
  "message": "Lucene is the search core of Apache Solr."
}

(2) Prefix query

# Query
POST yztest/_search
{
  "suggest": {
    "message_suggest": { # custom suggester name
      "prefix": "lucene is the", # the prefix string, i.e. what the user typed
      "completion": { # suggester type: completion suggester
        "field": "message" # field to match against
      }
    }
  }
}

# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "message_suggest" : [
      {
        "text" : "lucene is the",
        "offset" : 0,
        "length" : 13,
        "options" : [
          {
            "text" : "Lucene is the search core of Apache Solr.",
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of Apache Solr."
            }
          },
          {
            "text" : "Lucene is the search core of Elasticsearch.",
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of Elasticsearch."
            }
          },
          {
            "text" : "Lucene is the search core of both Apache Solr and ", # cut off at 50 characters, the default max_input_length of a completion field
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of both Apache Solr and Elasticsearch."
            }
          }
        ]
      }
    ]
  }
}


(3) skip_duplicates

Filters out duplicate suggestions from the results.

# Set skip_duplicates in the query; the default is false
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "prefix": "lucene is the",
      "completion": {
        "field": "message",
        "skip_duplicates": true
      }
    }
  }
}
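The four sample documents all have different messages, so the effect of `skip_duplicates` is not visible with them. The sketch below illustrates the intended effect on the client side with made-up options in which two documents (hypothetical ids 5 and 6) share the same suggestion text. It is an illustration of the outcome only, not of how Elasticsearch performs the filtering internally.

```python
def skip_duplicates(options):
    """Keep only the first option for each distinct suggestion text."""
    seen = set()
    unique = []
    for option in options:
        if option["text"] not in seen:
            seen.add(option["text"])
            unique.append(option)
    return unique

# Hypothetical options: ids 5 and 6 carry the same suggestion text
options = [
    {"text": "Lucene is the search core of Apache Solr.", "_id": "4"},
    {"text": "Lucene is the search core of Elasticsearch.", "_id": "5"},
    {"text": "Lucene is the search core of Elasticsearch.", "_id": "6"},
]
for option in skip_duplicates(options):
    print(option["_id"], option["text"])
```

Only one entry per distinct text survives, which is usually what an autocomplete dropdown wants to display.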

(4) fuzzy query

# Set the fuzzy option so that the prefix does not have to match exactly
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "prefix": "lucen is the",
      "completion": {
        "field": "message",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

(5) regex query

# Regular-expression match
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "regex": ".*solr.*", # regular expression
      "completion": {
        "field": "message"
      }
    }
  }
}

7. context suggester

8. How is it implemented?

[Figure not preserved]
