
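Installing the pinyin analyzer

Pick the pinyin analyzer release that matches your Elasticsearch version.

Download it, unzip it, and place it in the Elasticsearch plugins directory.

Restart Elasticsearch.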


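Custom analyzers

Pinyin analyzer: optional settings

1. First-letter handling

keep_first_letter (default: true)

Description: whether to emit the combined first letters of each character's pinyin, which enables searching by initials
Enabled: 刘德华 → [ldh]
Disabled: 刘德华 → [] (no first-letter token is generated)
Use case: finding "刘德华" by typing "ldh"

keep_separate_first_letter (default: false)

Description: whether to store the first letter of each character as a separate term
Enabled: 刘德华 → [l,d,h]
Disabled: 刘德华 → [ldh]
Note: enabling this enlarges the index but allows more flexible searches (such as "l d h"); because single letters are very frequent terms, results can become overly fuzzy

limit_first_letter_length (default: 16)

Description: caps the length of the first-letter result
Example:
中华人民共和国 → [zhrmghg] by default (7 characters)
With a limit of 3 → [zhr]
Purpose: keeps first-letter tokens short for long text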
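
The first-letter options above can be illustrated with a small sketch. It is not the plugin's implementation: the class name FirstLetterSketch, its methods, and the tiny hard-coded PINYIN table are assumptions made for illustration, whereas the real plugin ships a full pinyin dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FirstLetterSketch {
    // Hypothetical lookup table covering only the characters used in the examples.
    static final Map<Character, String> PINYIN = Map.of(
            '刘', "liu", '德', "de", '华', "hua",
            '中', "zhong", '人', "ren", '民', "min",
            '共', "gong", '和', "he", '国', "guo");

    /** Mimics keep_first_letter with limit_first_letter_length: one joined abbreviation. */
    static String joinedFirstLetters(String text, int limit) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py != null && sb.length() < limit) {
                sb.append(py.charAt(0));
            }
        }
        return sb.toString();
    }

    /** Mimics keep_separate_first_letter: one single-letter token per character. */
    static List<String> separateFirstLetters(String text) {
        List<String> out = new ArrayList<>();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py != null) {
                out.add(py.substring(0, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(joinedFirstLetters("刘德华", 16));          // ldh
        System.out.println(joinedFirstLetters("中华人民共和国", 16));  // zhrmghg
        System.out.println(joinedFirstLetters("中华人民共和国", 3));   // zhr
        System.out.println(separateFirstLetters("刘德华"));            // [l, d, h]
    }
}
```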

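2. Full pinyin handling

keep_full_pinyin (default: true)

Description: whether to keep the full pinyin of each character
Enabled: 刘德华 → [liu,de,hua]
Disabled: 刘德华 → []
Why it matters: this is the basis for searching by exact pinyin syllables

keep_joined_full_pinyin (default: false)

Description: whether to also emit the full pinyin joined into a single term
Enabled: 刘德华 → [liudehua]
Disabled: no joined term is emitted; 刘德华 → [liu,de,hua] then comes from keep_full_pinyin alone
Trade-off: relying on the joined term only (with keep_full_pinyin disabled) produces fewer index terms, but queries can no longer match a single syllable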
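
How the two full-pinyin switches combine can be sketched as follows. The class FullPinyinSketch, its tokens method, and the three-entry lookup table are hypothetical, and the order of tokens emitted by the real plugin may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FullPinyinSketch {
    // Hypothetical lookup table; the real plugin uses a full dictionary.
    static final Map<Character, String> PINYIN = Map.of('刘', "liu", '德', "de", '华', "hua");

    /** Returns the terms produced for the given combination of the two options. */
    static List<String> tokens(String text, boolean keepFullPinyin, boolean keepJoinedFullPinyin) {
        List<String> out = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // characters outside the table are ignored in this sketch
            }
            if (keepFullPinyin) {
                out.add(py); // one term per syllable
            }
            joined.append(py);
        }
        if (keepJoinedFullPinyin && joined.length() > 0) {
            out.add(joined.toString()); // one extra term for the whole word
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("刘德华", true, false));  // [liu, de, hua]
        System.out.println(tokens("刘德华", false, true));  // [liudehua]
        System.out.println(tokens("刘德华", true, true));   // [liu, de, hua, liudehua]
    }
}
```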

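3. Non-Chinese handling

keep_none_chinese (default: true)

Description: whether to keep non-Chinese letters and digits from the original text
Enabled: 刘德华AT2016 → [liu,de,hua,AT2016]
Disabled: 刘德华AT2016 → [liu,de,hua]
Why it matters: this is the key setting for text that mixes Chinese with other characters

keep_none_chinese_together (default: true)

Description: whether to keep consecutive non-Chinese characters together as one term
Enabled: DJ音乐家 → [DJ,yin,yue,jia]
Disabled: DJ音乐家 → [D,J,yin,yue,jia]
Impact: disabling it noticeably increases the number of index terms; keep_none_chinese must be enabled for this option to take effect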

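4. Advanced handling

none_chinese_pinyin_tokenize (default: true)

Description: whether to split non-Chinese letters into separate pinyin terms when they spell valid pinyin
Enabled: liudehua2016 → [liu,de,hua,2016]
Disabled: liudehua2016 → [liudehua2016]
Use case: text that mixes typed pinyin with digits; keep_none_chinese and keep_none_chinese_together must be enabled for this option to take effect

remove_duplicated_term (default: false)

Description: whether to drop duplicate terms
Enabled: de的 → [de]
Disabled: de的 → [de,de]
Trade-off: saves index space, but position-related queries may be affected

keep_original (default: false)

Description: whether to keep the original text as a term
Enabled: "北京" → ["北京", "beijing", "bj"]
Disabled: "北京" → ["beijing", "bj"]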

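5. System behavior

ignore_pinyin_offset (default: true)

Description: whether to ignore the offsets of pinyin tokens
Enabled: overlapping tokens are allowed because offsets are ignored; highlighting and position-related queries may become inaccurate
Disabled: offsets are strictly constrained, which keeps highlighting accurate
Version note: this matters from Elasticsearch 6.0 onward, where offsets are strictly validated and overlapping tokens are otherwise rejected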


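How a custom analyzer works

An Elasticsearch analyzer consists of three parts:

  • character filter: processes the text before the tokenizer, for example by deleting or replacing characters
  • tokenizer: splits the text into terms according to some rule, for example keyword (no splitting) or ik_smart
  • token filter: further processes the terms emitted by the tokenizer, for example case conversion, synonym handling, or pinyin conversion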
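
The three stages can be pictured as a chain of functions. The sketch below is a simplified stand-in, not Elasticsearch code: the class AnalyzerPipelineSketch is hypothetical, a whitespace split stands in for a real tokenizer such as ik_smart, and lowercasing stands in for a token filter such as pinyin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class AnalyzerPipelineSketch {
    // Character filter: rewrites the raw text before tokenization (here: strips simple HTML tags).
    static final Function<String, String> CHAR_FILTER = s -> s.replaceAll("<[^>]+>", "");

    // Tokenizer: splits the text into terms (here: on whitespace).
    static final Function<String, List<String>> TOKENIZER = s -> {
        List<String> terms = new ArrayList<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    };

    // Token filter: post-processes each term (here: lowercasing).
    static final Function<String, String> TOKEN_FILTER = t -> t.toLowerCase(Locale.ROOT);

    /** Runs the three stages in the order an analyzer applies them. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String term : TOKENIZER.apply(CHAR_FILTER.apply(text))) {
            out.add(TOKEN_FILTER.apply(term));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("<b>Hello</b> Elasticsearch World")); // [hello, elasticsearch, world]
    }
}
```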


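Example

Create an index named test for trying out the custom analyzer. The py filter disables per-character full pinyin (keep_full_pinyin), joins the full pinyin into one long term (keep_joined_full_pinyin), keeps the original text (keep_original), caps first-letter tokens at 16 characters (limit_first_letter_length), removes duplicate terms (remove_duplicated_term), and does not split non-Chinese text by pinyin rules (none_chinese_pinyin_tokenize).

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "words": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}

my_analyzer is used when the inverted index is built.

ik_max_word is used to analyze the query text.

This way, a search for "狮子" (lion) no longer returns entries about "虱子" (lice), which has the same pinyin.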
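
Why the homophones collide can be shown with a toy inverted index. This is only a sketch under stated assumptions: the class SearchAnalyzerSketch, its methods, and the two-entry JOINED_PINYIN table are hypothetical, and each document is reduced to a single word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SearchAnalyzerSketch {
    // Hypothetical joined-pinyin lookup for the two words in this example.
    static final Map<String, String> JOINED_PINYIN = Map.of("虱子", "shizi", "狮子", "shizi");

    /** Pinyin-style analysis: keep the original word and add its joined pinyin. */
    static Set<String> pinyinTerms(String word) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(word);
        terms.add(JOINED_PINYIN.get(word));
        return terms;
    }

    /** Returns the ids of documents that share at least one term with the query. */
    static List<Integer> search(Map<Integer, Set<String>> index, Set<String> queryTerms) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> doc : index.entrySet()) {
            for (String term : queryTerms) {
                if (doc.getValue().contains(term)) {
                    hits.add(doc.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    /** Both documents are indexed with the pinyin-style analysis. */
    static Map<Integer, Set<String>> buildIndex() {
        Map<Integer, Set<String>> index = new LinkedHashMap<>();
        index.put(1, pinyinTerms("虱子"));
        index.put(2, pinyinTerms("狮子"));
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> index = buildIndex();
        // Query analyzed with pinyin: the shared term "shizi" matches both documents.
        System.out.println(search(index, pinyinTerms("虱子"))); // [1, 2]
        // Query analyzed without pinyin: only the literal word matches.
        System.out.println(search(index, Set.of("虱子")));      // [1]
    }
}
```

Keeping pinyin terms in the index while analyzing the query without them still lets a user who types pinyin find both documents, because the typed "shizi" is itself an indexed term.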


Test

POST /test/_analyze
{
  "text": ["了却君王天下事junwang天下事"],
  "analyzer": "my_analyzer"
}

{
  "tokens": [
    {"token": "了却", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "leque", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "lq", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "君王", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1},
    {"token": "junwang", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1},
    {"token": "jw", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1},
    {"token": "天下事", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2},
    {"token": "tianxiashi", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2},
    {"token": "txs", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2},
    {"token": "天下", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3},
    {"token": "tianxia", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3},
    {"token": "tx", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3},
    {"token": "事", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 4},
    {"token": "shi", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 4},
    {"token": "s", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 4},
    {"token": "junwang", "start_offset": 7, "end_offset": 14, "type": "ENGLISH", "position": 5},
    {"token": "天下事", "start_offset": 14, "end_offset": 17, "type": "CN_WORD", "position": 6},
    {"token": "tianxiashi", "start_offset": 14, "end_offset": 17, "type": "CN_WORD", "position": 6},
    {"token": "txs", "start_offset": 14, "end_offset": 17, "type": "CN_WORD", "position": 6},
    {"token": "天下", "start_offset": 14, "end_offset": 16, "type": "CN_WORD", "position": 7},
    {"token": "tianxia", "start_offset": 14, "end_offset": 16, "type": "CN_WORD", "position": 7},
    {"token": "tx", "start_offset": 14, "end_offset": 16, "type": "CN_WORD", "position": 7},
    {"token": "事", "start_offset": 16, "end_offset": 17, "type": "CN_CHAR", "position": 8},
    {"token": "shi", "start_offset": 16, "end_offset": 17, "type": "CN_CHAR", "position": 8},
    {"token": "s", "start_offset": 16, "end_offset": 17, "type": "CN_CHAR", "position": 8}
  ]
}

PUT /test/_doc/1
{
  "words": "身上有虱子"
}

PUT /test/_doc/2
{
  "words": "山里有狮子"
}

Run the query

GET /test/_search
{
  "query": {
    "match": {
      "words": "虱子"
    }
  }
}

Result before setting search_analyzer to ik_max_word

{
  "took": 6,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 0.33425623,
    "hits": [
      {"_index": "test", "_type": "_doc", "_id": "1", "_score": 0.33425623, "_source": {"words": "身上有虱子"}},
      {"_index": "test", "_type": "_doc", "_id": "2", "_score": 0.3085442, "_source": {"words": "山里有狮子"}}
    ]
  }
}

Result after setting search_analyzer to ik_max_word

{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 0.9530773,
    "hits": [
      {"_index": "test", "_type": "_doc", "_id": "1", "_score": 0.9530773, "_source": {"words": "身上有虱子"}}
    ]
  }
}

Clearly, the second result is the one we want.


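Autocomplete

Elasticsearch provides the completion suggester for autocomplete. It matches terms that begin with the user's input and returns them.

A field used in completion queries must be of type completion, and its value holds the terms (usually several) that take part in completion.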
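
The prefix-matching behavior can be sketched with a sorted map. This only illustrates the idea; it is not how Elasticsearch implements the completion suggester, and the class PrefixSuggestSketch together with its sample entries is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefixSuggestSketch {
    // Maps each indexed form (original text or its pinyin) to the suggestion shown to the user.
    static final TreeMap<String, String> INPUTS = new TreeMap<>();
    static {
        INPUTS.put("米哈游", "米哈游");
        INPUTS.put("mihayou", "米哈游");
        INPUTS.put("moba", "MOBA");
    }

    /** Returns the suggestions whose indexed form starts with the given prefix. */
    static List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        // tailMap jumps to the first key >= prefix; stop once keys no longer start with it.
        for (Map.Entry<String, String> e : INPUTS.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            if (!out.contains(e.getValue())) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(suggest("mi")); // [米哈游]
        System.out.println(suggest("m"));  // [米哈游, MOBA]
        System.out.println(suggest("ha")); // [] because "ha" is not a prefix of any entry
    }
}
```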


Autocomplete (DSL implementation)

Create a game index that contains a single field of type completion, title

PUT /game
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "completion",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}

POST /game/_bulk
{"index":{"_id":1}}
{"title":["原神","开放世界","角色扮演","动作冒险","多平台","米哈游"]}
{"index":{"_id":2}}
{"title":["王者荣耀","MOBA","5v5","竞技","手游"]}
{"index":{"_id":3}}
{"title":["绝地求生","大逃杀","FPS","射击","Steam"]}
{"index":{"_id":4}}
{"title":["英雄联盟","MOBA","PC","竞技","团队合作"]}
{"index":{"_id":5}}
{"title":["崩坏:星穹铁道","角色扮演","回合制","科幻","米哈游"]}

Test case 1

GET /game/_search
{
  "suggest": {
    "game_suggest": {
      "text": "mi",
      "completion": {
        "field": "title",
        "skip_duplicates": false,
        "size": 5
      }
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {"total": {"value": 0, "relation": "eq"}, "max_score": null, "hits": []},
  "suggest": {
    "game_suggest": [
      {
        "text": "mi",
        "offset": 0,
        "length": 2,
        "options": [
          {"text": "米哈游", "_index": "game", "_type": "_doc", "_id": "1", "_score": 1.0, "_source": {"title": ["原神", "开放世界", "角色扮演", "动作冒险", "多平台", "米哈游"]}},
          {"text": "米哈游", "_index": "game", "_type": "_doc", "_id": "5", "_score": 1.0, "_source": {"title": ["崩坏:星穹铁道", "角色扮演", "回合制", "科幻", "米哈游"]}}
        ]
      }
    ]
  }
}

Test case 2

GET /game/_search
{
  "suggest": {
    "game_suggest": {
      "text": "ha",
      "completion": {
        "field": "title",
        "skip_duplicates": false,
        "size": 5
      }
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {"total": {"value": 0, "relation": "eq"}, "max_score": null, "hits": []},
  "suggest": {
    "game_suggest": [
      {"text": "ha", "offset": 0, "length": 2, "options": []}
    ]
  }
}

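Autocomplete with the RestAPI (Java client)

// Imports for the Elasticsearch 7.x high-level REST client (JUnit 5 is assumed for @Test)
import java.util.Map;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.junit.jupiter.api.Test;

// Assumes `client` is an already initialized RestHighLevelClient field of the test class
@Test
void testSuggest() throws Exception {
    // Build the same completion suggest request as in the DSL example
    SearchRequest request = new SearchRequest("game");
    request.source().suggest(new SuggestBuilder().addSuggestion(
            "game_suggest",
            SuggestBuilders.completionSuggestion("title")
                    .prefix("mi")
                    .skipDuplicates(false)
                    .size(5)));

    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // Look up the suggestion by the name used in the request
    CompletionSuggestion completionSuggestion = response.getSuggest().getSuggestion("game_suggest");
    for (CompletionSuggestion.Entry entry : completionSuggestion.getEntries()) {
        for (CompletionSuggestion.Entry.Option option : entry) {
            // The completed text
            String suggestedText = option.getText().string();
            // The _source of the associated document (if any)
            Map<String, Object> source = option.getHit().getSourceAsMap();
            System.out.println("Hit: " + suggestedText);
            System.out.println("Source document: " + source);
        }
    }
}

Hit: 米哈游
Source document: {title=[原神, 开放世界, 角色扮演, 动作冒险, 多平台, 米哈游]}
Hit: 米哈游
Source document: {title=[崩坏:星穹铁道, 角色扮演, 回合制, 科幻, 米哈游]}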

