
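Installing the IK Analyzer and Configuring Its Extension Dictionaries: Extension Words and Stop Words

  • Installing the IK Analyzer
  • Configuring IK Extension Dictionaries
    • Extension dictionary: extension words
    • Stop-word dictionary: stop words
    • Testing
      • Before configuring the dictionaries
      • After configuring the dictionaries

This article uses Elasticsearch 7.17.9, chosen to match spring-boot-starter-parent 2.7.9.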

Installing the IK Analyzer

Official resource: the IK Analyzer GitHub page.
IK Analyzer download: https://release.infinilabs.com/analysis-ik/stable/. Download the IK Analyzer version that matches your ES version.

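Next, unzip the downloaded file. Do not extract it directly into the current folder: the archive has no top-level directory, so its files would land loose among everything else. Choose "extract to XXX folder" instead.
Place the extracted folder under the plugins folder of the Elasticsearch directory and restart ES. The plugin is then ready to use (the steps are the same on Windows and Linux).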
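
The unpack step can also be scripted. The following is a minimal sketch using Python's standard zipfile module, not the article's own procedure; the function name install_ik_plugin, the subfolder name ik, and the paths in the usage note are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def install_ik_plugin(zip_path, plugins_dir, folder_name="ik"):
    """Extract the IK zip into its own subfolder under plugins_dir.

    The archive has no top-level directory, so extracting into a
    dedicated folder keeps its files from spilling into plugins_dir.
    folder_name is a hypothetical choice, not something the article fixes.
    """
    target = Path(plugins_dir) / folder_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

A call might look like install_ik_plugin("ik.zip", "/path/to/elasticsearch/plugins"), where both arguments are placeholders for your own download and installation paths. Elasticsearch still has to be restarted afterwards.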

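Configuring IK Extension Dictionaries

The IK Analyzer is not bundled with Elasticsearch; users have to install it themselves, normally in the plugins folder of Elasticsearch. To extend its vocabulary, you only need to edit the IKAnalyzer.cfg.xml file in the config directory inside the IK Analyzer directory:

The default configuration file looks like this: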

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- Users can configure their own extension stop-word dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Users can configure a remote extension stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

To extend the vocabulary, specify the names of your own dictionary files, for example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">dict.dic</entry>
    <!-- Users can configure their own extension stop-word dictionary here -->
    <entry key="ext_stopwords">stopwords.dic</entry>
    <!-- Users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Users can configure a remote extension stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

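After editing the configuration, create the two files it references, in the same directory as IKAnalyzer.cfg.xml.
Once everything is configured, restart Elasticsearch for the changes to take effect.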
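
Before restarting, it can help to confirm that every dictionary file named in the configuration actually exists. The sketch below is one possible check written with Python's standard library; the function name missing_dictionaries is hypothetical, and it assumes that paths are relative to the folder holding IKAnalyzer.cfg.xml and that several files in one entry are separated by semicolons.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_dictionaries(cfg_path):
    """Return dictionary files named in IKAnalyzer.cfg.xml that do not exist.

    Only the local ext_dict and ext_stopwords entries are checked.
    Assumption: paths are relative to the config file's folder, and
    multiple files in one entry are separated by ';'.
    """
    cfg_path = Path(cfg_path)
    root = ET.parse(cfg_path).getroot()
    missing = []
    for entry in root.findall("entry"):
        if entry.get("key") not in ("ext_dict", "ext_stopwords"):
            continue
        # An empty entry has text None, so fall back to an empty string.
        for name in (entry.text or "").split(";"):
            name = name.strip()
            if name and not (cfg_path.parent / name).is_file():
                missing.append(name)
    return missing
```

An empty list means every referenced file was found. The commented-out remote entries are ignored, because XML comments are not parsed as elements.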

Extension dictionary: extension words

Create a dict.dic file, saved as UTF-8 with one word per line, containing:

搬砖
画大饼
已读不回
舔狗
摆烂
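
If the word list is maintained elsewhere, the file can be generated rather than typed by hand. This is a small sketch under the assumption that one UTF-8 encoded word per line is the required layout; write_dictionary is a hypothetical helper name.

```python
from pathlib import Path


def write_dictionary(path, words):
    """Write words to a dictionary file, one per line, encoded as UTF-8.

    Blank entries and duplicates are dropped; first-seen order is kept.
    """
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique
```

For the list above, the call would be write_dictionary("dict.dic", ["搬砖", "画大饼", "已读不回", "舔狗", "摆烂"]), run in the same directory as IKAnalyzer.cfg.xml.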

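Stop-word dictionary: stop words

Add the stop words to the existing stopword.dic file; once configured, these words are no longer emitted by the analyzer. If you keep them in a separate file instead, its name must match the ext_stopwords entry in IKAnalyzer.cfg.xml (stopwords.dic in the example configuration).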

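Testing

Before configuring the dictionaries

Use Dev Tools, the debugging console in Kibana (the visualization interface for Elasticsearch), to call the analyze API:

# `IK Analyzer` extension dictionary.
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "小明是一个Java程序员,白天摸鱼学习ElasticSearch,晚上加班到九点,搬砖完之后,去找他的女神画大饼,女神却已读不回,小明认清自己是个舔狗,直接摆烂,吸食海洛因"
}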
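
Outside Kibana, the same request can be sent from a script. The sketch below builds the POST /_analyze call with Python's standard library only. The base URL http://localhost:9200 is an assumption about where the node listens, the helper names are hypothetical, and security is assumed to be disabled.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="ik_smart",
                          base_url="http://localhost:9200"):
    """Build a POST /_analyze request; base_url is an assumed local node."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response):
    """Pull the token strings out of a parsed _analyze response."""
    return [item["token"] for item in response["tokens"]]


def analyze(text, **kwargs):
    """Send the request and return the token strings (needs a running node)."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as reply:
        return token_texts(json.load(reply))
```

Calling analyze with the sample sentence returns only the token strings, which is easier to scan than the full JSON response.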

Analysis result:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "tokens" : [
    {"token" : "小明", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
    {"token" : "是", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1},
    {"token" : "一个", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 2},
    {"token" : "java", "start_offset" : 5, "end_offset" : 9, "type" : "ENGLISH", "position" : 3},
    {"token" : "程序员", "start_offset" : 9, "end_offset" : 12, "type" : "CN_WORD", "position" : 4},
    {"token" : "白天", "start_offset" : 13, "end_offset" : 15, "type" : "CN_WORD", "position" : 5},
    {"token" : "摸鱼", "start_offset" : 15, "end_offset" : 17, "type" : "CN_WORD", "position" : 6},
    {"token" : "学习", "start_offset" : 17, "end_offset" : 19, "type" : "CN_WORD", "position" : 7},
    {"token" : "elasticsearch", "start_offset" : 19, "end_offset" : 32, "type" : "ENGLISH", "position" : 8},
    {"token" : "晚上", "start_offset" : 33, "end_offset" : 35, "type" : "CN_WORD", "position" : 9},
    {"token" : "加班", "start_offset" : 35, "end_offset" : 37, "type" : "CN_WORD", "position" : 10},
    {"token" : "到", "start_offset" : 37, "end_offset" : 38, "type" : "CN_CHAR", "position" : 11},
    {"token" : "九点", "start_offset" : 38, "end_offset" : 40, "type" : "TYPE_CQUAN", "position" : 12},
    {"token" : "搬", "start_offset" : 41, "end_offset" : 42, "type" : "CN_CHAR", "position" : 13},
    {"token" : "砖", "start_offset" : 42, "end_offset" : 43, "type" : "CN_CHAR", "position" : 14},
    {"token" : "完", "start_offset" : 43, "end_offset" : 44, "type" : "CN_CHAR", "position" : 15},
    {"token" : "之后", "start_offset" : 44, "end_offset" : 46, "type" : "CN_WORD", "position" : 16},
    {"token" : "去", "start_offset" : 47, "end_offset" : 48, "type" : "CN_CHAR", "position" : 17},
    {"token" : "找他", "start_offset" : 48, "end_offset" : 50, "type" : "CN_WORD", "position" : 18},
    {"token" : "的", "start_offset" : 50, "end_offset" : 51, "type" : "CN_CHAR", "position" : 19},
    {"token" : "女神", "start_offset" : 51, "end_offset" : 53, "type" : "CN_WORD", "position" : 20},
    {"token" : "画", "start_offset" : 53, "end_offset" : 54, "type" : "CN_CHAR", "position" : 21},
    {"token" : "大饼", "start_offset" : 54, "end_offset" : 56, "type" : "CN_WORD", "position" : 22},
    {"token" : "女神", "start_offset" : 57, "end_offset" : 59, "type" : "CN_WORD", "position" : 23},
    {"token" : "却已", "start_offset" : 59, "end_offset" : 61, "type" : "CN_WORD", "position" : 24},
    {"token" : "读", "start_offset" : 61, "end_offset" : 62, "type" : "CN_CHAR", "position" : 25},
    {"token" : "不回", "start_offset" : 62, "end_offset" : 64, "type" : "CN_WORD", "position" : 26},
    {"token" : "小明", "start_offset" : 65, "end_offset" : 67, "type" : "CN_WORD", "position" : 27},
    {"token" : "认清", "start_offset" : 67, "end_offset" : 69, "type" : "CN_WORD", "position" : 28},
    {"token" : "自己", "start_offset" : 69, "end_offset" : 71, "type" : "CN_WORD", "position" : 29},
    {"token" : "是", "start_offset" : 71, "end_offset" : 72, "type" : "CN_CHAR", "position" : 30},
    {"token" : "个", "start_offset" : 72, "end_offset" : 73, "type" : "CN_CHAR", "position" : 31},
    {"token" : "舔", "start_offset" : 73, "end_offset" : 74, "type" : "CN_CHAR", "position" : 32},
    {"token" : "狗", "start_offset" : 74, "end_offset" : 75, "type" : "CN_CHAR", "position" : 33},
    {"token" : "直接", "start_offset" : 76, "end_offset" : 78, "type" : "CN_WORD", "position" : 34},
    {"token" : "摆", "start_offset" : 78, "end_offset" : 79, "type" : "CN_CHAR", "position" : 35},
    {"token" : "烂", "start_offset" : 79, "end_offset" : 80, "type" : "CN_CHAR", "position" : 36},
    {"token" : "吸食", "start_offset" : 81, "end_offset" : 83, "type" : "CN_WORD", "position" : 37},
    {"token" : "海洛因", "start_offset" : 83, "end_offset" : 86, "type" : "CN_WORD", "position" : 38}
  ]
}

After configuring the dictionaries

Use Dev Tools, the debugging console in Kibana (the visualization interface for Elasticsearch), to call the analyze API again:

# `IK Analyzer` extension dictionary.
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "小明是一个Java程序员,白天摸鱼学习ElasticSearch,晚上加班到九点,搬砖完之后,去找他的女神画大饼,女神却已读不回,小明认清自己是个舔狗,直接摆烂,吸食海洛因"
}

Analysis result:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "tokens" : [
    {"token" : "小明", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
    {"token" : "是", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1},
    {"token" : "一个", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 2},
    {"token" : "java", "start_offset" : 5, "end_offset" : 9, "type" : "ENGLISH", "position" : 3},
    {"token" : "程序员", "start_offset" : 9, "end_offset" : 12, "type" : "CN_WORD", "position" : 4},
    {"token" : "白天", "start_offset" : 13, "end_offset" : 15, "type" : "CN_WORD", "position" : 5},
    {"token" : "摸鱼", "start_offset" : 15, "end_offset" : 17, "type" : "CN_WORD", "position" : 6},
    {"token" : "学习", "start_offset" : 17, "end_offset" : 19, "type" : "CN_WORD", "position" : 7},
    {"token" : "elasticsearch", "start_offset" : 19, "end_offset" : 32, "type" : "ENGLISH", "position" : 8},
    {"token" : "晚上", "start_offset" : 33, "end_offset" : 35, "type" : "CN_WORD", "position" : 9},
    {"token" : "加班", "start_offset" : 35, "end_offset" : 37, "type" : "CN_WORD", "position" : 10},
    {"token" : "到", "start_offset" : 37, "end_offset" : 38, "type" : "CN_CHAR", "position" : 11},
    {"token" : "九点", "start_offset" : 38, "end_offset" : 40, "type" : "TYPE_CQUAN", "position" : 12},
    {"token" : "搬砖", "start_offset" : 41, "end_offset" : 43, "type" : "CN_WORD", "position" : 13},
    {"token" : "完", "start_offset" : 43, "end_offset" : 44, "type" : "CN_CHAR", "position" : 14},
    {"token" : "之后", "start_offset" : 44, "end_offset" : 46, "type" : "CN_WORD", "position" : 15},
    {"token" : "去", "start_offset" : 47, "end_offset" : 48, "type" : "CN_CHAR", "position" : 16},
    {"token" : "找他", "start_offset" : 48, "end_offset" : 50, "type" : "CN_WORD", "position" : 17},
    {"token" : "女神", "start_offset" : 51, "end_offset" : 53, "type" : "CN_WORD", "position" : 18},
    {"token" : "画大饼", "start_offset" : 53, "end_offset" : 56, "type" : "CN_WORD", "position" : 19},
    {"token" : "女神", "start_offset" : 57, "end_offset" : 59, "type" : "CN_WORD", "position" : 20},
    {"token" : "却", "start_offset" : 59, "end_offset" : 60, "type" : "CN_CHAR", "position" : 21},
    {"token" : "已读不回", "start_offset" : 60, "end_offset" : 64, "type" : "CN_WORD", "position" : 22},
    {"token" : "小明", "start_offset" : 65, "end_offset" : 67, "type" : "CN_WORD", "position" : 23},
    {"token" : "认清", "start_offset" : 67, "end_offset" : 69, "type" : "CN_WORD", "position" : 24},
    {"token" : "自己", "start_offset" : 69, "end_offset" : 71, "type" : "CN_WORD", "position" : 25},
    {"token" : "是", "start_offset" : 71, "end_offset" : 72, "type" : "CN_CHAR", "position" : 26},
    {"token" : "个", "start_offset" : 72, "end_offset" : 73, "type" : "CN_CHAR", "position" : 27},
    {"token" : "舔狗", "start_offset" : 73, "end_offset" : 75, "type" : "CN_WORD", "position" : 28},
    {"token" : "直接", "start_offset" : 76, "end_offset" : 78, "type" : "CN_WORD", "position" : 29},
    {"token" : "摆烂", "start_offset" : 78, "end_offset" : 80, "type" : "CN_WORD", "position" : 30},
    {"token" : "吸食", "start_offset" : 81, "end_offset" : 83, "type" : "CN_WORD", "position" : 31}
  ]
}

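As the output shows, the internet slang terms “搬砖”, “画大饼”, “已读不回”, “舔狗” and “摆烂” are now each recognized as a single token, and the sensitive word “海洛因” is no longer emitted. The particle “的”, which appeared in the earlier output, is gone as well.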
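
Comparing two long token lists by eye is error-prone, so the difference can be computed instead. This is an illustrative sketch only: compare_tokens is a hypothetical helper that takes the token strings from the two responses and reports which tokens appear only after the change and which appear only before it.

```python
def compare_tokens(before, after):
    """Return (gained, lost): tokens only in after, and tokens only in before.

    Order follows first appearance; repeated tokens are reported once.
    """
    before_set, after_set = set(before), set(after)
    gained = [t for t in dict.fromkeys(after) if t not in before_set]
    lost = [t for t in dict.fromkeys(before) if t not in after_set]
    return gained, lost
```

Applied to the two results above, the gained list would hold the five extension words, while the lost list would hold the stop words along with the fragments that the extension words replaced.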
