Does Elasticsearch 6.0 need the IK Chinese analyzer?

Configuring the default analyzer in Elasticsearch: the default is the standard analyzer. To change the default, add the following to config/elasticsearch.yml:
index.analysis.analyzer.default.type: ik
Note, however, that IK 5.0.0 changed a few things.
IK 5.0.0 removed the analyzer and tokenizer named ik and replaced them with ik_smart and ik_max_word:
Analyzers: ik_smart, ik_max_word; Tokenizers: ik_smart, ik_max_word
So configuring ik no longer works; configure ik_smart or ik_max_word instead.
ik_smart produces the coarsest split (the fewest tokens), while ik_max_word splits at the finest granularity.
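In ES 5.x and later the default analyzer is normally set per index rather than in elasticsearch.yml, so the same idea can also be expressed through index settings. A minimal sketch (the index name my_index is hypothetical, and it assumes the IK plugin is installed) that builds the request body for PUT /my_index:

```python
import json

# Hypothetical sketch: index settings that make ik_smart the default
# analyzer for one index. Send as the body of: PUT /my_index
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {"type": "ik_smart"}
            }
        }
    }
}
body = json.dumps(settings, ensure_ascii=False, indent=2)
print(body)
```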
Installing elasticsearch-analysis-ik-5.0.0
1. Download the elasticsearch-analysis-ik-5.0.0 package
Download address:
Choose the IK version that matches the ES version you have installed.
Version table (IK version — ES version):
5.x — master
This article assumes ElasticSearch 5.0.0 is already installed, so the matching IK version to download is 5.0.0.
The download takes a while; once it finishes you have the archive elasticsearch-analysis-ik-5.0.0.zip. Unzip it.
Check that the ES version in pom.xml matches your installation. If the version gap is small you can simply edit it; otherwise download the matching release instead.
2. Build IK 5.0.0
If you downloaded a prebuilt package, just copy it into place.
Otherwise you need Maven to build from source (installing Maven is not covered here).
Open a command window and change into the unzipped ik directory:
cd C:\elasticsearch-analysis-ik-5.0.0
Then run mvn package to build.
After a successful build, a new target folder appears under elasticsearch-analysis-ik-5.0.0.
Its layout is as follows:
├─config
└─target
    ├─archive-tmp
    ├─classes
    ├─generated-sources
    ├─maven-archiver
    ├─maven-status
    ├─releases
    │   └─elasticsearch-analysis-ik-5.0.0.zip
    └─surefire
Copy target/releases/elasticsearch-analysis-ik-5.0.0.zip to your-es-root/plugins/ik.
(For a prebuilt package, copy it straight into your-es-root/plugins/ik.)
3. Restart Elasticsearch
4. Verify that the IK analyzer works
Open Kibana and run the following in its Dev Tools console:
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "长春市市长"
}
The result is:
{
  "tokens": [
    {
      "token": "长春市",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "市长",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "长春市市长"
}
The result is:
{
  "tokens": [
    {
      "token": "长春市",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长春",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "市",
      "start_offset": 2,
      "end_offset": 3,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "市长",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 3
    }
  ]
}
This shows that the IK analyzer is working correctly.
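The offsets in these responses can be sanity-checked mechanically: each token's start_offset/end_offset slice the original text back to the token itself. A small illustration using the ik_max_word tokens above:

```python
# Sanity check: the start/end offsets returned by _analyze are character
# indices into the original text, so text[start:end] must equal the token.
# The token list is copied from the ik_max_word response above.
text = "长春市市长"
tokens = [
    {"token": "长春市", "start_offset": 0, "end_offset": 3},
    {"token": "长春", "start_offset": 0, "end_offset": 2},
    {"token": "市", "start_offset": 2, "end_offset": 3},
    {"token": "市长", "start_offset": 3, "end_offset": 5},
]
for t in tokens:
    assert text[t["start_offset"]:t["end_offset"]] == t["token"]
print("offsets consistent")
```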
Next up: how to configure a remote dictionary, and how to hot-reload the word list.
Reference: /xing901022/p/5910139.html
Using IK Analysis with Elasticsearch 6.0 (旭日升, 博客园)
Elasticsearch's built-in analyzers are not friendly to Chinese: they split Chinese text into individual characters for full-text search, which does not give the desired results. In an internet era of full-text search and rapidly emerging new words, IK provides proper Chinese segmentation and custom dictionaries.
IK Analyzer is an open-source, lightweight Chinese word-segmentation toolkit written in Java. Version 1.0 was released in December 2006, and it now supports ES up to 6.x.
IK ships with two analyzers:
ik_max_word: splits the text at the finest granularity, producing as many terms as possible
ik_smart: splits at the coarsest granularity; terms that have already been emitted are not reused by other terms
1. Install the plugin
In cluster mode the IK plugin must be installed on every node, and the service restarted after installation. If some node is missing the plugin when a mapping is created, the index may end up RED and has to be deleted and recreated.
./bin/elasticsearch-plugin install /medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip
2. Create an index
You can use curl; on 6.0+ you can also exercise the API from the Dev Tools console of Kibana's x-pack plugin.
curl -XPUT http://localhost:9200/index
3. Create the mapping
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}'
4. Index documents
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'
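For illustration, the same four documents could also be indexed in one request with the _bulk API. The sketch below only builds the NDJSON payload (index and type names are taken from the curl commands above); on a running cluster it would be POSTed to http://localhost:9200/_bulk with Content-Type: application/x-ndjson:

```python
import json

# Sketch: build a _bulk NDJSON payload for the four documents above
# (one action line plus one source line per document, newline-terminated).
docs = {
    "1": "美国留给伊拉克的是个烂摊子吗",
    "2": "公安部:各地校车将享最高路权",
    "3": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船",
    "4": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
}
lines = []
for _id, content in docs.items():
    lines.append(json.dumps(
        {"index": {"_index": "index", "_type": "fulltext", "_id": _id}},
        ensure_ascii=False))
    lines.append(json.dumps({"content": content}, ensure_ascii=False))
payload = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
print(payload)
```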
5. Query documents
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "content" : {}
    }
  }
}'
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "4",
        "_score": 2,
        "_source": {
          "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 2,
        "_source": {
          "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
        },
        "highlight": {
          "content": [
            "均每天扣1艘<tag1>中国</tag1>渔船 "
          ]
        }
      }
    ]
  }
}
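Client code usually only needs the highlight fragments out of such a response. A small sketch (the response dict here is abbreviated to just the fields used):

```python
# Sketch: pull the highlighted fragments out of a search response shaped
# like the one above (abbreviated to the fields this code touches).
response = {
    "hits": {
        "hits": [
            {"_id": "4", "highlight": {"content": ["<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "]}},
            {"_id": "3", "highlight": {"content": ["均每天扣1艘<tag1>中国</tag1>渔船 "]}},
        ]
    }
}
fragments = {
    hit["_id"]: hit["highlight"]["content"]
    for hit in response["hits"]["hits"]
}
for _id, frags in fragments.items():
    print(_id, frags)
```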
IK supports custom dictionaries. The configuration file is analysis-ik/IKAnalyzer.cfg.xml under the config folder, and the dictionary files live in the same directory. Several options can be configured: ext_dict for custom dictionaries and ext_stopwords for stopword lists.
Hot reloading is also supported: set remote_ext_dict to an HTTP address that returns one word per line, in UTF-8 without BOM. When the dictionary is updated, it is enough to update either the Last-Modified or the ETag field in the response headers.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extended configuration</comment>
	<!-- configure your own extension dictionaries here -->
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<!-- configure your own extension stopword dictionaries here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
	<!-- configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">/xxx.dic</entry>
</properties>
(Anoop M K)
21:35:51 UTC
I am new to ELK; I recently configured an AWS Linux box with Elasticsearch and Kibana.
Both the Kibana and Elasticsearch processes started successfully, and I am able to get the logs and corresponding results.
But after some time the Elasticsearch process gets killed automatically, i.e. I have to start the process again to bring it back up. Could you please suggest any help on this?
Elasticsearch logs:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch
[21:31:26,738][INFO ][node] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z]
[21:31:26,739][INFO ][node] [Centurion] initializing ...
[21:31:28,133][INFO ][plugins] [Centurion] modules [lang-groovy, reindex, lang-expression], plugins [], sites []
[21:31:28,179][INFO ][env] [Centurion] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [5.9gb], net total_space [7.7gb], spins? [no], types [ext4]
[21:31:28,179][INFO ][env] [Centurion] heap size [1015.6mb], compressed ordinary object pointers [true]
[21:31:28,180][WARN ][env] [Centurion] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[21:31:32,469][INFO ][node] [Centurion] initialized
[21:31:32,469][INFO ][node] [Centurion] starting ...
[21:31:32,596][INFO ][transport] [Centurion] publish_address {xxx.xxx.xxx.xxx:9300}, bound_addresses {xxx.xxx.xxx.xxx:9300}
[21:31:32,605][INFO ][discovery] [Centurion] elasticsearch/f-BQjtvWQCaunaOLu8OJkA
[21:31:35,805][INFO ][cluster.service] [Centurion] new_master {Centurion}{f-BQjtvWQCaunaOLu8OJkA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[21:31:35,837][INFO ][http] [Centurion] publish_address {xxx.xxx.xxx.xxx:9200}, bound_addresses {xxx.xxx.xxx.xxx:9200}
[21:31:35,838][INFO ][node] [Centurion] started
[21:31:35,962][INFO ][gateway] [Centurion] recovered [2] indices into cluster_state
[21:31:37,238][INFO ][cluster.routing.allocation] [Centurion] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-][4]] ...]).
(David Pilato)
01:55:42 UTC
Please format your logs with the </> icon.
Do you start Elasticsearch with bin/elasticsearch? Then any time you log out, the process is stopped because it runs in the foreground.
You should run Elasticsearch as a service instead. Use the deb or rpm packages instead.
(Anoop M K)
03:38:20 UTC
Thank you David for the response!
I am using the command "bin/elasticsearch &" to run it in the background.
Also, could you please elaborate on "format your logs with the </> icon"?
The log format I am using is below.
Log file:
[LAYER]GSDIM, [TYPE]REQ, [METHOD]SESSION, [ACTION]START, [latitude]47.60621, [longitude]-122.33207
[LAYER]GSDIM, [TYPE]REP, [TIME]1, [METHOD]SESSION, [ACTION]START, [RESPONSE]Session Started, [latitude]47.60621, [longitude]-122.33207
(David Pilato)
04:03:04 UTC
I meant that instead of posting this:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch[ 21:31:26,738][INFO ][node ] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z][ 21:31:26,739][INFO ][node ] [Centurion] initializing ...
you should post:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch
[ 21:31:26,738][INFO ][node ] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z]
[ 21:31:26,739][INFO ][node ] [Centurion] initializing ...
13:36:08 UTC
anoopmk007:
I am using command --
"bin/elasticsearch & " for running in background.
Even though this runs in the background it will still get killed when you logout. I definitely agree with the recommendation to use existing RPM or DEBs and to run this as a service.
However, if you really don't want to run it as a service, you can detach the background process from your session if you want it to continue when you logout:
# bin/elasticsearch &> /dev/null &
# disown %1
(Anoop M K)
21:22:51 UTC
Just now I changed the process to run as a service using the RPM, but I still have the issue of it getting automatically stopped/killed.
A couple of other observations:
If I start only the Elasticsearch service, then it does not get killed/stopped:
sudo service elasticsearch start
If I start Elasticsearch and Kibana on the same server, then the Elasticsearch service gets killed/stopped after some time:
sudo service elasticsearch start
sudo service kibana start
Which logs can I check for this issue? Any help appreciated!
17:46:31 UTC
dmesg and /var/log/messages might show any system errors like out of memory conditions.