Does Elasticsearch 6.0 need the IK Chinese analyzer?

Configuring the default analyzer in Elasticsearch: the default is the standard analyzer. To change the default, add the following to config/elasticsearch.yml:
index.analysis.analyzer.default.type: ik
Note, however, that IK 5.0.0 changed a few things.
IK 5.0.0 removed the analyzer and tokenizer named ik and replaced them with ik_smart and ik_max_word:
Analyzers: ik_smart, ik_max_word; Tokenizers: ik_smart, ik_max_word
So configuring ik no longer works; configure ik_smart or ik_max_word instead.
ik_smart produces the coarsest split (the fewest tokens), while ik_max_word splits at the finest granularity.
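In ES 5.x and later the default analyzer is normally set per index rather than in elasticsearch.yml, so the same idea can also be expressed through index settings. A minimal sketch (the index name my_index is hypothetical, and it assumes the IK plugin is installed) that builds the request body for PUT /my_index:

```python
import json

# Hypothetical sketch: index settings that make ik_smart the default
# analyzer for one index. Send as the body of: PUT /my_index
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {"type": "ik_smart"}
            }
        }
    }
}
body = json.dumps(settings, ensure_ascii=False, indent=2)
print(body)
```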
Installing elasticsearch-analysis-ik-5.0.0
1. Download the elasticsearch-analysis-ik-5.0.0 package
Download address:
Choose the IK version that matches the ES version you have installed.
Version table (IK version — ES version):
5.x — master
This article assumes ElasticSearch 5.0.0 is already installed, so the matching IK version to download is 5.0.0.
The download takes a while; once it finishes you have the archive elasticsearch-analysis-ik-5.0.0.zip. Unzip it.
Check that the ES version in pom.xml matches your installation. If the version gap is small you can simply edit it; otherwise download the matching release instead.
2. Build IK 5.0.0
If you downloaded a prebuilt package, just copy it into place.
Otherwise you need Maven to build from source (installing Maven is not covered here).
Open a command window and change into the unzipped ik directory:
cd C:\elasticsearch-analysis-ik-5.0.0
Then run mvn package to build.
After a successful build, a new target folder appears under elasticsearch-analysis-ik-5.0.0.
Its layout is as follows:
├─config
└─target
    ├─archive-tmp
    ├─classes
    ├─generated-sources
    ├─maven-archiver
    ├─maven-status
    ├─releases
    │   └─elasticsearch-analysis-ik-5.0.0.zip
    └─surefire
Copy target/releases/elasticsearch-analysis-ik-5.0.0.zip to your-es-root/plugins/ik.
(For a prebuilt package, copy it straight into your-es-root/plugins/ik.)
3. Restart Elasticsearch
4. Verify that the IK analyzer works
Open Kibana and run the following in its Dev Tools console:
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "长春市市长"
}
The result is:
{
  "tokens": [
    {
      "token": "长春市",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "市长",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "长春市市长"
}
The result is:
{
  "tokens": [
    {
      "token": "长春市",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长春",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "市",
      "start_offset": 2,
      "end_offset": 3,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "市长",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 3
    }
  ]
}
This shows that the IK analyzer is working correctly.
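The offsets in these responses can be sanity-checked mechanically: each token's start_offset/end_offset slice the original text back to the token itself. A small illustration using the ik_max_word tokens above:

```python
# Sanity check: the start/end offsets returned by _analyze are character
# indices into the original text, so text[start:end] must equal the token.
# The token list is copied from the ik_max_word response above.
text = "长春市市长"
tokens = [
    {"token": "长春市", "start_offset": 0, "end_offset": 3},
    {"token": "长春", "start_offset": 0, "end_offset": 2},
    {"token": "市", "start_offset": 2, "end_offset": 3},
    {"token": "市长", "start_offset": 3, "end_offset": 5},
]
for t in tokens:
    assert text[t["start_offset"]:t["end_offset"]] == t["token"]
print("offsets consistent")
```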
Next up: how to configure a remote dictionary, and how to hot-reload the word list.
Reference: /xing901022/p/5910139.html
Using IK Analysis with Elasticsearch 6.0 (旭日升, 博客园)
Elasticsearch's built-in analyzers are not friendly to Chinese: they split Chinese text into individual characters for full-text search, which does not give the desired results. In an internet era of full-text search and rapidly emerging new words, IK provides proper Chinese segmentation and custom dictionaries.
IK Analyzer is an open-source, lightweight Chinese word-segmentation toolkit written in Java. Version 1.0 was released in December 2006, and it now supports ES up to 6.x.
IK ships with two analyzers:
ik_max_word: splits the text at the finest granularity, producing as many terms as possible
ik_smart: splits at the coarsest granularity; terms that have already been emitted are not reused by other terms
1. Install the plugin
In cluster mode the IK plugin must be installed on every node, and the service restarted after installation. If some node is missing the plugin when a mapping is created, the index may end up RED and has to be deleted and recreated.
./bin/elasticsearch-plugin install /medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip
2. Create an index
You can use curl; on 6.0+ you can also exercise the API from the Dev Tools console of Kibana's x-pack plugin.
curl -XPUT http://localhost:9200/index
3. Create the mapping
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}'
4. Index documents
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'
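For illustration, the same four documents could also be indexed in one request with the _bulk API. The sketch below only builds the NDJSON payload (index and type names are taken from the curl commands above); on a running cluster it would be POSTed to http://localhost:9200/_bulk with Content-Type: application/x-ndjson:

```python
import json

# Sketch: build a _bulk NDJSON payload for the four documents above
# (one action line plus one source line per document, newline-terminated).
docs = {
    "1": "美国留给伊拉克的是个烂摊子吗",
    "2": "公安部:各地校车将享最高路权",
    "3": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船",
    "4": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
}
lines = []
for _id, content in docs.items():
    lines.append(json.dumps(
        {"index": {"_index": "index", "_type": "fulltext", "_id": _id}},
        ensure_ascii=False))
    lines.append(json.dumps({"content": content}, ensure_ascii=False))
payload = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
print(payload)
```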
5. Query documents
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "content" : {}
    }
  }
}'
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "4",
        "_score": 2,
        "_source": {
          "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 2,
        "_source": {
          "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
        },
        "highlight": {
          "content": [
            "均每天扣1艘<tag1>中国</tag1>渔船 "
          ]
        }
      }
    ]
  }
}
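Client code usually only needs the highlight fragments out of such a response. A small sketch (the response dict here is abbreviated to just the fields used):

```python
# Sketch: pull the highlighted fragments out of a search response shaped
# like the one above (abbreviated to the fields this code touches).
response = {
    "hits": {
        "hits": [
            {"_id": "4", "highlight": {"content": ["<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "]}},
            {"_id": "3", "highlight": {"content": ["均每天扣1艘<tag1>中国</tag1>渔船 "]}},
        ]
    }
}
fragments = {
    hit["_id"]: hit["highlight"]["content"]
    for hit in response["hits"]["hits"]
}
for _id, frags in fragments.items():
    print(_id, frags)
```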
IK supports custom dictionaries. The configuration file is analysis-ik/IKAnalyzer.cfg.xml under the config folder, and the dictionary files live in the same directory. Several options can be configured: ext_dict for custom dictionaries and ext_stopwords for stopword lists.
Hot reloading is also supported: set remote_ext_dict to an HTTP address that returns one word per line, in UTF-8 without BOM. When the dictionary is updated, it is enough to update either the Last-Modified or the ETag field in the response headers.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extended configuration</comment>
	<!-- configure your own extension dictionaries here -->
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<!-- configure your own extension stopword dictionaries here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
	<!-- configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">/xxx.dic</entry>
</properties>
(Anoop M K)
21:35:51 UTC
I am new to ELK; I recently configured an AWS Linux box with Elasticsearch and Kibana.
Both the Kibana and Elasticsearch processes started successfully, and I am able to get the logs and corresponding results.
But after some time the Elasticsearch process gets killed automatically, i.e. I have to start the process again to bring it back up. Could you please suggest any help on this?
Elasticsearch logs:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch
[21:31:26,738][INFO ][node] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z]
[21:31:26,739][INFO ][node] [Centurion] initializing ...
[21:31:28,133][INFO ][plugins] [Centurion] modules [lang-groovy, reindex, lang-expression], plugins [], sites []
[21:31:28,179][INFO ][env] [Centurion] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [5.9gb], net total_space [7.7gb], spins? [no], types [ext4]
[21:31:28,179][INFO ][env] [Centurion] heap size [1015.6mb], compressed ordinary object pointers [true]
[21:31:28,180][WARN ][env] [Centurion] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[21:31:32,469][INFO ][node] [Centurion] initialized
[21:31:32,469][INFO ][node] [Centurion] starting ...
[21:31:32,596][INFO ][transport] [Centurion] publish_address {xxx.xxx.xxx.xxx:9300}, bound_addresses {xxx.xxx.xxx.xxx:9300}
[21:31:32,605][INFO ][discovery] [Centurion] elasticsearch/f-BQjtvWQCaunaOLu8OJkA
[21:31:35,805][INFO ][cluster.service] [Centurion] new_master {Centurion}{f-BQjtvWQCaunaOLu8OJkA}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[21:31:35,837][INFO ][http] [Centurion] publish_address {xxx.xxx.xxx.xxx:9200}, bound_addresses {xxx.xxx.xxx.xxx:9200}
[21:31:35,838][INFO ][node] [Centurion] started
[21:31:35,962][INFO ][gateway] [Centurion] recovered [2] indices into cluster_state
[21:31:37,238][INFO ][cluster.routing.allocation] [Centurion] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-][4]] ...]).
(David Pilato)
01:55:42 UTC
Please format your logs with the </> icon.
Do you start Elasticsearch with bin/elasticsearch? Then any time you log out, the process is stopped because it runs in the foreground.
You should run Elasticsearch as a service instead. Use the deb or rpm packages instead.
(Anoop M K)
03:38:20 UTC
Thank you David for the response!
I am using the command "bin/elasticsearch &" to run it in the background.
Also, could you please elaborate on "format your logs with the </> icon"?
The log format I am using is below.
Log file:
[LAYER]GSDIM, [TYPE]REQ, [METHOD]SESSION, [ACTION]START, [latitude]47.60621, [longitude]-122.33207
[LAYER]GSDIM, [TYPE]REP, [TIME]1, [METHOD]SESSION, [ACTION]START, [RESPONSE]Session Started, [latitude]47.60621, [longitude]-122.33207
(David Pilato)
04:03:04 UTC
I meant that instead of posting this:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch[ 21:31:26,738][INFO ][node ] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z][ 21:31:26,739][INFO ][node ] [Centurion] initializing ...
you should post:
dashboard]$ ./elasticsearch-2.3.3/bin/elasticsearch
[ 21:31:26,738][INFO ][node ] [Centurion] version[2.3.3], pid[23133], build[218bdf1/T15:40:04Z]
[ 21:31:26,739][INFO ][node ] [Centurion] initializing ...
13:36:08 UTC
anoopmk007:
I am using command --
"bin/elasticsearch & " for running in background.
Even though this runs in the background it will still get killed when you logout. I definitely agree with the recommendation to use existing RPM or DEBs and to run this as a service.
However, if you really don't want to run it as a service, you can detach the background process from your session if you want it to continue when you logout:
# bin/elasticsearch &> /dev/null &
# disown %1
(Anoop M K)
21:22:51 UTC
Just now I changed the process to run as a service using the RPM, but I still have the issue of it getting automatically stopped/killed.
A couple of other observations:
If I start only the Elasticsearch service, then it does not get killed/stopped:
sudo service elasticsearch start
If I start Elasticsearch and Kibana on the same server, then the Elasticsearch service gets killed/stopped after some time:
sudo service elasticsearch start
sudo service kibana start
Which logs can I check for this issue? Any help appreciated!
17:46:31 UTC
dmesg and /var/log/messages might show any system errors like out of memory conditions.