
NLPIR-team / nlpir-analysis-cn-ictclas

License: Apache-2.0
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, so you can change the Lucene/Solr version to stay compatible with the release you use.

Programming Languages

java

Projects that are alternatives to or similar to nlpir-analysis-cn-ictclas

NLPIR-ICTCLAS
The Java Package of NLPIR-ICTCLAS.
Stars: ✭ 16 (-77.46%)
Mutual labels:  chinese-word-segmentation, nlpir, ictclas
Solrplugins
Dice Solr Plugins from Simon Hughes Dice.com
Stars: ✭ 86 (+21.13%)
Mutual labels:  solr, lucene
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (+0%)
Mutual labels:  solr, lucene
Ik Analyzer
Supports Lucene 5/6/7/8+; maintained long-term.
Stars: ✭ 112 (+57.75%)
Mutual labels:  solr, lucene
Lucene Solr
Apache Lucene and Solr open-source search software
Stars: ✭ 4,217 (+5839.44%)
Mutual labels:  solr, lucene
Ik Analyzer Solr
ik-analyzer for solr 7.x-8.x
Stars: ✭ 1,017 (+1332.39%)
Mutual labels:  solr, lucene
Springboot Templates
Spring Boot integration with Dubbo and Netty; NoSQL templates for Redis and MongoDB; MQ templates for Kafka, RocketMQ, and RabbitMQ; search engines: Solr, SolrCloud, Elasticsearch.
Stars: ✭ 100 (+40.85%)
Mutual labels:  solr, lucene
Jeeplatform
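A base platform for enterprise information-system development, intended to integrate common business functions of enterprise systems such as OA (office automation) and CMS (content management system). JeePlatform is a general-purpose base platform built on Spring Boot as its core framework, combining the ORM framework MyBatis, the web-layer framework Spring MVC, and various open-source components. The code has been donated to the OSChina (开源中国) community.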
Stars: ✭ 1,285 (+1709.86%)
Mutual labels:  solr, lucene
Fxdesktopsearch
A JavaFX based desktop search application.
Stars: ✭ 147 (+107.04%)
Mutual labels:  solr, lucene
Code4java
Repository for my java projects.
Stars: ✭ 164 (+130.99%)
Mutual labels:  solr, lucene
jstarcraft-nlp
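Focuses on several core problems in natural language processing: lexical analysis, syntactic analysis, semantic analysis, language detection, information extraction, text clustering, and text classification. Provides complete general-purpose designs and reference implementations for researchers and developers in related fields. Covers many NLP algorithms and adapts to multiple NLP frameworks. Compatible with Lucene/Solr/Elasticsearch plugins.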
Stars: ✭ 92 (+29.58%)
Mutual labels:  solr, lucene
Hanlp Lucene Plugin
HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr.
Stars: ✭ 272 (+283.1%)
Mutual labels:  solr, lucene
SolrConfigExamples
Examples of Solr configuration entries for Solr plugins and Conceptual Search/Semantic Search from Simon Hughes Dice.com
Stars: ✭ 26 (-63.38%)
Mutual labels:  solr, lucene
jease
Jease is a Java CMS framework based on an object database
Stars: ✭ 25 (-64.79%)
Mutual labels:  solr, lucene
RelevancyTuning
Dice.com tutorial on using black box optimization algorithms to do relevancy tuning on your Solr Search Engine Configuration from Simon Hughes Dice.com
Stars: ✭ 28 (-60.56%)
Mutual labels:  solr, lucene
Querqy
Query preprocessor for Java-based search engines (Querqy Core and Solr implementation)
Stars: ✭ 122 (+71.83%)
Mutual labels:  solr, lucene
solr
Apache Solr open-source search software
Stars: ✭ 651 (+816.9%)
Mutual labels:  solr, lucene
solr-container
Ansible Container project that manages the lifecycle of Apache Solr on Docker.
Stars: ✭ 17 (-76.06%)
Mutual labels:  solr, lucene
solr-zkutil
Solr Cloud and ZooKeeper CLI
Stars: ✭ 14 (-80.28%)
Mutual labels:  solr
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-76.06%)
Mutual labels:  chinese-word-segmentation

Current version: NLPIR/ICTCLAS plugin for Lucene/Solr V2.2

Lucene-analyzers-nlpir-ictclas-6.6.0

NLPIR/ICTCLAS analyzer plugin for Lucene/Solr 6.6.0. Supports macOS, Linux x86/64, and Windows x86/64.

The project's resources folder is a source folder that contains the dynamic libraries for all platforms. Because it is a source folder, these libraries are automatically deployed to the classpath on every platform, so that JNA can load them.
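To make the classpath arrangement concrete, here is a minimal sketch of the naming convention a JNA-style loader follows when it searches the classpath for a bundled native library: an OS/architecture prefix folder plus the platform's library file name. It assumes JNA 4.x conventions (win32-x86-64, linux-x86-64, and a bare darwin folder on macOS) and a library named NLPIR; both are assumptions rather than facts about this project's layout, and NativeResourcePath is a hypothetical class.

```java
import java.util.Locale;

/**
 * Hypothetical helper: computes where a JNA-style loader would look on the classpath
 * for a bundled native library, following JNA 4.x naming conventions
 * (e.g. "linux-x86-64/libNLPIR.so"). The library name "NLPIR" is an assumption.
 */
public final class NativeResourcePath {

    private NativeResourcePath() {}

    /** Normalizes common os.arch values the way JNA does (x86_64/amd64 -> x86-64, i?86 -> x86). */
    static String canonicalArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        switch (arch) {
            case "x86_64":
            case "amd64":
                return "x86-64";
            case "i386":
            case "i486":
            case "i586":
            case "i686":
                return "x86";
            default:
                return arch;
        }
    }

    /** Returns the classpath resource path, e.g. "win32-x86-64/NLPIR.dll". */
    static String resourcePath(String osName, String osArch, String libName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = canonicalArch(osArch);
        if (os.startsWith("windows")) {
            return "win32-" + arch + "/" + libName + ".dll";
        } else if (os.startsWith("mac")) {
            // JNA 4.x used a single "darwin" folder on macOS; JNA 5.x appends the architecture.
            return "darwin/lib" + libName + ".dylib";
        } else if (os.startsWith("linux")) {
            return "linux-" + arch + "/lib" + libName + ".so";
        }
        throw new IllegalArgumentException("Unsupported OS: " + osName);
    }

    public static void main(String[] args) {
        String path = resourcePath(System.getProperty("os.name"),
                                   System.getProperty("os.arch"), "NLPIR");
        // getResource returns null when nothing with that path is on the classpath.
        boolean onClasspath =
                NativeResourcePath.class.getClassLoader().getResource(path) != null;
        System.out.println(path + (onClasspath ? " found" : " not found") + " on the classpath");
    }
}
```

If the printed path is reported as not found, the libraries were probably not copied from the resources folder into the build output.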

Building Lucene-analyzers-nlpir-ictclas

Lucene-analyzers-nlpir-ictclas is built with Maven. To build it, run:

mvn clean package -DskipTests

If you use an IDE such as Eclipse, you can build the project the same way from within the IDE.
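After building, you may want to confirm that the platform libraries from the resources folder actually ended up in the jar. The sketch below is a hypothetical check (JarNativeLibs is not part of the project) that lists .dll, .so, and .dylib entries using only java.util.jar.JarFile; pass it the jar that Maven writes under target/.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical post-build check: lists the native libraries packaged in a jar,
 * to confirm that the resources folder was copied into the build output.
 */
public final class JarNativeLibs {

    private JarNativeLibs() {}

    /** Returns the names of all .dll, .so, and .dylib entries in the given jar. */
    static List<String> nativeLibraries(Path jar) throws IOException {
        List<String> libs = new ArrayList<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".dll") || name.endsWith(".so") || name.endsWith(".dylib")) {
                    libs.add(name);
                }
            }
        }
        return libs;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar produced by "mvn clean package" (exact file name not assumed here).
        for (String lib : nativeLibraries(Paths.get(args[0]))) {
            System.out.println(lib);
        }
    }
}
```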

How to use it in your projects

You can use NLPIRTokenizerAnalyzer for Chinese word segmentation:

  • NLPIRTokenizerAnalyzer DEMO
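        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
        import org.nlpir.lucene.cn.ictclas.NLPIRTokenizerAnalyzer; // package assumed to match NLPIRTokenizerFactory

        String text = "我是中国人";
        // Arguments appear to follow the nlpir.properties keys: data, encoding (1 = UTF-8), sLicenseCode, userDict, bOverwrite
        NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
        TokenStream ts = nta.tokenStream("word", text);
        // Register the term attribute before reset(), as the TokenStream contract expects
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());
        }
        ts.end();
        ts.close();
        nta.close();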

You can also use it in Lucene:

  • Lucene DEMO

This sample shows how to index text and then search it with NLPIRTokenizerAnalyzer.

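        import java.nio.file.Paths;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.index.IndexWriterConfig.OpenMode;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.ScoreDoc;
        import org.apache.lucene.search.TopDocs;
        import org.apache.lucene.store.FSDirectory;
        import org.nlpir.lucene.cn.ictclas.NLPIRTokenizerAnalyzer; // package assumed to match NLPIRTokenizerFactory

        // For indexing
        NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
        IndexWriterConfig inconf = new IndexWriterConfig(nta);
        inconf.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter index = new IndexWriter(FSDirectory.open(Paths.get("index/")), inconf);
        Document doc = new Document();
        doc.add(new TextField("contents", "特朗普表示,很高兴汉堡会晤后再次同习近平主席通话。我同习主席就重大问题保持沟通和协调、两国加强各层级和各领域交往十分重要。当前,美中关系发展态势良好,我相信可以发展得更好。我期待着对中国进行国事访问。",Field.Store.YES));
        index.addDocument(doc);
        index.commit(); // make the document durable and visible to newly opened readers
        index.close();
        // For searching (the same analyzer parses the query)
        String field = "contents";
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index/")));
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser parser = new QueryParser(field, nta);
        Query query = parser.parse("特朗普习近平");
        TopDocs top = searcher.search(query, 100);
        for (ScoreDoc hit : top.scoreDocs) {
          System.out.println("doc=" + hit.doc + " score=" + hit.score);
          Document d = searcher.doc(hit.doc);
          System.out.println(d.get("contents"));
        }
        reader.close();
        nta.close();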

How to Install in Solr

To use the plugin in Solr, you need the following:

  1. The plugin jar you built; put it in your core's lib directory.
  2. nlpir.properties, containing:
# Parent path of the Data directory
data=""
# Encoding: 0 = GBK, 1 = UTF-8
encoding=1
# License code
sLicenseCode=""
# User dictionary (a text file)
userDict=""
# Whether to overwrite the existing user dictionary
bOverwrite=false
  3. The Data directory, which you can find in the NLPIR SDK: https://github.com/NLPIR-team/NLPIR/tree/master/NLPIR%20SDK/NLPIR-ICTCLAS

Warning: make sure the plugin jar can find the nlpir.properties file. You can put the file in solr_home/server/. The data property must point to the location of the NLPIR/ICTCLAS Data directory (its parent path, as noted in nlpir.properties).

  • Solr Managed-schema
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizerFactory"/>
    </analyzer>
  </fieldType>
  4. The dependency jar for loading the native library (DLL): jna.jar. Add it to your Solr lib directory.
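The nlpir.properties keys appear to line up with the five arguments passed to NLPIRTokenizerAnalyzer in the demos (data, encoding, license code, user dictionary, overwrite flag). The sketch below is a hypothetical loader (NlpirSettings is not part of the plugin, and the plugin's own loading code may differ) that reads those keys with java.util.Properties, strips surrounding double quotes as an assumed convention, and maps the encoding code to a Charset. Note that java.util.Properties treats # as a comment only at the start of a line, so a trailing comment such as encoding=1 #0 GBK would become part of the value.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Hypothetical loader for nlpir.properties. Field order follows the assumed mapping onto
 * new NLPIRTokenizerAnalyzer(data, encoding, sLicenseCode, userDict, bOverwrite).
 */
public final class NlpirSettings {
    final String data;        // parent path of the NLPIR/ICTCLAS Data directory
    final int encoding;       // 0 = GBK, 1 = UTF-8
    final String licenseCode; // sLicenseCode
    final String userDict;    // user dictionary text file, may be empty
    final boolean overwrite;  // bOverwrite

    private NlpirSettings(String data, int encoding, String licenseCode,
                          String userDict, boolean overwrite) {
        this.data = data;
        this.encoding = encoding;
        this.licenseCode = licenseCode;
        this.userDict = userDict;
        this.overwrite = overwrite;
    }

    /** Strips one pair of surrounding double quotes, so "" becomes an empty string (assumed convention). */
    private static String unquote(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    /** Reads the five documented keys, falling back to the example values from the documentation. */
    static NlpirSettings load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new NlpirSettings(
                unquote(p.getProperty("data", "")),
                Integer.parseInt(unquote(p.getProperty("encoding", "1"))),
                unquote(p.getProperty("sLicenseCode", "")),
                unquote(p.getProperty("userDict", "")),
                Boolean.parseBoolean(unquote(p.getProperty("bOverwrite", "false"))));
    }

    /** Maps the encoding code to a charset: 0 = GBK, anything else is treated as UTF-8. */
    Charset charset() {
        return encoding == 0 ? Charset.forName("GBK") : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) throws IOException {
        // Reads nlpir.properties from the working directory (assumed location for this demo).
        try (Reader in = Files.newBufferedReader(Paths.get("nlpir.properties"), StandardCharsets.UTF_8)) {
            NlpirSettings s = load(in);
            System.out.println("data=" + s.data + ", charset=" + s.charset()
                    + ", userDict=" + s.userDict + ", overwrite=" + s.overwrite);
        }
    }
}
```

Keeping each comment on its own line, as in a standard properties file, avoids the trailing-comment pitfall regardless of how the values are later consumed.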

Tokenizer

  • v2.*
//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"
//Finer-grained segmentation
class="org.nlpir.lucene.cn.ictclas.finersegmet.FinerTokenizer"
  • v1.*
//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"

Solr Demo

[Screenshot: the plugin running in Solr]
