NLPIR-team / nlpir-analysis-cn-ictclas

Licence: Apache-2.0 license

Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, so you can change the Lucene/Solr version to stay compatible with the release you use.

Programming Languages

java


Projects that are alternatives of or similar to nlpir-analysis-cn-ictclas

NLPIR-ICTCLAS

The Java Package of NLPIR-ICTCLAS.

Stars: ✭ 16 (-77.46%)

Mutual labels: chinese-word-segmentation, nlpir, ictclas

Solrplugins

Dice Solr Plugins from Simon Hughes Dice.com

Stars: ✭ 86 (+21.13%)

Mutual labels: solr, lucene

Vectorsinsearch

Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015

Stars: ✭ 71 (+0%)

Mutual labels: solr, lucene

Ik Analyzer

Supports Lucene 5/6/7/8+, with long-term maintenance.

Stars: ✭ 112 (+57.75%)

Mutual labels: solr, lucene

Lucene Solr

Apache Lucene and Solr open-source search software

Stars: ✭ 4,217 (+5839.44%)

Mutual labels: solr, lucene

Ik Analyzer Solr

ik-analyzer for solr 7.x-8.x

Stars: ✭ 1,017 (+1332.39%)

Mutual labels: solr, lucene

Springboot Templates

Spring Boot integration with Dubbo and Netty; NoSQL templates for Redis and MongoDB; MQ templates for Kafka, RocketMQ, and RabbitMQ; Solr, SolrCloud, and Elasticsearch query engines.

Stars: ✭ 100 (+40.85%)

Mutual labels: solr, lucene

Jeeplatform

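A base platform for enterprise information systems development that plans to integrate the common business functions of enterprise systems such as OA (office automation) and CMS (content management system). JeePlatform is a general-purpose base platform built around Spring Boot as its core framework, combining the MyBatis ORM framework, the Spring MVC web framework, and a range of open-source components. The code has been donated to the OSChina open-source community.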

Stars: ✭ 1,285 (+1709.86%)

Mutual labels: solr, lucene

Fxdesktopsearch

A JavaFX based desktop search application.

Stars: ✭ 147 (+107.04%)

Mutual labels: solr, lucene

Code4java

Repository for my java projects.

Stars: ✭ 164 (+130.99%)

Mutual labels: solr, lucene

jstarcraft-nlp

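Focuses on several core problems in natural language processing: lexical analysis, syntactic analysis, semantic analysis, language detection, information extraction, text clustering, and text classification. Provides complete general-purpose designs and reference implementations for developers in related fields. Covers a variety of NLP algorithms and adapts multiple NLP frameworks. Compatible with Lucene/Solr/ElasticSearch plugins.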

Stars: ✭ 92 (+29.58%)

Mutual labels: solr, lucene

Hanlp Lucene Plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr.

Stars: ✭ 272 (+283.1%)

Mutual labels: solr, lucene

SolrConfigExamples

Examples of Solr configuration entries for Solr plugins and Conceptual Search\Semantic Search from Simon Hughes Dice.com

Stars: ✭ 26 (-63.38%)

Mutual labels: solr, lucene

jease

Jease is a Java CMS framework based on Object Database

Stars: ✭ 25 (-64.79%)

Mutual labels: solr, lucene

RelevancyTuning

Dice.com tutorial on using black box optimization algorithms to do relevancy tuning on your Solr Search Engine Configuration from Simon Hughes Dice.com

Stars: ✭ 28 (-60.56%)

Mutual labels: solr, lucene

Querqy

Query preprocessor for Java-based search engines (Querqy Core and Solr implementation)

Stars: ✭ 122 (+71.83%)

Mutual labels: solr, lucene

solr

Apache Solr open-source search software

Stars: ✭ 651 (+816.9%)

Mutual labels: solr, lucene

solr-container

Ansible Container project that manages the lifecycle of Apache Solr on Docker.

Stars: ✭ 17 (-76.06%)

Mutual labels: solr, lucene

solr-zkutil

Solr Cloud and ZooKeeper CLI

Stars: ✭ 14 (-80.28%)

Mutual labels: solr

berserker

Berserker - BERt chineSE woRd toKenizER

Stars: ✭ 17 (-76.06%)

Mutual labels: chinese-word-segmentation


Now NLPIR/ICTCLAS for Lucene/Solr plugin V2.2

Lucene-analyzers-nlpir-ictclas-6.6.0

NLPIR/ICTCLAS analyzer plugin for Lucene/Solr 6.6.0. Supports macOS, Linux x86/64, and Windows x86/64.

The project's resources folder is a source folder. It contains the dynamic libraries for all platforms and places them on the classpath, so that JNA can load them.
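To check that a native library actually ended up on the classpath, something like the following sketch may help. It is not part of the plugin: the class name, the folder names (loosely modeled on JNA's platform resource prefixes), and the library base name `NLPIR` are all assumptions, so compare them with the contents of the project's resources folder before relying on the result.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the plugin.
final class NativeLibLocator {

    // Maps os.name / os.arch values to a resource folder name.
    // ASSUMPTION: the folder names follow a JNA-style layout.
    static String resourcePrefix(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Check macOS first: "darwin" also contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            return "darwin";
        }
        boolean is64 = arch.equals("amd64") || arch.equals("x86_64");
        boolean is32 = arch.equals("x86") || arch.equals("i386") || arch.equals("i686");
        String family = os.contains("win") ? "win32" : os.contains("linux") ? "linux" : null;
        if (family == null || !(is64 || is32)) {
            throw new IllegalArgumentException("Unsupported platform: " + osName + " " + osArch);
        }
        return family + (is64 ? "-x86-64" : "-x86");
    }

    // Returns true if the resource is visible to the system class loader.
    static boolean onClasspath(String resourcePath) {
        return ClassLoader.getSystemResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        String prefix = resourcePrefix(System.getProperty("os.name"), System.getProperty("os.arch"));
        // mapLibraryName adds the platform-specific prefix and suffix,
        // for example "lib" and ".so" on Linux. "NLPIR" is an assumed base name.
        String lib = prefix + "/" + System.mapLibraryName("NLPIR");
        System.out.println(lib + " on classpath: " + onClasspath(lib));
    }
}
```

Running `main` with the built plugin jar on the classpath prints whether the expected library path resolves. On platforms outside the x86/64 list above, `resourcePrefix` throws instead of guessing.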

Building Lucene-analyzers-nlpir-ictclas

Lucene-analyzers-nlpir-ictclas is built by Maven. To build Lucene-analyzers-nlpir-ictclas run:

mvn clean package -DskipTests

If you use an IDE such as Eclipse, you can run the same Maven build from there.

How to use in your projects

You can use NLPIRTokenizerAnalyzer to do the Chinese Word Segmentation:

NLPIRTokenizerAnalyzer DEMO

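        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

        // Text to segment.
        String text = "我是中国人";
        // The five arguments appear to mirror the nlpir.properties keys
        // (data, encoding, sLicenseCode, userDict, bOverwrite); encoding 1 is UTF-8.
        NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
        TokenStream ts = nta.tokenStream("word", text);
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // reset() must be called before the first incrementToken().
        ts.reset();
        // Print one segmented word per line.
        while (ts.incrementToken()) {
            System.out.println(term.toString());
        }
        // Finish the stream, then release the stream and the analyzer.
        ts.end();
        ts.close();
        nta.close();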

You can also use it with Lucene:

Lucene DEMO

The sample shows how to index your text and search by using NLPIRTokenizerAnalyzer.

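        // Imports for Lucene 6.6
        import java.nio.file.Paths;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.index.IndexWriterConfig.OpenMode;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.ScoreDoc;
        import org.apache.lucene.search.TopDocs;
        import org.apache.lucene.store.FSDirectory;

        // For indexing: the same analyzer is used at index time and query time.
        NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
        IndexWriterConfig inconf = new IndexWriterConfig(nta);
        inconf.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter index = new IndexWriter(FSDirectory.open(Paths.get("index/")), inconf);
        Document doc = new Document();
        doc.add(new TextField("contents", "特朗普表示，很高兴汉堡会晤后再次同习近平主席通话。我同习主席就重大问题保持沟通和协调、两国加强各层级和各领域交往十分重要。当前，美中关系发展态势良好，我相信可以发展得更好。我期待着对中国进行国事访问。", Field.Store.YES));
        index.addDocument(doc);
        index.flush();
        index.close();
        // For searching
        String field = "contents";
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index/")));
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser parser = new QueryParser(field, nta);
        Query query = parser.parse("特朗普习近平");
        TopDocs top = searcher.search(query, 100);
        ScoreDoc[] hits = top.scoreDocs;
        for (int i = 0; i < hits.length; i++) {
          System.out.println("doc=" + hits[i].doc + " score=" + hits[i].score);
          Document d = searcher.doc(hits[i].doc);
          System.out.println(d.get("contents"));
        }
        // Release the index reader and the analyzer when done.
        reader.close();
        nta.close();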

How to install in Solr

To make the plugin part of Solr, you need these files:

the plugin jar, which you have built, placed in your core's lib directory.
nlpir.properties, which contains:

data="" #Data directory's parent path
encoding=1 #0 GBK;1 UTF-8
sLicenseCode="" # License code
userDict="" # user dictionary, a text file
bOverwrite=false # whether to overwrite the existing user dictionary

the Data directory, which you can find in the NLPIR SDK: https://github.com/NLPIR-team/NLPIR/tree/master/NLPIR%20SDK/NLPIR-ICTCLAS

Warning: Make sure the plugin jar can find the nlpir.properties file. You can put the file in solr_home/server/, and data must be set to the path of the NLPIR/ICTCLAS Data directory.
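The settings can be sanity-checked before starting Solr. The following is a minimal sketch, not the plugin's own loader (how the plugin reads the file is not shown here). It assumes the `key=value #comment` layout listed above and strips the trailing comment and surrounding quotes by hand, because `java.util.Properties` would keep both as part of the value. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the plugin.
final class NlpirPropertiesSketch {

    // Parses "key=value #comment" lines: drops everything from the first '#'
    // and removes double quotes wrapping the value.
    // Limitation: a value that itself contains '#' would be truncated.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // blank line, comment-only line, or missing key
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            out.put(key, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(Arrays.asList(
                "data=\"\" #Data directory's parent path",
                "encoding=1 #0 GBK;1 UTF-8",
                "sLicenseCode=\"\" # License code",
                "userDict=\"\" # user dictionary, a text file",
                "bOverwrite=false # whether to overwrite the existing user dictionary"));
        // Print each key with its cleaned value.
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.out.println(e.getKey() + " -> [" + e.getValue() + "]");
        }
    }
}
```

In practice the lines would come from the real file, for example via `java.nio.file.Files.readAllLines`, and an empty `data` value is a hint that the Data directory path still needs to be filled in.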

Solr Managed-schema

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizerFactory"/>
    </analyzer>
  </fieldType>

Loading the native libraries requires jna.jar as a dependency. Add it to your Solr lib directory.

Tokenizer

v2.*

//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"
//Finer Segment
class="org.nlpir.lucene.cn.ictclas.finersegmet.FinerTokenizer"

v1.*

//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"

Solr Show
