
federkasten / clucie



Clucie

Clojure for Lucene


Usage

Simple Usage

(require '[clucie.core :as core])
(require '[clucie.analysis :as analysis])
(require '[clucie.store :as store])

(def analyzer (analysis/standard-analyzer))
(def index-store (store/memory-store)) ; or (store/disk-store "path/to/store")

(core/add! index-store
           [{:number "1" :title "Please Please Me"}
            {:number "2" :title "With the Beatles"}
            {:number "3" :title "A Hard Day's Night"}
            {:number "4" :title "Beatles for Sale"}
            {:number "5" :title "Help!"}]
           [:number :title]
           analyzer)

(core/search index-store
             {:title "Beatles"}
             10 ; max-num
             analyzer
             0 ; page
             5) ; max-num-per-page

;; => [{:number "2", :title "With the Beatles"} {:number "4", :title "Beatles for Sale"}]

;; Phrase search
(core/phrase-search index-store
                    {:title "beatles for"}
                    10
                    analyzer
                    0
                    5)

;; => [{:number "4", :title "Beatles for Sale"}]

(core/phrase-search index-store
                    {:title "for beatles"}
                    10
                    analyzer
                    0
                    5)

;; => []

;; AND search
(core/search index-store
             {:title ["Beatles" "Sale"]}
             10
             analyzer
             0
             5)

;; => [{:number "4", :title "Beatles for Sale"}]

;; AND search, across multiple keys
(core/search index-store
             [{:number "4"} {:title ["Beatles" "Sale"]}]
             10
             analyzer
             0
             5)

;; => [{:number "4", :title "Beatles for Sale"}]

(core/search index-store
             [{:number "3"} {:title "Beatles"}]
             10
             analyzer
             0
             5)

;; => []

;; OR search
(core/search index-store
             {:title #{"Beatles" "Please"}}
             10
             analyzer
             0
             5)

;; => [{:number "1", :title "Please Please Me"} {:number "2", :title "With the Beatles"} {:number "4", :title "Beatles for Sale"}]

;; Get meta information
(let [results (core/search index-store
                           {:title #{"Beatles" "Please"}}
                           10
                           analyzer
                           0
                           5)]
  ;; the total number of hits
  (prn (:total-hits (meta results))) ; => 3
  ;; scores
  (prn (map #(:score (meta %)) results))) ; => (0.62241787 0.3930676 0.3930676)

(store/close! index-store)
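
The query shapes used above follow a small set of conventions: a string matches one term, a vector requires all of its elements (AND), a set requires any of them (OR), and a vector of maps combines conditions across keys with AND. As a way to check that reading, the following plain-Clojure sketch mimics those conventions over the same album data. It does not use Clucie or Lucene: `toy-tokens`, `value-matches?`, `doc-matches?`, and `toy-search` are hypothetical names, the tokenizer is a naive stand-in for an analyzer, and nothing is scored or ranked.

```clojure
(require '[clojure.string :as str])

(def albums
  [{:number "1" :title "Please Please Me"}
   {:number "2" :title "With the Beatles"}
   {:number "3" :title "A Hard Day's Night"}
   {:number "4" :title "Beatles for Sale"}
   {:number "5" :title "Help!"}])

(defn toy-tokens
  "Naive stand-in for an analyzer (an assumption, not Clucie's behavior):
  lower-cases the text and splits it on anything that is not a letter or
  a digit. Returns a set of terms."
  [text]
  (->> (str/split (str/lower-case (str text)) #"[^\p{L}\p{N}]+")
       (remove str/blank?)
       set))

(defn value-matches?
  "string => the term must occur; vector => all elements must match (AND);
  set => at least one element must match (OR)."
  [terms v]
  (cond
    (string? v)     (contains? terms (str/lower-case v))
    (set? v)        (boolean (some #(value-matches? terms %) v))
    (sequential? v) (every? #(value-matches? terms %) v)
    :else           false))

(defn doc-matches?
  "A map query checks each key against that field of the document; a
  vector of map queries must all match (AND across keys)."
  [doc query]
  (if (map? query)
    (every? (fn [[k v]] (value-matches? (toy-tokens (get doc k)) v)) query)
    (every? #(doc-matches? doc %) query)))

(defn toy-search
  "Returns the documents matching query, in insertion order (no scoring)."
  [docs query]
  (filterv #(doc-matches? % query) docs))

(toy-search albums {:title ["Beatles" "Sale"]})
;; => [{:number "4", :title "Beatles for Sale"}]
(toy-search albums [{:number "3"} {:title "Beatles"}])
;; => []
```

The toy returns matches in insertion order, whereas real search results are ordered by score, so the two agree on which documents match but not necessarily on their order.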

To update a document in the index:

(core/update! index-store
              {:number "5" :title "Help! (1965)"}
              [:number :title]
              :number "5"
              analyzer)

To delete a document from the index:

(core/delete! index-store :number "5" analyzer)

CJK (Chinese, Japanese, and Korean) Support

(def cjk-analyzer (analysis/cjk-analyzer))

(def my-analyzer (analysis/analyzer-mapping (analysis/keyword-analyzer)
                                            {:content cjk-analyzer}))

(core/add! index-store
           [{:key "English" :content "Thank you"}
            {:key "Chinese" :content "谢谢"}
            {:key "Japanese" :content "ありがとう"}
            {:key "Korean" :content "고마워요"}]
           [:key :content]
           my-analyzer)
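
Lucene's `CJKAnalyzer` indexes runs of Han, Hiragana, Katakana, and Hangul characters as overlapping two-character tokens (bigrams) rather than as dictionary words. Assuming `analysis/cjk-analyzer` wraps that analyzer (the name suggests so, but this README does not confirm it), the terms indexed for the `:content` values above would look roughly like the output of this plain-Clojure sketch. `cjk-bigrams` is a hypothetical helper, not part of Clucie, and it only handles characters in the Basic Multilingual Plane.

```clojure
(defn cjk-bigrams
  "Splits a run of CJK characters into overlapping two-character tokens.
  A run shorter than two characters is returned as a single token.
  Illustration only: assumes no surrogate pairs in the input."
  [run]
  (if (< (count run) 2)
    [run]
    (mapv str run (rest run))))

(cjk-bigrams "ありがとう") ; => ["あり" "りが" "がと" "とう"]
(cjk-bigrams "谢谢")       ; => ["谢谢"]
```

Bigram indexing needs no dictionary, at the cost of treating any two adjacent characters as a searchable term.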

Japanese Support (Kuromoji)

(def kuromoji-analyzer (analysis/kuromoji-analyzer))

(def my-analyzer (analysis/analyzer-mapping (analysis/keyword-analyzer)
                                            {:content kuromoji-analyzer}))

To tokenize text with Kuromoji:

(let [text "富士は日本一の山"
      user-dict nil
      discard-punctuation? true
      mode :normal ; :normal :extended :search
      factory nil]
  (analysis/kuromoji-tokenize text user-dict discard-punctuation? mode factory)) ; => ("富士" "は" "日本一" "の" "山")

Custom analyzer

To build a custom analyzer, use the build-analyzer macro. The following example builds an analyzer that normalizes the input text, splits it into words, and generates n-grams.

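(import '(org.apache.lucene.analysis.ja JapaneseTokenizer JapaneseTokenizer$Mode)
        '(org.apache.lucene.analysis.icu ICUNormalizer2CharFilterFactory)
        '(org.apache.lucene.analysis LowerCaseFilter) ; org.apache.lucene.analysis.core before Lucene 7
        '(java.util HashMap))
;; `max-shingle` is a namespace alias this README does not define; require it from wherever MaxShingleFilter lives.
(analysis/build-analyzer
  (JapaneseTokenizer. nil true JapaneseTokenizer$Mode/NORMAL)
  :char-filter-factories [(ICUNormalizer2CharFilterFactory. (HashMap. {"name" "nfkc", "mode" "compose"}))]
  :token-filters [(LowerCaseFilter.)
                  (max-shingle/MaxShingleFilter. 3 " ")])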
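
The n-grams in this pipeline appear to be word-level n-grams, which Lucene calls shingles. The exact output of `MaxShingleFilter` is not specified here, so the following plain-Clojure sketch only illustrates the general idea: every run of up to `max-n` adjacent tokens, joined with a separator. `shingles` is a hypothetical helper and makes no claim about which shingles the real filter keeps.

```clojure
(require '[clojure.string :as str])

(defn shingles
  "Returns all runs of 1..max-n adjacent tokens, each joined with sep,
  ordered by run length and then by position. Generic illustration of
  shingling, not a reimplementation of MaxShingleFilter."
  [tokens max-n sep]
  (vec (for [n   (range 1 (inc max-n))
             run (partition n 1 tokens)]
         (str/join sep run))))

(shingles ["富士" "は" "日本一"] 2 " ")
;; => ["富士" "は" "日本一" "富士 は" "は 日本一"]
```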

Reusing connections

By default, the update and search functions create a new writer or reader on every call, which is somewhat inefficient and not thread-safe. For better performance or for concurrent processing, you can pass a writer or reader directly to these functions instead of the store.

(with-open [writer (store/store-writer index-store analyzer)]
  (core/add! writer
             [{:number "1" :title "Please Please Me"}
              {:number "2" :title "With the Beatles"}]
             [:number :title]))

(with-open [reader (store/store-reader index-store)]
  (core/search reader
               {:title "Beatles"}
               10
               analyzer))
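
One thing to watch when sharing a reader is laziness: anything computed from the reader has to be realized before `with-open` closes it. The sketch below runs several queries against one reader and collects the results eagerly with `mapv`. `search-all` is a hypothetical helper, not part of Clucie; the search function is passed in as `search-fn` so the helper itself has no dependencies, and it is called with the same four arguments as the reader example above.

```clojure
(defn search-all
  "Runs every query in queries against the same reader and returns a
  vector of result collections, one per query. mapv is eager, so all
  searches finish before the caller's with-open closes the reader."
  [search-fn reader queries max-num analyzer]
  (mapv #(search-fn reader % max-num analyzer) queries))

;; Intended use with Clucie (assumes the requires from Simple Usage):
;; (with-open [reader (store/store-reader index-store)]
;;   (search-all core/search reader
;;               [{:title "Beatles"} {:title "Please"}]
;;               10
;;               analyzer))
```

Using `map` instead of `mapv` here would return a lazy sequence, and the searches could then run after the reader has already been closed.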

Run tests

Run lein midje.

Get coverage

Run lein cloverage and see target/coverage/index.html.

License

Copyright Takashi AOKI and other contributors.

Licensed under the Apache License, Version 2.0.
