medcl / Elasticsearch Carrot2

An Elasticsearch plugin integrated with Carrot2 that clusters your search results into topics.


elasticsearch.carrot2

License Apache2

Version

plugin | elasticsearch
master | 0.90.0 -> master
1.2.0  | 0.90.0
1.1.1  | 0.20.2

the demo page is here: http://s.medcl.net/?query=Search+API++Search+Type

a detailed tutorial is here: http://log.medcl.net/item/2013/06/tutorial-clustering-search-result-with-plugin-tools-carrot2/

1. Download the lexical files (https://github.com/downloads/medcl/elasticsearch-carrot2/config.zip) and put them into the config folder.
2. Install the plugin: bin/plugin install medcl/elasticsearch-carrot2/1.1.1

Alternatively, you can download this plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf): https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/tools.carrot2

have fun.

curl -XPOST 'http://localhost:9200/elasticsearch_resources/_carrot2?carrot2.language=ENGLISH&carrot2.title_fields=title&carrot2.summary_fields=snippet&carrot2.url_field=url&carrot2.attach_detail=true&carrot2.cluster_count_base=10&carrot2.cluster_phrase_label_boost=2.0' \
-d '
{
    "query": {
        "bool": {
            "should": [
                {
                    "match_all": {}
                }
            ]
        }
    },
    "from": 0,
    "size": 500
}
'
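
The same request can also be assembled programmatically. The following is a minimal sketch, not part of the plugin: the helper name build_carrot2_request is hypothetical, and the Content-Type header is an assumption that the curl example above does not set. It only builds the request, using the Python standard library to URL-encode the parameters and serialize the query body.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_carrot2_request(base_url, index, params, query):
    """Build (but do not send) a POST request for the _carrot2 endpoint.

    Hypothetical helper for illustration; it is not shipped with the plugin.
    """
    # urlencode escapes every value, so no shell quoting issues can arise.
    url = f"{base_url}/{index}/_carrot2?{urlencode(params)}"
    body = json.dumps(query).encode("utf-8")
    # The Content-Type header is an assumption, not taken from the README.
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


request = build_carrot2_request(
    "http://localhost:9200",
    "elasticsearch_resources",
    {
        "carrot2.language": "ENGLISH",
        "carrot2.title_fields": "title",
        "carrot2.summary_fields": "snippet",
        "carrot2.url_field": "url",
        "carrot2.attach_detail": "true",
        "carrot2.cluster_count_base": 10,
        "carrot2.cluster_phrase_label_boost": 2.0,
    },
    {"query": {"bool": {"should": [{"match_all": {}}]}}, "from": 0, "size": 500},
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Sending is left as a comment so the sketch runs without a live Elasticsearch node.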

Response sample: https://gist.github.com/2184894

carrot2.language=ENGLISH                [clustering language; see the language list at the end for supported values]
carrot2.title_fields                    [which field in the doc's source is used as the title for clustering]
carrot2.summary_fields                  [which field in the doc's source is used as the summary for clustering]
carrot2.url_field                       [which field in the doc's source is used as the url for clustering]
carrot2.attach_hits=false               [set to false to remove the original search hits and decrease the size of the response]
carrot2.attach_detail                   [set to false to return just the id; title/summary/url will not be included in the response]
carrot2.max_cluster_size=100            [the max number of clusters that will be returned]
carrot2.max_doc_per_cluster=10          [the max number of docs within a cluster that will be returned]
carrot2.cluster_count_base=30           [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.desiredClusterCountBase]
carrot2.cluster_phrase_label_boost=1.5  [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.phraseLabelBoost]
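
Since every option shares the carrot2. prefix, a small helper can add the prefix and catch misspelled names before a request is sent. This is only a sketch: the function name carrot2_query_string is hypothetical, the set of names is copied from the list above, and writing booleans as lowercase true/false follows the curl example rather than any documented rule.

```python
from urllib.parse import urlencode

# Parameter names copied from the list above; anything else is rejected.
KNOWN_PARAMS = {
    "language", "title_fields", "summary_fields", "url_field",
    "attach_hits", "attach_detail", "max_cluster_size",
    "max_doc_per_cluster", "cluster_count_base", "cluster_phrase_label_boost",
}


def carrot2_query_string(**options):
    """Return a query string with the 'carrot2.' prefix added to each option."""
    unknown = sorted(set(options) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown carrot2 parameter(s): {', '.join(unknown)}")
    pairs = []
    for name, value in options.items():
        # Python booleans become lowercase true/false, as in the curl example.
        if isinstance(value, bool):
            value = "true" if value else "false"
        pairs.append((f"carrot2.{name}", value))
    return urlencode(pairs)


# A lightweight request: ids only, no original hits, at most 5 clusters.
print(carrot2_query_string(attach_hits=False, attach_detail=False, max_cluster_size=5))
```

The resulting string can be appended after the ? of the _carrot2 URL.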

supported algorithm: LingoClusteringAlgorithm

TODO: STCClusteringAlgorithm, BisectingKMeansClusteringAlgorithm, ByFieldClusteringAlgorithm, ByUrlClusteringAlgorithm

supported languages: ARABIC, BULGARIAN, CZECH, CHINESE_SIMPLIFIED, DANISH, DUTCH, ENGLISH, ESTONIAN, FINNISH, FRENCH, GERMAN, GREEK, HUNGARIAN, ITALIAN, IRISH, KOREAN, LATVIAN, LITHUANIAN, MALTESE, NORWEGIAN, POLISH, PORTUGUESE, ROMANIAN, RUSSIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, THAI, TURKISH
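
A client can check the carrot2.language value against this list before sending a request. The sketch below is illustrative only: normalize_language is a hypothetical helper, and the upper-casing and underscore substitution are conveniences assumed here. Whether the plugin itself accepts other spellings is not stated.

```python
# Language names copied from the list above.
SUPPORTED_LANGUAGES = frozenset("""
ARABIC BULGARIAN CZECH CHINESE_SIMPLIFIED DANISH DUTCH ENGLISH ESTONIAN
FINNISH FRENCH GERMAN GREEK HUNGARIAN ITALIAN IRISH KOREAN LATVIAN
LITHUANIAN MALTESE NORWEGIAN POLISH PORTUGUESE ROMANIAN RUSSIAN SLOVAK
SLOVENE SPANISH SWEDISH THAI TURKISH
""".split())


def normalize_language(name):
    """Map user input to a listed language name, or raise ValueError."""
    # Assumption: upper-case, underscore-separated names are what is expected.
    candidate = name.strip().upper().replace(" ", "_").replace("-", "_")
    if candidate not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported carrot2.language: {name!r}")
    return candidate


print(normalize_language("chinese simplified"))
```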
