INL / Blacklab

A corpus retrieval engine based on Apache Lucene

Programming Languages

java

Projects that are alternatives to or similar to Blacklab

Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Stars: ✭ 378 (+447.83%)
Mutual labels:  corpus
Quanteda
An R package for the Quantitative Analysis of Textual Data
Stars: ✭ 647 (+837.68%)
Mutual labels:  corpus
Lyrics Corpora
An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts
Stars: ✭ 13 (-81.16%)
Mutual labels:  corpus
Chinese Nlp Corpus
Collections of Chinese NLP corpora
Stars: ✭ 438 (+534.78%)
Mutual labels:  corpus
Weixin public corpus
WeChat official account corpus
Stars: ✭ 465 (+573.91%)
Mutual labels:  corpus
Seq2seq Chatbot
Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (+1026.09%)
Mutual labels:  corpus
Cluecorpus2020
Large-scale Pre-training Corpus for Chinese (100G)
Stars: ✭ 278 (+302.9%)
Mutual labels:  corpus
Mitie chinese wikipedia corpus
Pre-trained Wikipedia corpus by MITIE
Stars: ✭ 43 (-37.68%)
Mutual labels:  corpus
Cluepretrainedmodels
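A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity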
Stars: ✭ 493 (+614.49%)
Mutual labels:  corpus
Company Names Corpus
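A corpus of company names and organization names: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.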
Stars: ✭ 868 (+1157.97%)
Mutual labels:  corpus
Bookcorpus
Crawl BookCorpus
Stars: ✭ 443 (+542.03%)
Mutual labels:  corpus
Small Chinese Corpus
Some useful small Chinese corpus datasets
Stars: ✭ 462 (+569.57%)
Mutual labels:  corpus
Insuranceqa Corpus Zh
🚁 Insurance industry corpus for chatbots
Stars: ✭ 821 (+1089.86%)
Mutual labels:  corpus
Corpora
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Stars: ✭ 4,293 (+6121.74%)
Mutual labels:  corpus
Typing Assistant
Typing Assistant provides the ability to autocomplete words and suggests predictions for the next word. This makes typing faster, more intelligent and reduces effort.
Stars: ✭ 32 (-53.62%)
Mutual labels:  corpus
Fuzzdata
Fuzzing resources for feeding various fuzzers with input. 🔧
Stars: ✭ 376 (+444.93%)
Mutual labels:  corpus
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+9546.38%)
Mutual labels:  corpus
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (-20.29%)
Mutual labels:  corpus
Chatterbot Corpus
A multilingual dialog corpus
Stars: ✭ 964 (+1297.1%)
Mutual labels:  corpus
Naive Bayes Classifier
Naive Bayes is a classification algorithm. It uses the Bernoulli and Multinomial Naive Bayes equations to classify documents (text) as ham or spam.
Stars: ✭ 6 (-91.3%)
Mutual labels:  corpus

== What is BlackLab? ==

[http://inl.github.io/BlackLab/ BlackLab] is a corpus retrieval engine built on top of [http://lucene.apache.org/ Apache Lucene]. It allows fast, complex searches with accurate hit highlighting on large bodies of tagged and annotated text. It was developed at the Institute of Dutch Lexicology (INL) to provide a fast and feature-rich search interface on our historical and contemporary text corpora.

We're also working on BlackLab Server, a web service interface to BlackLab, so you can access it from any programming language. BlackLab Server is included in the repository as well.
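Because BlackLab Server is a web service, a client only needs to build an HTTP request. The following is a minimal sketch of how such a request URL might be put together in Java. The base URL, the corpus name <code>my-corpus</code>, and the helper <code>hitsUrl</code> are hypothetical. The <code>hits</code> path with the <code>patt</code> and <code>outputformat</code> parameters is an assumption about the server's interface and should be checked against the official project site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds a search URL for a BlackLab Server instance.
// The endpoint layout (/<corpus>/hits?patt=...&outputformat=json) is an
// assumption; verify it against the BlackLab Server documentation.
class BlackLabServerUrlSketch {

    // Hypothetical helper: combines base URL, corpus name and a
    // Corpus Query Language pattern into a request URL.
    static String hitsUrl(String baseUrl, String corpus, String cqlPattern) {
        // URL-encode the pattern so quotes and brackets survive transport.
        String encoded = URLEncoder.encode(cqlPattern, StandardCharsets.UTF_8);
        return baseUrl + "/" + corpus + "/hits?patt=" + encoded + "&outputformat=json";
    }

    public static void main(String[] args) {
        // Assumed local deployment and corpus name, for illustration only.
        String url = hitsUrl("http://localhost:8080/blacklab-server", "my-corpus", "\"dog\"");
        System.out.println(url);
    }
}
```

The resulting URL can be fetched with any HTTP client, which is what makes the service usable from languages other than Java.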

BlackLab and BlackLab Server are licensed under the [http://www.apache.org/licenses/LICENSE-2.0 Apache License 2.0].

More information at the [http://inl.github.io/BlackLab/ official project site].
