177 open source projects that are alternatives to or similar to proiel-treebank

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (+273.33%)

Mutual labels: corpus, linguistics

gum

Repository for the Georgetown University Multilayer Corpus (GUM)

Stars: ✭ 71 (+136.67%)

Mutual labels: corpus, treebank

Weixin public corpus

WeChat official account corpus

Stars: ✭ 465 (+1450%)

Mutual labels: corpus, linguistics

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (+86.67%)

Mutual labels: corpus, linguistics

Nlp bahasa resources

A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia

Stars: ✭ 158 (+426.67%)

Mutual labels: corpus

Pansori

Tools for ASR Corpus Generation from Online Video

Stars: ✭ 106 (+253.33%)

Mutual labels: corpus

Pyclue

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

Stars: ✭ 91 (+203.33%)

Mutual labels: corpus

Blacklab

A corpus retrieval engine based on Apache Lucene

Stars: ✭ 69 (+130%)

Mutual labels: corpus

WonderfulPolishLanguage

A repository listing resources for learning and exploring the wonderful Polish language.

Stars: ✭ 31 (+3.33%)

Mutual labels: linguistics

Prosody

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text

Stars: ✭ 139 (+363.33%)

Mutual labels: corpus

Typing Assistant

Typing Assistant autocompletes words and suggests predictions for the next word, making typing faster and smarter and reducing effort.

Stars: ✭ 32 (+6.67%)

Mutual labels: corpus

Datasets

Poetry-related datasets developed by THUAIPoet (Jiuge) group.

Stars: ✭ 111 (+270%)

Mutual labels: corpus

Nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (+540%)

Mutual labels: corpus

Lexicon Thai

Thai lexicon

Stars: ✭ 96 (+220%)

Mutual labels: corpus

poesy

Poetic processing, for Python.

Stars: ✭ 28 (-6.67%)

Mutual labels: linguistics

Ja.text8

Japanese text8 corpus for word embedding.

Stars: ✭ 79 (+163.33%)

Mutual labels: corpus

Wp2txt

WP2TXT extracts plain text from Wikipedia dump files (XML-encoded, Bzip2-compressed), stripping all MediaWiki markup and other metadata.

Stars: ✭ 145 (+383.33%)

Mutual labels: corpus

Mitie chinese wikipedia corpus

Pre-trained Wikipedia corpus by MITIE

Stars: ✭ 43 (+43.33%)

Mutual labels: corpus

pylangacq

Language Acquisition Research Tools

Stars: ✭ 33 (+10%)

Mutual labels: linguistics

Code Docstring Corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Stars: ✭ 137 (+356.67%)

Mutual labels: corpus

Company Names Corpus

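Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, enterprise names. Usable for Chinese word segmentation and organization named-entity recognition.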

Stars: ✭ 868 (+2793.33%)

Mutual labels: corpus

Insuranceqa Corpus Zh

🚁 Insurance industry corpus, chatbot

Stars: ✭ 821 (+2636.67%)

Mutual labels: corpus

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+22086.67%)

Mutual labels: corpus

Dialogue-Corpus

No description or website provided.

Stars: ✭ 27 (-10%)

Mutual labels: corpus

Khcoder

KH Coder: for Quantitative Content Analysis or Text Mining

Stars: ✭ 126 (+320%)

Mutual labels: corpus

Cluepretrainedmodels

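A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models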

Stars: ✭ 493 (+1543.33%)

Mutual labels: corpus

Small Chinese Corpus

Some useful small Chinese corpus datasets

Stars: ✭ 462 (+1440%)

Mutual labels: corpus

Weibo terminater

Final Weibo crawler: scrape anything from Weibo, including comments, Weibo contents, and followers. The Terminator.

Stars: ✭ 2,295 (+7550%)

Mutual labels: corpus

Ua Gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Stars: ✭ 108 (+260%)

Mutual labels: corpus

transliteration-php

🇺🇦 🇬🇧 🔡 🐘 PHP library for transliteration.

Stars: ✭ 34 (+13.33%)

Mutual labels: latin

Pubmed Rct

PubMed 200k RCT dataset: a large dataset for sequential sentence classification.

Stars: ✭ 101 (+236.67%)

Mutual labels: corpus

Efaqa Corpus Zh

❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus

Stars: ✭ 170 (+466.67%)

Mutual labels: corpus

Chi Corpus

Mr. Chi's corpus

Stars: ✭ 96 (+220%)

Mutual labels: corpus

Probabilistic-RNN-DA-Classifier

Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model

Stars: ✭ 22 (-26.67%)

Mutual labels: corpus

Dataset List

Lists of text corpora and more (mainly Japanese)

Stars: ✭ 84 (+180%)

Mutual labels: corpus

Indonesian Nlp Resources

Data resources for NLP in Bahasa Indonesia

Stars: ✭ 143 (+376.67%)

Mutual labels: corpus

Russian news corpus

Russian mass media stemmed texts corpus: a corpus of lemmatized (morphologically normalized) texts from Russian mass media

Stars: ✭ 76 (+153.33%)

Mutual labels: corpus

pfootprint

Political Discourse Analysis Using Pre-Trained Word Vectors.

Stars: ✭ 20 (-33.33%)

Mutual labels: linguistics

Coarij

Corpus of Annual Reports in Japan

Stars: ✭ 55 (+83.33%)

Mutual labels: corpus

Clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+7983.33%)

Mutual labels: corpus

Chatterbot Corpus

A multilingual dialog corpus

Stars: ✭ 964 (+3113.33%)

Mutual labels: corpus

DANeS

DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)

Stars: ✭ 64 (+113.33%)

Mutual labels: corpus

Lyrics Corpora

An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts

Stars: ✭ 13 (-56.67%)

Mutual labels: corpus

Gossiping Chinese Corpus

Chinese question-answering corpus from the PTT Gossiping board

Stars: ✭ 137 (+356.67%)

Mutual labels: corpus

Naive Bayes Classifier

Naive Bayes is a classification algorithm. This project uses the Bernoulli and multinomial Naive Bayes equations to classify text documents as ham or spam.

Stars: ✭ 6 (-80%)

Mutual labels: corpus

megs

A merged version of multiple open-source German speech datasets.

Stars: ✭ 21 (-30%)

Mutual labels: corpus

Seq2seq Chatbot

Chatbot in 200 lines of code using TensorLayer

Stars: ✭ 777 (+2490%)

Mutual labels: corpus

Awesome Chatbot

Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:

Stars: ✭ 1,785 (+5850%)

Mutual labels: corpus

Quanteda

An R package for the Quantitative Analysis of Textual Data

Stars: ✭ 647 (+2056.67%)

Mutual labels: corpus

rclc

Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.

Stars: ✭ 20 (-33.33%)

Mutual labels: corpus

Dialog corpus

Corpus for training Chinese and English dialogue systems (Datasets for Training Chatbot System)

Stars: ✭ 1,662 (+5440%)

Mutual labels: corpus

Bookcorpus

Crawl BookCorpus

Stars: ✭ 443 (+1376.67%)

Mutual labels: corpus

Cluedatasetsearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Stars: ✭ 2,112 (+6940%)

Mutual labels: corpus

Awesome Persian Nlp Ir

Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources

Stars: ✭ 460 (+1433.33%)

Mutual labels: corpus

Chinese Names Corpus

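Chinese personal name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Usable for Chinese word segmentation and person named-entity recognition.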

Stars: ✭ 3,053 (+10076.67%)

Mutual labels: corpus

Chinese Nlp Corpus

Collections of Chinese NLP corpora

Stars: ✭ 438 (+1360%)

Mutual labels: corpus

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (+303.33%)

Mutual labels: corpus

nyt-first-said

Tweets when words are published for the first time in the NYT

Stars: ✭ 222 (+640%)

Mutual labels: linguistics

feminizator.github.io

Word feminizer

Stars: ✭ 29 (-3.33%)

Mutual labels: linguistics

german-nouns

A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

Stars: ✭ 101 (+236.67%)

Mutual labels: corpus

1-60 of 177 similar projects
