KH Coder: for Quantitative Content Analysis or Text Mining

Web

Japanese: http://khcoder.net
English: http://khcoder.net/en

Description

KH Coder is free software for quantitative content analysis and text mining. It is also used in computational linguistics. With KH Coder you can analyze Catalan, Chinese (Simplified), Dutch, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Slovenian and Spanish text.

Screenshots: https://goo.gl/photos/ixn1sTM3jm8o11bP8
Official book (in Japanese): http://amzn.to/2wHFxKg

How to run source code of KH Coder on Windows

1. Download and install Perl: http://strawberryperl.com/
2. (Fork and) clone this repository.
3. Download the released *.exe file (WinZip self-extractor) of KH Coder 3.
4. Unzip the downloaded file into the clone directory.
5. Open a Command Prompt window, go to the clone directory, type "perl kh_coder.pl", and press Enter.

If you get an error like "Can't locate Jcode.pm in @INC", you need to install the Perl module "Jcode". To install it, type "cpanm Jcode" in the Command Prompt window and press Enter.

The above procedure is for people who want to develop or modify KH Coder. If you just want to try or use KH Coder, you don't need Perl. Download and unzip the released *.exe file, then double-click the extracted "kh_coder.exe".
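
If you run from source, you can check whether a Perl module such as Jcode is installed before launching kh_coder.pl. The following is a minimal sketch, not part of KH Coder: the helper name perl_module_available is hypothetical, and it relies only on Perl's standard -M and -e switches. It assumes "perl" is on your PATH.

```python
import subprocess


def perl_module_available(module: str, perl: str = "perl") -> bool:
    """Return True if `perl -M<module> -e 1` exits with status 0.

    Hypothetical helper for illustration; KH Coder does not ship it.
    """
    try:
        result = subprocess.run(
            [perl, f"-M{module}", "-e", "1"],
            capture_output=True,
            text=True,
        )
    except OSError:
        # The Perl interpreter could not be started (e.g. not on PATH).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Jcode is the module named in the error message above; add others
    # here if Perl reports more "Can't locate ..." errors.
    for name in ["Jcode"]:
        if perl_module_available(name):
            print(f"{name}: found")
        else:
            print(f"{name}: missing, try: cpanm {name}")
```

Running the script prints one line per module, so missing ones can be installed with cpanm before starting KH Coder.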

On Linux or other Un*x-like systems

You need:

MySQL
Perl (and some Perl modules)
R (and some R packages)
Morphological Analysis and POS Tagging software
- ChaSen or MeCab for analyzing Japanese text
- FreeLing or Stanford POS Tagger for analyzing English text
- FreeLing for analyzing Catalan, French, German, Italian, Portuguese, Russian or Spanish text
- MeCab and HanDic for analyzing Korean text
- Stanford Word Segmenter and Stanford POS Tagger for analyzing Chinese text

See issue #91 for more details.
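
As a rough way to see which of the prerequisites listed above are already installed, you can look for their executables on your PATH. The following is a sketch under assumptions, not an official check: the executable names (mysql, perl, R, Rscript, chasen, mecab, analyze for FreeLing, java for the Stanford tools) are the usual defaults and may differ on your system. It does not verify Perl modules, R packages, or dictionaries such as HanDic.

```python
import shutil

# Executable names below are assumptions; adjust them to your installation.
REQUIRED = {
    "MySQL": ["mysql"],
    "Perl": ["perl"],
    "R": ["R", "Rscript"],
}
OPTIONAL = {
    "ChaSen (Japanese)": ["chasen"],
    "MeCab (Japanese, Korean)": ["mecab"],
    "FreeLing": ["analyze"],
    "Java (to run the Stanford tools)": ["java"],
}


def find_tools(groups, which=shutil.which):
    """Map each label to the first executable path found, or None."""
    found = {}
    for label, candidates in groups.items():
        paths = (which(name) for name in candidates)
        found[label] = next((p for p in paths if p), None)
    return found


if __name__ == "__main__":
    for title, groups in (("Required", REQUIRED), ("Optional", OPTIONAL)):
        print(title)
        for label, path in find_tools(groups).items():
            print(f"  {label}: {path or 'not found on PATH'}")
```

A tool reported as "not found" may still be installed outside your PATH, so treat the output as a hint only.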

License

GNU GPL version 2 or later
