254 open source projects that are alternatives to or similar to Khcoder

trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+464.29%)
Mutual labels:  text-mining, corpus
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+50%)
Mutual labels:  text-mining, corpus
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (-3.97%)
Mutual labels:  corpus, text-mining
Friend.ly
A social media platform with a friend recommendation engine based on personality trait extraction
Stars: ✭ 41 (-67.46%)
Mutual labels:  text-mining
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (-56.35%)
Mutual labels:  corpus
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (-27.78%)
Mutual labels:  text-mining
Datasets
Poetry-related datasets developed by THUAIPoet (Jiuge) group.
Stars: ✭ 111 (-11.9%)
Mutual labels:  corpus
Uc Davis Cs Exams Analysis
📈 Regression and Classification with UC Davis student quiz data and exam data
Stars: ✭ 33 (-73.81%)
Mutual labels:  text-mining
Orange3 Text
🍊 📄 Text Mining add-on for Orange3
Stars: ✭ 83 (-34.13%)
Mutual labels:  text-mining
Nlppln
NLP pipeline software using common workflow language
Stars: ✭ 31 (-75.4%)
Mutual labels:  text-mining
Text Mining
Text Mining in Python
Stars: ✭ 18 (-85.71%)
Mutual labels:  text-mining
Pipeit
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Stars: ✭ 57 (-54.76%)
Mutual labels:  text-mining
Lexicon Thai
Thai language lexicon
Stars: ✭ 96 (-23.81%)
Mutual labels:  corpus
Mitie chinese wikipedia corpus
Pre-trained Wikipedia corpus by MITIE
Stars: ✭ 43 (-65.87%)
Mutual labels:  corpus
Colibri Core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (-11.11%)
Mutual labels:  corpus
Tidytext
Text mining using tidy tools ✨📄✨
Stars: ✭ 975 (+673.81%)
Mutual labels:  text-mining
Lexicon
A data package containing lexicons and dictionaries for text analysis
Stars: ✭ 87 (-30.95%)
Mutual labels:  text-mining
Typing Assistant
Typing Assistant provides the ability to autocomplete words and suggests predictions for the next word. This makes typing faster, more intelligent and reduces effort.
Stars: ✭ 32 (-74.6%)
Mutual labels:  corpus
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+1266.67%)
Mutual labels:  text-mining
Lyrics Corpora
An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts
Stars: ✭ 13 (-89.68%)
Mutual labels:  corpus
Ja.text8
Japanese text8 corpus for word embedding.
Stars: ✭ 79 (-37.3%)
Mutual labels:  corpus
Pansori
Tools for ASR Corpus Generation from Online Video
Stars: ✭ 106 (-15.87%)
Mutual labels:  corpus
Autophrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Stars: ✭ 835 (+562.7%)
Mutual labels:  text-mining
Python nlp tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
Stars: ✭ 72 (-42.86%)
Mutual labels:  text-mining
Insuranceqa Corpus Zh
🚁 Insurance industry corpus, chatbot
Stars: ✭ 821 (+551.59%)
Mutual labels:  corpus
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+526.98%)
Mutual labels:  text-mining
Konlpy
Python package for Korean natural language processing.
Stars: ✭ 1,098 (+771.43%)
Mutual labels:  text-mining
Text predictor
Char-level RNN LSTM text generator📄.
Stars: ✭ 99 (-21.43%)
Mutual labels:  text-mining
Ngram
Fast n-Gram Tokenization
Stars: ✭ 55 (-56.35%)
Mutual labels:  text-mining
Textcluster
Short text clustering preprocessing module
Stars: ✭ 115 (-8.73%)
Mutual labels:  text-mining
Spark Nkp
Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-60.32%)
Mutual labels:  text-mining
Chi Corpus
Mr. Chi's corpus
Stars: ✭ 96 (-23.81%)
Mutual labels:  corpus
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-65.87%)
Mutual labels:  text-mining
Keywords2vec
Stars: ✭ 121 (-3.97%)
Mutual labels:  text-mining
Gsoc2018 3gm
💫 Automated codification of Greek Legislation with NLP
Stars: ✭ 36 (-71.43%)
Mutual labels:  text-mining
Pyclue
Python toolkit for Chinese Language Understanding (CLUE) Evaluation benchmark
Stars: ✭ 91 (-27.78%)
Mutual labels:  corpus
Metasra Pipeline
MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Stars: ✭ 33 (-73.81%)
Mutual labels:  text-mining
Genius
Easily access song lyrics from Genius in a tibble.
Stars: ✭ 111 (-11.9%)
Mutual labels:  text-mining
Chatterbot Corpus
A multilingual dialog corpus
Stars: ✭ 964 (+665.08%)
Mutual labels:  corpus
R Text Data
List of textual data sources to be used for text mining in R
Stars: ✭ 85 (-32.54%)
Mutual labels:  text-mining
Tidy Text Mining
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
Stars: ✭ 961 (+662.7%)
Mutual labels:  text-mining
Dialog corpus
Datasets for training Chinese and English chatbot (dialogue) systems
Stars: ✭ 1,662 (+1219.05%)
Mutual labels:  corpus
Spider
A configurable web spider with an easy-to-use web console
Stars: ✭ 954 (+657.14%)
Mutual labels:  text-mining
Dataset List
Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (-33.33%)
Mutual labels:  corpus
Company Names Corpus
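Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, enterprise names. Can be used for Chinese word segmentation and organization named-entity recognition.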
Stars: ✭ 868 (+588.89%)
Mutual labels:  corpus
Ua Gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (-14.29%)
Mutual labels:  corpus
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-85.71%)
Mutual labels:  text-mining
Russian news corpus
Russian mass media stemmed texts corpus: a corpus of lemmatized (morphologically normalized) texts from Russian mass media
Stars: ✭ 76 (-39.68%)
Mutual labels:  corpus
Naive Bayes Classifier
A Naive Bayes classifier is a classification algorithm. It uses the Naive Bayes Bernoulli and Multinomial equations to classify documents (text) as ham or spam.
Stars: ✭ 6 (-95.24%)
Mutual labels:  corpus
Sejong Corpus
Korean Sejong corpus download and simple analysis
Stars: ✭ 116 (-7.94%)
Mutual labels:  corpus
Rake Nltk
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Stars: ✭ 793 (+529.37%)
Mutual labels:  text-mining
Blacklab
A corpus retrieval engine based on Apache Lucene
Stars: ✭ 69 (-45.24%)
Mutual labels:  corpus
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (-19.05%)
Mutual labels:  text-mining
Seq2seq Chatbot
Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (+516.67%)
Mutual labels:  corpus
Pyphonetics
A Python 3 phonetics library.
Stars: ✭ 61 (-51.59%)
Mutual labels:  text-mining
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+5182.54%)
Mutual labels:  corpus
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+1576.19%)
Mutual labels:  corpus
How To Mine Newsfeed Data And Extract Interactive Insights In Python
A practical guide to topic mining and interactive visualizations
Stars: ✭ 61 (-51.59%)
Mutual labels:  text-mining
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+467.46%)
Mutual labels:  text-mining
Cogcomp Nlpy
CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (-8.73%)
Mutual labels:  text-mining