GitPlanet
Top 4 corpus-tools open-source projects
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
✭ 32
Tags: python, nlp, tokenizer, wordlist, lemmatizer, morphological-analysis, lemmatiser, tokenization, lemmatization, corpus-tools
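simplemma's "wordlist" tag points to the general approach many fast lemmatizers use: look each token up in a per-language word list and fall back to simple rules. The sketch below illustrates only that general idea. The word list, the plural-stripping fallback rule, and the names LEMMAS, lemmatize and lemmatize_text are all hypothetical and are not simplemma's code or API.

```python
# Hypothetical mini word list: surface form -> lemma (illustrative only).
LEMMAS = {
    "en": {"mice": "mouse", "went": "go", "better": "good"},
}


def lemmatize(token: str, lang: str) -> str:
    """Dictionary lookup first, then a crude fallback rule, then the token itself."""
    table = LEMMAS.get(lang, {})
    lowered = token.lower()
    if lowered in table:
        return table[lowered]
    # Assumed fallback rule: strip an English plural "-s" (but not "-ss", as in "class").
    if lang == "en" and lowered.endswith("s") and not lowered.endswith("ss") and len(lowered) > 3:
        return lowered[:-1]
    return lowered


def lemmatize_text(text: str, lang: str) -> list[str]:
    # Naive whitespace tokenization; a real tokenizer would also split punctuation.
    return [lemmatize(tok, lang) for tok in text.split()]
```

For example, lemmatize_text("the mice went home", "en") yields ["the", "mouse", "go", "home"]. A dictionary lookup is a constant-time hash access per token, which is why word-list lemmatizers can be fast.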
kontext
An advanced, extensible web front-end for the Manatee-open corpus search engine
✭ 50
Tags: typescript, python, HTML, javascript, shell, PEG.js, user-interface, corpora, corpus-linguistics, corpus-tools
parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
✭ 35
Tags: PHP, shell, nlp, data-science, natural-language-processing, translation, machine, machine-translation, natural-language, neural-machine-translation, corpora, nmt, filtering, data-processing, neural, language-processing, cleaning, corpus-tools
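Filtering a parallel corpus for machine translation typically means discarding sentence pairs that are unlikely to be good translations. The sketch below applies a few common heuristics (empty side, untranslated copy, overlong sentence, implausible length ratio, exact duplicate) purely for illustration. The thresholds and the names keep_pair and filter_corpus are hypothetical; this is not the filter set that parallel-corpora-tools actually implements.

```python
from collections.abc import Iterable, Iterator


def keep_pair(src: str, tgt: str, max_ratio: float = 2.0, max_len: int = 200) -> bool:
    """Return True if a (source, target) pair passes the illustrative heuristics."""
    s, t = src.strip(), tgt.strip()
    if not s or not t:
        return False  # one side is empty
    if s == t:
        return False  # target is an untranslated copy of the source
    ns, nt = len(s.split()), len(t.split())
    if ns > max_len or nt > max_len:
        return False  # overlong sentences are often misalignments
    # Translations of each other should have roughly comparable lengths.
    return max(ns, nt) / min(ns, nt) <= max_ratio


def filter_corpus(pairs: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str]]:
    """Yield stripped pairs that pass keep_pair, dropping exact duplicates."""
    seen: set[tuple[str, str]] = set()
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen or not keep_pair(src, tgt):
            continue
        seen.add(key)
        yield key
```

Using a generator keeps memory flat apart from the duplicate set, which matters when corpora run to millions of lines.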
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
✭ 711
Tags: python, nlp, crawler, text-mining, news, html-to-markdown, scraping, corpus, news-aggregator, text-extraction, web-scraping, rss-feed, readability, tei, html2text, news-crawler, corpus-builder, corpus-tools, article-extractor, text-cleaning, text-preprocessing
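Extracting the main text from a web page starts with discarding markup that never carries article content. The standard-library sketch below shows that first step only: it skips scripts, styles, and typical boilerplate containers and keeps the remaining text. The SKIP_TAGS list and the names TextExtractor and extract_text are assumptions made for illustration; this is not trafilatura's extraction algorithm, which is considerably more involved.

```python
from html.parser import HTMLParser

# Assumed set of elements that rarely contain main content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect text nodes that are not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many skipped elements we are currently inside
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Tag-based skipping like this is only a baseline; it cannot tell an article paragraph from a sidebar written in plain div elements, which is where content-scoring heuristics come in.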