GitPlanet
2 open source projects by adbar
1. simplemma ✭ 32
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Topics: python, nlp, tokenizer, wordlist, lemmatizer, morphological-analysis, lemmatiser, tokenization, lemmatization, corpus-tools
2. trafilatura ✭ 711
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Topics: python, nlp, crawler, text-mining, news, html-to-markdown, scraping, corpus, news-aggregator, text-extraction, web-scraping, rss-feed, readability, tei, html2text, news-crawler, corpus-builder, corpus-tools, article-extractor, text-cleaning, text-preprocessing