vasisouv / tweets-preprocessor

Licence: GPL-3.0 License
Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team

Tweets Preprocessor




The tweets preprocessor module, developed by the AUTH team as part of the PlasticTwist Crowdsourcing module.

PRs Welcome

Installation

The tweets-preprocessor module is not yet available through PyPI, so it has to be set up manually. First install the requirements:

$ pip install -r requirements.txt

Then run utils/requirements_installer.py to install the additional dependencies automatically.

Usage

The module features a fluent API: each pre-processing method can be chained onto the previous one. Users can either call individual pre-processing methods or use the fully_preprocess method to apply all of the pre-processing methods to their text.
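As a rough illustration of how a chainable API of this kind can work, here is a minimal sketch. MiniPreprocessor and its regular expressions are hypothetical and written only for this example; they are not the module's actual implementation. The key idea is that each method mutates the stored text and returns the object itself.

```python
import re


class MiniPreprocessor:
    """Hypothetical stand-in used only to illustrate method chaining."""

    def __init__(self, text: str):
        self.text = text

    def remove_mentions(self):
        # Drop @handles; this pattern is an assumption, not the module's own.
        self.text = re.sub(r"@\w+", "", self.text)
        return self  # returning self is what makes chaining possible

    def remove_urls(self):
        # Drop http(s) links up to the next whitespace character.
        self.text = re.sub(r"https?://\S+", "", self.text)
        return self

    def remove_blank_spaces(self):
        # Collapse runs of whitespace and trim both ends.
        self.text = re.sub(r"\s+", " ", self.text).strip()
        return self

    def fully_preprocess(self):
        # Convenience wrapper that applies every step of this sketch in order.
        return self.remove_urls().remove_mentions().remove_blank_spaces()


p = MiniPreprocessor("Hello @ptwist see https://ptwist.eu now")
p.remove_mentions().remove_urls().remove_blank_spaces()
print(p.text)
# 'Hello see now'
```

Because every step returns the same object, the order of the calls is the order in which the text is transformed, which is why whitespace cleanup is placed last.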

The following methods are currently available:

  • remove_urls - Removes all URLs (e.g. 'https://ptwist.eu')
  • remove_mentions - Removes all mentions (e.g. '@PlasticTwistBot')
  • remove_hashtags - Removes all hashtags (e.g. '#plastictwist')
  • remove_twitter_reserved_words - Removes Twitter reserved words (e.g. 'RT', 'via')
  • remove_punctuation - Removes punctuation (e.g. '.', '!')
  • remove_single_letter_words - Removes single-letter words (e.g. 'b', 'f')
  • remove_blank_spaces - Removes blank spaces
  • remove_stopwords - Removes stopwords (e.g. 'a', 'at', 'here')
    • has an extra_stopwords parameter (list) that allows users to add extra stopwords
  • remove_profane_words - Removes profane words
  • remove_numbers - Removes numbers (e.g. '2', '999')
    • has a preserve_years parameter (boolean) that allows users to choose whether or not years should be removed.
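To make the two optional parameters more concrete, the sketch below shows one plausible reading of their behaviour. The functions drop_numbers and drop_stopwords, the four-digit definition of a year, and the tiny stopword set are all assumptions made for illustration; the module's real rules and stopword list may differ.

```python
import re

# Assumed definition of a "year": a four-digit number from 1000 to 2999.
YEAR = re.compile(r"[12]\d{3}")

# Tiny illustrative stopword set; the module's real list is not shown here.
BASE_STOPWORDS = {"a", "at", "here"}


def drop_numbers(text: str, preserve_years: bool = False) -> str:
    """Remove purely numeric tokens, optionally keeping year-like ones."""
    kept = []
    for token in text.split():
        is_year = bool(YEAR.fullmatch(token))
        if token.isdigit() and not (preserve_years and is_year):
            continue  # numeric token that should be removed
        kept.append(token)
    return " ".join(kept)


def drop_stopwords(text: str, extra_stopwords=None) -> str:
    """Remove base stopwords plus any caller-supplied extras (case-insensitive)."""
    stopwords = BASE_STOPWORDS | {w.lower() for w in (extra_stopwords or [])}
    return " ".join(t for t in text.split() if t.lower() not in stopwords)


print(drop_numbers("Best 2 texts of 2019", preserve_years=True))
# 'Best texts of 2019'
print(drop_numbers("Best 2 texts of 2019"))
# 'Best texts of'
print(drop_stopwords("a plastic bottle at sea", extra_stopwords=["plastic"]))
# 'bottle sea'
```

This sketch only handles whole whitespace-separated tokens, so a number attached to punctuation (for example '2019!') would be left untouched unless punctuation is removed first.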

Examples

Using specific methods

from twitter_preprocessor import TwitterPreprocessor

p = TwitterPreprocessor('Some @ptwist text to be preprocessed. It contains 2 sentences. Best text 2019!')

p.remove_mentions().remove_punctuation().remove_numbers(preserve_years=True).remove_blank_spaces()
print(p.text)
# 'Some text to be preprocessed It contains sentences Best text 2019'

Full pre-processing

from twitter_preprocessor import TwitterPreprocessor

p = TwitterPreprocessor('RT @ptwist This text contains mentions, urls, some Twitter words and some stopwords to be preprocessed via https://example.com.')

p.fully_preprocess()
print(p.text)
# 'This text contains mentions urls Twitter words stopwords preprocessed'

License

This project is licensed under the GPL 3.0 license.

Credits

Developed and maintained by: vasisouv, alextsil, idimitriadis
