Alternatives to tweets-preprocessor and detailed project information

vasisouv / tweets-preprocessor

License: GPL-3.0

Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team

Programming Languages

python

139335 projects - #7 most used programming language

Projects that are alternatives of or similar to tweets-preprocessor

Semantic-Textual-Similarity

Natural Language Processing using NLTK and Spacy

Stars: ✭ 30 (+15.38%)

Mutual labels: spacy, nltk, spacy-nlp

nlp-cheat-sheet-python

NLP Cheat Sheet, Python, spacy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition

Stars: ✭ 69 (+165.38%)

Mutual labels: spacy, nltk, spacy-nlp

Practical Machine Learning With Python

Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.

Stars: ✭ 1,868 (+7084.62%)

Mutual labels: spacy, nltk

Twitterldatopicmodeling

Uses topic modeling to identify context between follower relationships of Twitter users

Stars: ✭ 48 (+84.62%)

Mutual labels: twitter, nltk

Orange3 Text

🍊 📄 Text Mining add-on for Orange3

Stars: ✭ 83 (+219.23%)

Mutual labels: twitter, nltk

Cltk

The Classical Language Toolkit

Stars: ✭ 650 (+2400%)

Mutual labels: spacy, nltk

Python nlp tutorial

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

Stars: ✭ 72 (+176.92%)

Mutual labels: spacy, nltk

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+4253.85%)

Mutual labels: spacy, nltk

NLP Quickbook

NLP in Python with Deep Learning

Stars: ✭ 516 (+1884.62%)

Mutual labels: spacy, spacy-nlp

nlp workshop odsc europe20

Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage machine learning, deep learning and deep transfer learning to learn and solve popular tasks using NLP including NER, Classification, Recommendation \ Information Retrieval, Summarization, Classification, Language Translation, Q&A and T…

Stars: ✭ 127 (+388.46%)

Mutual labels: spacy, nltk

augmenty

Augmenty is an augmentation library based on spaCy for augmenting texts.

Stars: ✭ 101 (+288.46%)

Mutual labels: spacy, spacy-nlp

contextualSpellCheck

✔️Contextual word checker for better suggestions

Stars: ✭ 274 (+953.85%)

Mutual labels: spacy, preprocessing

Stocksight

Stock market analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis

Stars: ✭ 1,037 (+3888.46%)

Mutual labels: twitter, nltk

topic modelling financial news

Topic modelling on financial news with Natural Language Processing

Stars: ✭ 51 (+96.15%)

Mutual labels: spacy, nltk

bert-tensorflow-pytorch-spacy-conversion

Instructions for how to convert a BERT Tensorflow model to work with HuggingFace's pytorch-transformers, and spaCy. This walk-through uses DeepPavlov's RuBERT as example.

Stars: ✭ 26 (+0%)

Mutual labels: spacy, spacy-nlp

turing

✨ 🧬 Turing AI - Semantic Navigation, Chatbot using Search Engine and Many NLP Vendors.

Stars: ✭ 30 (+15.38%)

Mutual labels: spacy, spacy-nlp

text-normalizer

Normalize text string

Stars: ✭ 12 (-53.85%)

Mutual labels: preprocessing

twitter

A serverless social network that's under development with some cool stuff, such as Serverless Framework, AppSync, GraphQL, Lambda, DynamoDB, Cognito, Kinesis Firehose, and Algolia ☁️

Stars: ✭ 29 (+11.54%)

Mutual labels: twitter

remove-Twitter-promotions

Super simple Chrome extension to remove Twitter promotions.

Stars: ✭ 18 (-30.77%)

Mutual labels: twitter

veridical-flow

Making it easier to build stable, trustworthy data-science pipelines.

Stars: ✭ 28 (+7.69%)

Mutual labels: preprocessing

Tweets Preprocessor

The tweets preprocessor module, developed by the AUTH team as part of the PlasticTwist Crowdsourcing module

Installation

The tweets-preprocessor module is not yet available through PyPI, so it has to be imported manually. First install the requirements:

$ pip install -r requirements.txt

Then run utils/requirements_installer.py to install the additional dependencies automatically.
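
One possible way to handle the manual import is to add a local clone of the repository to Python's module search path. The snippet below is only a sketch: the clone location is hypothetical, and it assumes that twitter_preprocessor.py sits at the top level of the clone.

```python
import sys
from pathlib import Path

# Hypothetical location of the cloned repository; adjust to your own setup.
repo_dir = Path.home() / "src" / "tweets-preprocessor"

# Make modules inside the clone importable from this interpreter session.
if str(repo_dir) not in sys.path:
    sys.path.insert(0, str(repo_dir))

# After this, `from twitter_preprocessor import TwitterPreprocessor` should
# resolve, assuming the module file sits at the top level of the clone.
```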

Usage

The module follows a functional style and exposes a fluent API, so pre-processing calls can be chained one after another. Users can either call individual pre-processing methods or use the fully_preprocess method to apply all of them to their text.
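
The fluent style can be illustrated with a minimal sketch. The class below is hypothetical and is not the module's implementation: FluentText, its two methods, and the regular expressions are assumptions chosen only to show how returning self from each step makes calls chainable.

```python
import re


class FluentText:
    """Hypothetical stand-in used only to illustrate method chaining."""

    def __init__(self, text: str):
        self.text = text

    def remove_mentions(self):
        # Assumed pattern: '@' followed by word characters.
        self.text = re.sub(r"@\w+", "", self.text)
        return self  # returning self is what enables chaining

    def remove_blank_spaces(self):
        # Collapse runs of whitespace and trim both ends.
        self.text = re.sub(r"\s+", " ", self.text).strip()
        return self


p = FluentText("Hello @someone nice   to meet you")
p.remove_mentions().remove_blank_spaces()
print(p.text)
# 'Hello nice to meet you'
```

Because every step mutates the same object and hands it back, the order of the chained calls is the order in which the text is transformed.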

The following methods are currently available:

remove_urls - Removes all urls (e.g. 'https://ptwist.eu')
remove_mentions - Removes all mentions (e.g. '@PlasticTwistBot')
remove_hashtags - Removes all hashtags (e.g. '#plastictwist')
remove_twitter_reserved_words - Removes Twitter reserved words (e.g. 'RT', 'via')
remove_punctuation - Removes punctuation (e.g. '.', '!')
remove_single_letter_words - Removes single-letter words (e.g. 'b', 'f')
remove_blank_spaces - Removes blank spaces
remove_stopwords - Removes stopwords (e.g. 'a', 'at', 'here')
    - has an extra_stopwords parameter (list) that allows users to add extra stopwords
remove_profane_words - Removes profane words
remove_numbers - Removes numbers (e.g. '2', '999')
    - has a preserve_years parameter (boolean) that controls whether years are kept or removed
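
To make the two optional parameters concrete, here is a rough, self-contained sketch of the behaviour the list describes. It is not the module's code: the function names strip_numbers and strip_stopwords, the tiny stopword set, and the rule that a "year" is a standalone four-digit number starting with 19 or 20 are all assumptions made for illustration.

```python
import re

# Assumed, deliberately tiny stopword set; the module's real list is not shown here.
BASE_STOPWORDS = {"a", "at", "here"}

# Assumption: a "year" is a standalone four-digit number beginning with 19 or 20.
YEAR = re.compile(r"(?:19|20)\d{2}")


def strip_numbers(text: str, preserve_years: bool = False) -> str:
    """Drop purely numeric tokens, optionally keeping ones that look like years."""
    kept = []
    for token in text.split():
        if token.isdigit() and not (preserve_years and YEAR.fullmatch(token)):
            continue  # numeric token that should not be preserved
        kept.append(token)
    return " ".join(kept)


def strip_stopwords(text: str, extra_stopwords=None) -> str:
    """Drop stopwords, including any extra ones supplied by the caller."""
    stopwords = BASE_STOPWORDS | {w.lower() for w in (extra_stopwords or [])}
    return " ".join(t for t in text.split() if t.lower() not in stopwords)


print(strip_numbers("Best 2 texts of 2019", preserve_years=True))
# 'Best texts of 2019'
print(strip_numbers("Best 2 texts of 2019"))
# 'Best texts of'
print(strip_stopwords("We met here at noon", extra_stopwords=["noon"]))
# 'We met'
```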

Examples

Using specific methods

from twitter_preprocessor import TwitterPreprocessor

p = TwitterPreprocessor('Some @ptwist text to be preprocessed. It contains 2 sentences. Best text 2019!')

p.remove_mentions().remove_punctuation().remove_numbers(preserve_years=True).remove_blank_spaces()
print(p.text)
# 'Some text to be preprocessed It contains sentences Best text 2019'

Full pre-processing

from twitter_preprocessor import TwitterPreprocessor

p = TwitterPreprocessor('RT @ptwist This text contains mentions, urls, some Twitter words and some stopwords to be preprocessed via https://example.com.')

p.fully_preprocess()
print(p.text)
# 'This text contains mentions urls Twitter words stopwords preprocessed'

License

This project is licensed under the GPL 3.0 license.

Credits

Developed and maintained by: vasisouv, alextsil, idimitriadis
