amansrivastava17 / text-preprocess-python

Licence: other
Text preprocessing tools in Python.

Programming Languages

python

Projects that are alternatives of or similar to text-preprocess-python

Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+463.64%)
Mutual labels:  text-processing, nlp-machine-learning
Text classification
Text Classification Algorithms: A Survey
Stars: ✭ 1,276 (+5700%)
Mutual labels:  text-processing, nlp-machine-learning
lingua-go
👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+3009.09%)
Mutual labels:  text-processing, nlp-machine-learning
knime-textprocessing
KNIME - Text Processing Extension (Labs)
Stars: ✭ 17 (-22.73%)
Mutual labels:  text-processing, nlp-machine-learning
RadiologyReportEmbedding
Intelligent Word Embeddings of Free-Text Radiology Reports
Stars: ✭ 22 (+0%)
Mutual labels:  nlp-machine-learning
pytorch-translm
An implementation of transformer-based language model for sentence rewriting tasks such as summarization, simplification, and grammatical error correction.
Stars: ✭ 22 (+0%)
Mutual labels:  nlp-machine-learning
Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
Stars: ✭ 20 (-9.09%)
Mutual labels:  nlp-machine-learning
vnla
Code accompanying the CVPR 2019 paper: https://arxiv.org/abs/1812.04155
Stars: ✭ 60 (+172.73%)
Mutual labels:  nlp-machine-learning
CVAE Dial
CVAE_XGate model in paper "Xu, Dusek, Konstas, Rieser. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity"
Stars: ✭ 16 (-27.27%)
Mutual labels:  nlp-machine-learning
embeddings
Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polish Language
Stars: ✭ 27 (+22.73%)
Mutual labels:  nlp-machine-learning
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+218.18%)
Mutual labels:  text-processing
Machine-learning
This repository will contain all the stuffs required for beginners in ML and DL do follow and star this repo for regular updates
Stars: ✭ 27 (+22.73%)
Mutual labels:  nlp-machine-learning
TextFeatureSelection
Python library for feature selection for text features. It has filter method, genetic algorithm and TextFeatureSelectionEnsemble for improving text classification models. Helps improve your machine learning models
Stars: ✭ 42 (+90.91%)
Mutual labels:  nlp-machine-learning
vlainic.github.io
My GitHub blog: things you might be interested, and probably not...
Stars: ✭ 26 (+18.18%)
Mutual labels:  nlp-machine-learning
Quora question pairs NLP Kaggle
Quora Kaggle Competition : Natural Language Processing using word2vec embeddings, scikit-learn and xgboost for training
Stars: ✭ 17 (-22.73%)
Mutual labels:  nlp-machine-learning
finglish
A Finglish to Persian converter.
Stars: ✭ 60 (+172.73%)
Mutual labels:  text-processing
sova-tts-tps
NLP-preprocessor for the SOVA-TTS project
Stars: ✭ 44 (+100%)
Mutual labels:  text-processing
SuffixTree
Optimized implementation of suffix tree in python using Ukkonen's algorithm.
Stars: ✭ 38 (+72.73%)
Mutual labels:  text-processing
nlp classification workshop
NLP Classification Workshop
Stars: ✭ 22 (+0%)
Mutual labels:  nlp-machine-learning
arabic-tagger
AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training
Stars: ✭ 38 (+72.73%)
Mutual labels:  nlp-machine-learning

Text Preprocessing Tool

This is a text preprocessing tool written in Python. It handles the preprocessing steps you need to complete before performing any NLP task.

List of methods

  • appos_look_up : Expand apostrophe contractions to their full form
    • Example : I don't know what is going on => I do not know what is going on
  • remove_repeated_characters : Collapse runs of more than two repeated characters in a word down to two
    • Example: I am verrry happpyyy today => I am verry happyy today
  • separate_digit_text : Splits alphanumeric words into digits and text.
    • Example: I will be booking tickets for 2adults => I will be booking tickets for 2 adults
  • slang_look_up : Replace slang words in text with their standard form
    • Example: hi, thanq so mch => hi, thank you so much
  • stem_text : Convert words in text into their root form
    • Example: I am playing in ground => I am play in ground
  • remove_single_char_word: Remove single-character words from text
    • Example: I am in a home for 2 years => am in home for years
  • remove_punctuations: Remove special characters from text
    • Example: he: I am going. are you coming? => he I am going. are you coming
  • remove_extra_space: Remove extra white space from text
    • Example: hey are you coming. ? => hey are you coming. ?
  • replace_digits_with_char: Replace each digit with replace_char
    • Example: I will be there on 22 april. => I will be there on dd april.
  • emoticons_look_up: Remove emoticons from text and return a list of the emotions present in the text
    • Example: Sure, you are welcome :) => Sure, you are welcome.
  • remove_url: Remove URLs from text
    • Example: link to latest cricket score. https://xyz.com/a/b => link to latest cricket score.
  • remove_alphanumerics: Remove alphanumeric words from text
    • Example: hello man whatsup123 => hello man
  • remove_words_start_with: Remove words that start with the character starts_with_char
    • Example: dhoni rocks with last ball six #dhoni #six => dhoni rocks with last ball six (starts_with_char='#')
  • remove_stop_words: Remove stop words from text
    • Example: I am very excited for today's football match => very excited today's football match
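
Several of the methods listed above are simple pattern rewrites, so their behaviour can be approximated with the standard `re` module. The sketch below is not the project's code: it is a minimal, independent re-implementation of six of the listed functions, written only from the names and examples in this README. The signatures, the default `replace_char="d"`, the parameter name `starts_with_char`, and all edge-case handling are assumptions.

```python
import re


def remove_repeated_characters(text):
    """Collapse any run of 3+ identical word characters down to 2."""
    return re.sub(r"(\w)\1{2,}", r"\1\1", text)


def separate_digit_text(text):
    """Insert a space wherever a digit touches a letter (either order)."""
    text = re.sub(r"(\d)([A-Za-z])", r"\1 \2", text)
    return re.sub(r"([A-Za-z])(\d)", r"\1 \2", text)


def replace_digits_with_char(text, replace_char="d"):
    """Replace every digit with replace_char (default is an assumption)."""
    return re.sub(r"\d", replace_char, text)


def remove_url(text):
    """Drop http/https URLs, then trim surrounding whitespace."""
    return re.sub(r"https?://\S+", "", text).strip()


def remove_alphanumerics(text):
    """Drop words that contain both letters and digits."""
    def is_mixed(word):
        return bool(re.search(r"[A-Za-z]", word) and re.search(r"\d", word))
    return " ".join(w for w in text.split() if not is_mixed(w))


def remove_words_start_with(text, starts_with_char):
    """Drop words that begin with starts_with_char."""
    return " ".join(w for w in text.split() if not w.startswith(starts_with_char))


if __name__ == "__main__":
    print(remove_repeated_characters("I am verrry happpyyy today"))
    print(separate_digit_text("I will be booking tickets for 2adults"))
    print(replace_digits_with_char("I will be there on 22 april."))
    print(remove_url("link to latest cricket score. https://xyz.com/a/b"))
    print(remove_alphanumerics("hello man whatsup123"))
    print(remove_words_start_with("dhoni rocks with last ball six #dhoni #six", "#"))
```

Functions such as slang_look_up, appos_look_up, stem_text, and remove_stop_words are left out of the sketch because they depend on lookup tables or a stemmer that this README does not specify.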

Note: Anyone can contribute to the project by adding more functions to the preprocessing module. Thanks :)