AllenDang / Pipeit

Licence: MIT
Programming language: Go

PipeIt

PipeIt is a text transformation, conversion, cleansing and extraction tool.

PipeIt screen shot1

Features

  • Split - split text into a text array by a given separator.
  • RegexpSplit - split text into a text array by a given regular expression.
  • Fields - split text around each run of one or more consecutive whitespace characters.
  • Match - filter a text array by a regular expression.
  • Replace - replace text in each element of a text array.
  • Surround - add a prefix or suffix to each element of a text array.
  • Trim - remove all leading and trailing Unicode code points contained in a given cutset.
  • Join - join a text array into a single line of text by a given separator.
  • Line - output a text array line by line.

More pipes are coming...

(More importantly, telling me about your use case will help me create pipes that are actually useful.)

PipeIt can also read from stdin, so you can pipe data into it with "cat file | PipeIt".

Usage

Extract image links from an HTML source

PipeIt demo to find image urls from html

Add single quotation marks around every word

PipeIt demo to add single quotation
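In terms of the pipes listed under Features, this task could plausibly be Fields, then Surround, then Join. The Go sketch below shows that sequence with the standard library. The function name quoteWords and the choice of a single space as the join separator are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteWords splits text on whitespace, wraps each word in single
// quotation marks, and joins the words back with single spaces.
func quoteWords(text string) string {
	words := strings.Fields(text) // Fields: split around runs of whitespace
	for i, w := range words {
		words[i] = "'" + w + "'" // Surround: prefix and suffix
	}
	return strings.Join(words, " ") // Join: back to one line
}

func main() {
	fmt.Println(quoteWords("foo bar  baz")) // prints "'foo' 'bar' 'baz'"
}
```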

Convert a comma-separated string to lines

PipeIt demo to replace comma
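A plausible pipe sequence for this task is Split, then Trim, then Line. The Go sketch below mirrors it with the standard library. The function name commaToLines is hypothetical, and trimming whitespace around each item is an assumption about what the input looks like.

```go
package main

import (
	"fmt"
	"strings"
)

// commaToLines splits text on commas, trims whitespace around each
// item, and returns the items one per line.
func commaToLines(text string) string {
	items := strings.Split(text, ",") // Split: by separator
	for i, it := range items {
		items[i] = strings.TrimSpace(it) // Trim: drop surrounding whitespace
	}
	return strings.Join(items, "\n") // Line: one element per line
}

func main() {
	fmt.Println(commaToLines("a, b,c"))
}
```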

The reason for creating it

First of all, I wanted to test giu, the GUI framework I created, on a real project.

It turns out giu is really useful for this kind of application. It took me just 6 hours to build PipeIt from the ground up.

I have also had this idea for years: a text processing pipeline to ease my daily text processing pain.

I hope it is useful to you as well. :)
