
voidful / NLPrep

Licence: Apache-2.0 License
🍳 NLPrep - a dataset tool for many natural language processing tasks

Programming Languages

python

Projects that are alternatives to or similar to NLPrep

BIRL
BIRL: Benchmark on Image Registration methods with Landmark validations
Stars: ✭ 66 (+153.85%)
Mutual labels:  dataset
corona-virus
A dataset for epidemiological research on coronavirus pneumonia
Stars: ✭ 34 (+30.77%)
Mutual labels:  dataset
AITQA
resources for the IBM Airlines Table-Question-Answering Benchmark
Stars: ✭ 12 (-53.85%)
Mutual labels:  dataset
pump-and-dump-dataset
Additional material for paper: Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations, ICCCN '20
Stars: ✭ 66 (+153.85%)
Mutual labels:  dataset
HJDataset
A Large Dataset of Historical Japanese Documents with Complex Layouts
Stars: ✭ 19 (-26.92%)
Mutual labels:  dataset
BugZoo
Keep your bugs contained. A platform for studying historical software bugs.
Stars: ✭ 49 (+88.46%)
Mutual labels:  dataset
dataset-histology-landmarks
Dataset: landmarks for registration of histology images
Stars: ✭ 26 (+0%)
Mutual labels:  dataset
icedata
IceData: Datasets Hub for the *IceVision* Framework
Stars: ✭ 41 (+57.69%)
Mutual labels:  dataset
user quality
Dataset for Software Evolution and Quality Improvement
Stars: ✭ 27 (+3.85%)
Mutual labels:  dataset
Open-korean-corpora
Open Korean NLP Dataset Curation for the Users All Around the Globe
Stars: ✭ 82 (+215.38%)
Mutual labels:  dataset
squad-v1.1-pt
Portuguese translation of the SQuAD dataset
Stars: ✭ 13 (-50%)
Mutual labels:  dataset
MaskedFaceRepresentation
Masked face recognition focuses on identifying people using their facial features while they are wearing masks. We introduce benchmarks on face verification based on masked face images for the development of COVID-safe protocols in airports.
Stars: ✭ 17 (-34.62%)
Mutual labels:  dataset
OTT-QA
Code and Data for ICLR2021 Paper "Open Question Answering over Tables and Text"
Stars: ✭ 92 (+253.85%)
Mutual labels:  dataset
Audio-Classification-using-CNN-MLP
Multi class audio classification using Deep Learning (MLP, CNN): The objective of this project is to build a multi class classifier to identify sound of a bee, cricket or noise.
Stars: ✭ 36 (+38.46%)
Mutual labels:  dataset
Medical-Names-Corpus
Medical corpus. Corpus of medical institution names. Drug standard codes.
Stars: ✭ 26 (+0%)
Mutual labels:  dataset
snorkeling
Extracting biomedical relationships from literature with Snorkel 🏊
Stars: ✭ 56 (+115.38%)
Mutual labels:  dataset
covid19-data-greece
Datasets and analysis of Novel Coronavirus (COVID-19) outbreak in Greece
Stars: ✭ 16 (-38.46%)
Mutual labels:  dataset
dbcollection
A collection of popular datasets for deep learning.
Stars: ✭ 26 (+0%)
Mutual labels:  dataset
StrayVisualizer
Visualize Data From Stray Scanner https://keke.dev/blog/2021/03/10/Stray-Scanner.html
Stars: ✭ 30 (+15.38%)
Mutual labels:  dataset
tracing-vs-freehand
Tracing Versus Freehand for Evaluating Computer-Generated Drawings (SIGGRAPH 2021)
Stars: ✭ 21 (-19.23%)
Mutual labels:  dataset




Features

  • handles over 100 datasets
  • generates a statistics report about the processed dataset
  • supports many pre-processing methods
  • provides a panel for entering your parameters at runtime
  • easy to adapt to your own dataset and pre-processing utility

Online Explorer

https://voidful.github.io/NLPrep-Datasets/

Documentation

Learn more from the docs.

Quick Start

Installing via pip

pip install nlprep

Get one of the datasets

nlprep --dataset clas_udicstm --outdir sentiment

You can also try nlprep in Google Colab.
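
In a notebook or script, the quick-start command can also be driven from Python. The sketch below is only an illustration: `run_quick_start` is a hypothetical helper, not part of nlprep, and it assumes the `nlprep` executable installed by pip is on PATH. Standard input is left attached because the tool may prompt for parameters at runtime.

```python
import shutil
import subprocess
from pathlib import Path


def run_quick_start(outdir="sentiment", executable="nlprep"):
    """Run the quick-start command and list what it wrote.

    Hypothetical helper for illustration; returns None when the
    executable cannot be found on PATH.
    """
    if shutil.which(executable) is None:
        return None
    # Same arguments as the quick-start command in this README.
    cmd = [executable, "--dataset", "clas_udicstm", "--outdir", outdir]
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)
    out = Path(outdir)
    # The names of the output files are not assumed here; just list them.
    return sorted(p.name for p in out.iterdir()) if out.is_dir() else []
```

Calling `run_quick_start()` after `pip install nlprep` should return the file names found in `sentiment`. What those files are called is not assumed here.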

Overview

$ nlprep
arguments:
  --dataset     which dataset to use     
  --outdir      processed result output directory       
  
optional arguments:
  -h, --help    show this help message and exit
  --util        data preprocessing utility, multiple utility are supported 
  --cachedir    dir for caching raw dataset
  --infile      local dataset path
  --report      generate a html statistics report
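
The flags above can be combined in one call. As a rough sketch, the hypothetical helper below (not part of nlprep) builds an argument list from them. It makes two assumptions the help text does not confirm: that several utilities are passed as space-separated values after a single `--util`, and that `--report` is a switch taking no value. The utility names are left to the caller because none are listed here.

```python
import shlex


def build_nlprep_args(dataset, outdir, utils=(), cachedir=None,
                      infile=None, report=False):
    """Build an nlprep command line from the documented flags."""
    if not dataset or not outdir:
        raise ValueError("--dataset and --outdir are required")
    args = ["nlprep", "--dataset", dataset, "--outdir", outdir]
    if utils:
        # Assumption: multiple utilities follow a single --util flag.
        args += ["--util", *utils]
    if cachedir:
        args += ["--cachedir", cachedir]
    if infile:
        args += ["--infile", infile]
    if report:
        # Assumption: --report is a boolean switch.
        args.append("--report")
    return args


# Render the command as a shell-safe string (Python 3.8+).
command = shlex.join(
    build_nlprep_args("clas_udicstm", "sentiment", report=True)
)
print(command)
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems when paths contain spaces.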

Contributing

Thanks for your interest. There are many ways to contribute to this project. Get started here.

License

Apache-2.0 License

Icons reference

Icons modified from Darius Dan, from www.flaticon.com
Icons modified from Freepik, from www.flaticon.com
