ptype

Contents

1 Introduction
2 Install requirements
3 Usage

1 Introduction

ptype is a probabilistic approach to type inference, which is the task of identifying the data type (e.g. Boolean, date, integer or string) of a given column of data.

Existing approaches often fail on type inference for messy datasets where data is missing or anomalous. With ptype, our goal is to develop a robust method that can deal with such data.

https://raw.githubusercontent.com/alan-turing-institute/ptype/release/notes/motivation.png

In the right-hand figure, normal, missing and anomalous values are shown in green, yellow and red, respectively.

ptype uses Probabilistic Finite-State Machines (PFSMs) to model known data types, missing and anomalous data. Given a column of data, we can infer a plausible column type, and also identify any values which (conditional on that type) are deemed missing or anomalous. In contrast to more familiar finite-state machines, such as regular expressions, that either accept or reject a given data value, PFSMs assign probabilities to different values. They therefore offer the advantage of generating weighted predictions when a column of messy data is consistent with more than one type assignment.
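The idea of weighted predictions can be illustrated with a small toy model. The sketch below is not the ptype implementation and does not use its API: the two hand-written "machines" (`p_integer`, `p_boolean`), the token lists, the stop probability and the mixture weights are all assumptions chosen for illustration. Each value is scored under a mixture of a type model, a missing-data model and a catch-all anomaly model, and the column types are then compared by their normalised likelihoods under a uniform prior.

```python
import math

DIGITS = "0123456789"
MISSING_TOKENS = ("NA", "null", "-")        # assumed missing-data encodings
BOOLEAN_TOKENS = ("true", "false", "0", "1")  # assumed Boolean encodings

# Assumed mixture weights: most values follow the column type,
# a few are missing, a few are anomalous.
WEIGHTS = {"normal": 0.90, "missing": 0.05, "anomalous": 0.05}


def p_integer(value, p_stop=0.3):
    """Toy probabilistic machine for integers: emit one digit, then
    repeatedly either stop (p_stop) or emit another digit."""
    if not value or any(ch not in DIGITS for ch in value):
        return 0.0
    return (1 / 10) * ((1 - p_stop) / 10) ** (len(value) - 1) * p_stop


def p_boolean(value):
    """Uniform distribution over a fixed set of Boolean tokens."""
    return 1 / len(BOOLEAN_TOKENS) if value in BOOLEAN_TOKENS else 0.0


def p_missing(value):
    """Uniform distribution over a fixed set of missing-data tokens."""
    return 1 / len(MISSING_TOKENS) if value in MISSING_TOKENS else 0.0


def p_anomaly(value, p_stop=0.3, alphabet_size=128):
    """Catch-all machine: accepts any string, with low probability."""
    return p_stop * ((1 - p_stop) / alphabet_size) ** len(value)


TYPE_MODELS = {"integer": p_integer, "boolean": p_boolean}


def component_scores(value, type_name):
    """Weighted probability of the value under each mixture component."""
    return {
        "normal": WEIGHTS["normal"] * TYPE_MODELS[type_name](value),
        "missing": WEIGHTS["missing"] * p_missing(value),
        "anomalous": WEIGHTS["anomalous"] * p_anomaly(value),
    }


def infer_column_type(values):
    """Posterior over column types, assuming a uniform prior."""
    log_lik = {
        name: sum(math.log(sum(component_scores(v, name).values())) for v in values)
        for name in TYPE_MODELS
    }
    top = max(log_lik.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(ll - top) for name, ll in log_lik.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}


def label_values(values, type_name):
    """Label each value as normal, missing or anomalous, given the type."""
    labels = []
    for v in values:
        scores = component_scores(v, type_name)
        labels.append(max(scores, key=scores.get))
    return labels


column = ["1", "0", "25", "NA", "3x7", "1"]
posterior = infer_column_type(column)
best_type = max(posterior, key=posterior.get)
print(posterior)
print(list(zip(column, label_values(column, best_type))))
```

In this toy column, "1" and "0" are consistent with both types, so neither type receives a posterior of exactly zero or one; the value "25" is what tips the balance towards the integer type. A machine that only accepted or rejected values could not express that kind of graded preference.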

If you use this package, please cite the ptype paper, using the following BibTeX entry:

@article{ceritli2020ptype,
  title={ptype: probabilistic type inference},
  author={Ceritli, Taha and Williams, Christopher KI and Geddes, James},
  journal={Data Mining and Knowledge Discovery},
  year={2020},
  volume = {34},
  number = {3},
  pages={870--904},
  doi = {10.1007/s10618-020-00680-1},
}

2 Install requirements

You can simply install ptype from PyPI:

pip install ptype

3 Usage

See the demo notebooks in the notebooks folder. They can also be viewed online via Binder.
