arkhn / pagai

Licence: Apache-2.0 license
Tools to suggest SQL columns for Pyrog

Projects that are alternatives of or similar to pagai

catordog
A cat-vs-dog classification algorithm based on tensorflow and python
Stars: ✭ 20 (-4.76%)
Mutual labels:  classification
classification
Catalyst.Classification
Stars: ✭ 35 (+66.67%)
Mutual labels:  classification
10 days of deep learning
10 days 10 different practical applications of Deep Learning (primarily NLP) using Tensorflow and Keras
Stars: ✭ 28 (+33.33%)
Mutual labels:  classification
ML4K-AI-Extension
Use machine learning in AppInventor, with easy training using text, images, or numbers through the Machine Learning for Kids website.
Stars: ✭ 18 (-14.29%)
Mutual labels:  classification
ml-workflow-automation
Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.
Stars: ✭ 44 (+109.52%)
Mutual labels:  classification
YOLOv1 tensorflow
YOLOv1 tensorflow
Stars: ✭ 14 (-33.33%)
Mutual labels:  classification
machine learning from scratch matlab python
Vectorized Machine Learning in Python 🐍 From Scratch
Stars: ✭ 28 (+33.33%)
Mutual labels:  classification
egfr-att
Drug effect prediction using neural network
Stars: ✭ 17 (-19.05%)
Mutual labels:  classification
Point2Sequence
Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network
Stars: ✭ 34 (+61.9%)
Mutual labels:  classification
supervised-machine-learning
This repo contains regression and classification projects. Examples: development of predictive models for comments on social media websites; building classifiers to predict outcomes in sports competitions; churn analysis; prediction of clicks on online ads; analysis of the opioids crisis and an analysis of retail store expansion strategies using…
Stars: ✭ 34 (+61.9%)
Mutual labels:  classification
ros tensorflow
This repo introduces how to integrate Tensorflow framework into ROS with object detection API.
Stars: ✭ 39 (+85.71%)
Mutual labels:  classification
simpleAICV-pytorch-ImageNet-COCO-training
SimpleAICV:pytorch training example on ImageNet(ILSVRC2012)/COCO2017/VOC2007+2012 datasets.Include ResNet/DarkNet/RetinaNet/FCOS/CenterNet/TTFNet/YOLOv3/YOLOv4/YOLOv5/YOLOX.
Stars: ✭ 276 (+1214.29%)
Mutual labels:  classification
immuneML
immuneML is a platform for machine learning analysis of adaptive immune receptor repertoire data.
Stars: ✭ 41 (+95.24%)
Mutual labels:  classification
auditor
Model verification, validation, and error analysis
Stars: ✭ 56 (+166.67%)
Mutual labels:  classification
NIPS-Global-Paper-Implementation-Challenge
Selective Classification For Deep Neural Networks.
Stars: ✭ 11 (-47.62%)
Mutual labels:  classification
knodle
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.
Stars: ✭ 76 (+261.9%)
Mutual labels:  classification
Fraud-Detection-in-Online-Transactions
Detecting frauds in online transactions using anomaly detection techniques such as over-sampling and under-sampling: the ratio of frauds is less than 0.00005, so simply applying a classification algorithm may result in overfitting
Stars: ✭ 41 (+95.24%)
Mutual labels:  classification
PSCN
A python implementation of Patchy-San Convolutional Network for Graph
Stars: ✭ 39 (+85.71%)
Mutual labels:  classification
ssj
Social Signal Processing for Android
Stars: ✭ 24 (+14.29%)
Mutual labels:  classification
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+2966.67%)
Mutual labels:  classification

Pagai: dive into your data pool

Pagai is a SQL database inspection tool implemented in Python. In particular, it is used to find joins between tables, determine column types (first name, medical code, etc.), and rank columns in certain contexts.

This project is tightly linked to Pyrog, which serves as its web client.

A staging version of Pagai is available through its web client here: https://pyrog.staging.arkhn.org

Get started

  • Set up and start your virtualenv
  • Launch the server: FLASK_RUN_PORT=4000 FLASK_APP=pagai/app flask run
  • Visit http://localhost:4000/init/<database_name> to start database analysis

The concept

Pagai combines two advanced tools:

Functional type inference of columns

The column classifier is a generic machine learning model used to determine the kind of information a column contains, which we call its functional type. Examples of types are firstname, name, address, city, id, date, and code (such as M/F for gender). Our approach is to build sufficiently robust models that treat all these types as distinct but equivalent classes: this means we won't provide a regex to extract a date, for instance. Moreover, we work at the scale of the whole column rather than the single item. This lets us make the most of the column's data distribution and the statistical signature of each type.

The strength of the classifier is that it can run different ML models under the hood. The current model is a RandomClassifier based on enhanced ngrams, but we're building an RNN-based model as well.
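
To make the idea of a column-scale "statistical signature" more concrete, here is a minimal sketch. It is not Pagai's actual feature extraction: the source does not specify what "enhanced ngrams" are, so the shape mapping (digits to 0, letters to a) and the function names char_ngrams and column_signature are assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(value, n=2):
    """Return the character n-grams of one cell value, with boundary markers."""
    padded = f"^{value}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def column_signature(values, n=2):
    """Aggregate n-gram counts over a whole column into relative frequencies."""
    counts = Counter()
    for value in values:
        # Assumption: keep only the "shape" of each value (digit -> 0, letter -> a)
        shape = "".join(
            "0" if c.isdigit() else "a" if c.isalpha() else c for c in str(value)
        )
        counts.update(char_ngrams(shape, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


# Two hypothetical columns: the signatures differ even though no regex is involved
dates = column_signature(["2019-01-31", "2020-12-05"])
names = column_signature(["Alice", "Bob"])
```

A signature like this could be fed to any classifier as a feature vector, which is one way to read the claim that different ML models can run under the hood.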

The dependency graph builder

The dependency graph builder finds links between tables within a database, based on the joins that could potentially be made between them. This tool helps you understand which tables are linked to each other, for example a table of patients and another table of the persons to contact for each patient in case of emergency.
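
As a rough illustration of how potential joins might be detected, the sketch below links two tables when the values of one column are mostly contained in a column of the other, then measures the distance between tables with a breadth-first search. The containment heuristic, the threshold and the names find_links and distance are assumptions; the source does not describe how Pagai actually detects joins.

```python
from collections import deque
from itertools import combinations


def find_links(tables, threshold=0.9):
    """Link two tables when one column's values are mostly contained in another's."""
    graph = {name: set() for name in tables}
    for left, right in combinations(tables, 2):
        for left_values in tables[left].values():
            for right_values in tables[right].values():
                a, b = set(left_values), set(right_values)
                smaller = min(len(a), len(b))
                if smaller and len(a & b) / smaller >= threshold:
                    graph[left].add(right)
                    graph[right].add(left)
    return graph


def distance(graph, start, goal):
    """Breadth-first search: number of joins between two tables, or None."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        table, hops = queue.popleft()
        if table == goal:
            return hops
        for neighbour in graph[table] - seen:
            seen.add(neighbour)
            queue.append((neighbour, hops + 1))
    return None


# Hypothetical toy database: table name -> column name -> values
tables = {
    "patient": {"id": [1, 2, 3], "birthdate": ["1980-02-01", "1975-06-30", "1990-11-12"]},
    "contact": {"id": [10, 11, 12], "patient_id": [1, 2, 2], "relation": ["husband", "child", "child"]},
    "phone": {"contact_id": [10, 11], "number": ["555-0100", "555-0101"]},
    "ward": {"code": ["A", "B"]},
}
graph = find_links(tables)
```

Here patient and contact are one join apart, patient and phone are two joins apart, and ward is unreachable because none of its columns share values with the other tables.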

The query functionality

Combining these two tools is the real strength of the Pagai project: the engine can query the database to retrieve relevant columns. The search relies on a score assigned to each column, which depends on the relevance of its functional type and on its distance in the dependency graph from the table under consideration (patient, for example). A fuzzy matching algorithm on table and column names then adjusts the score so that the most relevant columns are returned.

With this, we can answer questions like:

  • "Give me the date of birth of the patient"

    API call: api/search/date/patient/birth

  • "Give me the relation type between the patient and their contact person (husband, child, etc.)"

    API call: api/search/code/patient/relation

NB: The answer is the location of the column that holds this information, not the value itself.

The API syntax is explained in the next section.
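
The scoring described in this section could be sketched as follows. This is only an illustration under stated assumptions: the equal weighting of the three terms, the 1 / (1 + hops) proximity formula, the use of difflib for fuzzy matching and the column dictionaries are all hypothetical, since the source does not give Pagai's actual formula.

```python
from difflib import SequenceMatcher


def fuzzy(keyword, name):
    """Similarity in [0, 1] between a keyword and a table or column name."""
    return SequenceMatcher(None, keyword.lower(), name.lower()).ratio()


def score(column, functional_type, keyword, distances):
    """Combine type relevance, graph distance and fuzzy name matching.

    The weights are illustrative assumptions, not Pagai's real ones.
    """
    type_score = 1.0 if column["type"] == functional_type else 0.0
    hops = distances.get(column["table"])  # joins from the reference table
    proximity = 0.0 if hops is None else 1.0 / (1 + hops)
    name_score = max(fuzzy(keyword, column["name"]), fuzzy(keyword, column["table"]))
    return type_score + proximity + name_score


# Hypothetical columns with an already inferred functional type
columns = [
    {"table": "admission", "name": "admit_time", "type": "date"},
    {"table": "patient", "name": "lastname", "type": "name"},
    {"table": "patient", "name": "birthdate", "type": "date"},
]
# Hypothetical distances from the reference table "patient"
distances = {"patient": 0, "admission": 1}

ranked = sorted(columns, key=lambda c: score(c, "date", "birth", distances), reverse=True)
```

With these inputs, patient.birthdate ranks first: it has the requested type, it sits in the reference table, and its name is the closest match to the keyword.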

The api

To make this tool easy to use for as many people as possible, we have built an API with the following structure:

api/search/<functional_type>/<reference_table>/<keyword_column>

Parameters:

  • functional_type: firstname, name, address, city, id, date, code, or any other type you define
  • reference_table: the reference table for the dependency graph (patient in the examples above)
  • keyword_column: keywords matched against column or table names with fuzzy matching (e.g. birth -> birthdate)
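
As a small usage sketch, the helper below builds a search URL from the three parameters. The base URL assumes the local server from "Get started" (port 4000) and assumes the API is served from the root of that server; the function name search_url is hypothetical, and the response format is not documented here, so the sketch stops at building the URL.

```python
from urllib.parse import quote

# Assumption: the server was launched locally with FLASK_RUN_PORT=4000
BASE_URL = "http://localhost:4000"


def search_url(functional_type, reference_table, keyword_column, base_url=BASE_URL):
    """Build the URL for api/search/<functional_type>/<reference_table>/<keyword_column>."""
    parts = [
        quote(str(part), safe="")  # percent-encode spaces and slashes in each segment
        for part in (functional_type, reference_table, keyword_column)
    ]
    return f"{base_url}/api/search/" + "/".join(parts)


url = search_url("date", "patient", "birth")
print(url)
```

Once the server is running, the resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen(url).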

Getting started and building my customized pagai

For now, we're training our engine on a simplified version of the MIMIC dataset, extended with firstname, name and address data.

Of course, it is possible to train the model and the graph on your own database. In particular, you can provide whatever functional types you want (you could add phone to the list above, for example). We'll shortly provide instructions explaining how to proceed.

Feel free to contact us on Slack if you have trouble with the project.

If you're enthusiastic about our project, star it to show your support! ❤️


Dev

Run locally

PYTHONPATH=. python pagai/app.py

Docker build

docker build -t arkhn/pagai:latest . # build the regular pagai image

License

Apache License 2.0