A3Data / Hermione

Licence: apache-2.0
ML made simple

hermione



A Data Science Project structure in cookiecutter style.

Developed with ❤️ by A3Data

What is Hermione?

Hermione is the newest open source library that helps Data Scientists set up better-organized code in a quicker and simpler way. Hermione also includes classes that assist with daily tasks such as column normalization and denormalization, data visualization, and text vectorization. With Hermione, all you need to do is execute a method and the rest is up to her, just like magic.

Why Hermione?

To bring in a little of A3Data's experience: we work in Data Science teams inside several client companies, and the excellence of notebooks as a data exploration tool is undeniable. Nevertheless, when it comes to data science products, where models need to be consumed, monitored, and periodically maintained, putting them into production inside a Jupyter Notebook is not the best choice (and that is before even mentioning memory and CPU performance). That's where Hermione comes in! We were inspired by this brilliant, empowered, and awesome witch of the Harry Potter saga to name this framework!

This is also our way of reinforcing our position that women should be taking more leading roles in the technology field. #CodeLikeAGirl

Installing

Dependencies

  • Python (>= 3.6)
  • docker

Hermione does not depend on conda to build and manage virtual environments anymore. It uses venv instead.

Install

pip install -U hermione-ml

How do I use Hermione?

After installing Hermione:

  1. Create your new project:
hermione new project_hermione
  2. Hit Enter if you want to start with the implemented example code:
Do you want to start with an implemented example (recommended) [y/n]? [y]: 
  3. Hermione already creates a virtual environment for the project. Windows users can activate it with
<project_name>_env\Scripts\activate

For Linux and macOS users, do

source <project_name>_env/bin/activate
  4. After activating it, install the libraries you need. The “requirements.txt” file contains a few suggestions:
pip install -r requirements.txt
  5. Now we will train some models from the example, using MLflow ❤. To do so, inside the src directory, just type hermione train. The “hermione train” command searches for a train.py file and executes it. In the example, models and metrics are already tracked via MLflow.

  6. After that, an MLflow experiment is created. To inspect the experiment in MLflow, type mlflow ui and the MLflow UI server will start:
mlflow ui
[2020-10-19 23:23:12 -0300] [15676] [INFO] Starting gunicorn 19.10.0
[2020-10-19 23:23:12 -0300] [15676] [INFO] Listening at: http://127.0.0.1:5000 (15676)
[2020-10-19 23:23:12 -0300] [15676] [INFO] Using worker: sync
[2020-10-19 23:23:12 -0300] [15678] [INFO] Booting worker with pid: 15678
  7. To access the experiment, open the address shown in the log (here http://127.0.0.1:5000) in your preferred browser. There you can check the trained models and their metrics.

  8. To make batch predictions using your predict.py file, type hermione predict. The default implementation prints some predictions in the terminal.
hermione predict
  9. The Titanic example also provides a step-by-step notebook. To view it, just type jupyter notebook inside the src/notebooks/ directory.
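The “hermione train” and “hermione predict” steps both come down to locating a script and running it. A minimal sketch of that behavior, assuming the command only looks for the script in the given directory and runs it as `__main__`; `run_step` is a hypothetical name and this is not Hermione's actual CLI implementation:

```python
import runpy
import sys
from pathlib import Path


def run_step(step: str, project_dir: str = ".") -> dict:
    """Hypothetical sketch: find <step>.py (e.g. train.py) and execute it.

    Assumes the script sits directly in project_dir, as train.py does in the
    example project's src directory. Returns the script's global namespace
    after it has run.
    """
    script = Path(project_dir) / f"{step}.py"
    if not script.is_file():
        raise FileNotFoundError(f"No {script.name} found in {Path(project_dir).resolve()}")
    # Let the script import its sibling modules, as with `python train.py`
    script_dir = str(script.parent.resolve())
    sys.path.insert(0, script_dir)
    try:
        # run_name="__main__" so code under `if __name__ == "__main__":` runs too
        return runpy.run_path(str(script), run_name="__main__")
    finally:
        if script_dir in sys.path:
            sys.path.remove(script_dir)
```

For example, `run_step("train", "src")` from the project root would roughly mimic typing hermione train inside src. Running the script through `runpy` keeps it in the same interpreter so its namespace can be inspected afterwards; a real CLI might prefer a subprocess for isolation.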

Do you want to create your project from scratch? Then click here to check a tutorial.

Docker

Hermione comes with a default Dockerfile which implements a Flask + Gunicorn API that serves your ML model. You should take a look at the api/app.py module and rewrite the predict_new() function as you see fit.
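What predict_new() should do depends on your model. As a rough, framework-free sketch, assuming the API receives column-oriented JSON (one list per feature, like the payload myrequests.py sends) and returns a JSON list of scores, it could look like this; `columns_to_rows`, this `predict_new` signature, and the `model` callable are hypothetical, and the Flask routing is left out:

```python
import json
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def columns_to_rows(payload: Dict[str, List[Any]]) -> List[Row]:
    """Turn {"Age": [4, 22], "Sex": ["male", "female"]} into one dict per passenger."""
    lengths = {len(values) for values in payload.values()}
    if len(lengths) > 1:
        raise ValueError(f"All columns must have the same length, got {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0
    return [{col: values[i] for col, values in payload.items()} for i in range(n_rows)]


def predict_new(body: str, model: Callable[[List[Row]], List[float]]) -> str:
    """Hypothetical predict_new(): parse the JSON body, score each row, return JSON.

    `model` stands in for whatever the project actually loads, e.g. a trained
    classifier returning one probability per row.
    """
    rows = columns_to_rows(json.loads(body))
    scores = model(rows)
    return json.dumps([float(s) for s in scores])
```

In a Flask view, the raw body could be obtained with `request.get_data(as_text=True)` and passed in as `body`.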

Also, the newest version of Hermione brings two CLI commands that abstract away some of the complexity of Docker commands. To build an image (remember that you need Docker installed), go to the project's root directory. Then run:

hermione build <IMAGE_NAME>

After you have built your Docker image, run it with:

hermione run <IMAGE_NAME>
[2020-10-20 02:13:20 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-10-20 02:13:20 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-10-20 02:13:20 +0000] [1] [INFO] Using worker: sync
[2020-10-20 02:13:20 +0000] [7] [INFO] Booting worker with pid: 7
[2020-10-20 02:13:20 +0000] [8] [INFO] Booting worker with pid: 8
[2020-10-20 02:13:20 +0000] [16] [INFO] Booting worker with pid: 16

THAT IS IT! You have a live model up and running. To test your API, Hermione provides an api/myrequests.py module. It is not part of the project itself; it's "ready to go" code for making requests to the API. Help yourself!

cd src/api
python myrequests.py
Sending request for model...
Data: {"Pclass": [3, 2, 1], "Sex": ["male", "female", "male"], "Age": [4, 22, 28]}
Response: "[0.24630952 0.996      0.50678968]"

Play a little with the 'fake' data and see how far the predictions can go.
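myrequests.py is already provided, but the request itself is simple. A standard-library sketch of an equivalent client, assuming a JSON POST to the port shown in the gunicorn log; the `/predict` route in `API_URL` is a hypothetical placeholder, so check api/app.py for the real path and method:

```python
import json
import urllib.request
from typing import Any, Dict, List

# Port taken from the gunicorn log; the "/predict" route is a hypothetical
# placeholder, so replace it with the route defined in api/app.py.
API_URL = "http://localhost:5000/predict"


def build_request(data: Dict[str, List[Any]], url: str = API_URL) -> urllib.request.Request:
    """Encode column-oriented features as a JSON POST request (POST is an assumption)."""
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(data: Dict[str, List[Any]], url: str = API_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(data, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container running, `send({"Pclass": [3], "Sex": ["male"], "Age": [4]})` returns the raw response body, which you can then parse as needed.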

Documentation

This is the class structure diagram that Hermione relies on:

Here we briefly describe what each class does:

Data Source

  • DataBase - should be used when data retrieval requires a connection to a database. Contains methods for opening and closing a connection.
  • Spreadsheet - should be used when data is retrieved from spreadsheets/text files. All aggregation of the source tables into a "flat table" should be performed in this class.
  • DataSource - abstract class which DataBase and Spreadsheet inherit from.
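The inheritance described above can be sketched with Python's abc module. This is a minimal illustration, assuming a CSV file for Spreadsheet and SQLite standing in for whatever database DataBase connects to; method names such as `get_data`, `open_connection`, and `close_connection` are hypothetical, not Hermione's actual API:

```python
import csv
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]


class DataSource(ABC):
    """Abstract base: every source must be able to return a flat table."""

    @abstractmethod
    def get_data(self) -> Rows:
        ...


class Spreadsheet(DataSource):
    """Reads a CSV file (any joins into one flat table would also go here)."""

    def __init__(self, path: str):
        self.path = path

    def get_data(self) -> Rows:
        with open(self.path, newline="", encoding="utf-8") as f:
            return [dict(row) for row in csv.DictReader(f)]


class DataBase(DataSource):
    """Runs a query, opening the connection before and closing it after."""

    def __init__(self, db_path: str, query: str):
        self.db_path = db_path
        self.query = query
        self.conn: Optional[sqlite3.Connection] = None

    def open_connection(self) -> None:
        self.conn = sqlite3.connect(self.db_path)
        self.conn.row_factory = sqlite3.Row  # rows addressable by column name

    def close_connection(self) -> None:
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def get_data(self) -> Rows:
        self.open_connection()
        try:
            return [dict(row) for row in self.conn.execute(self.query)]
        finally:
            self.close_connection()
```

Because both classes expose the same `get_data()`, the rest of the pipeline does not need to know where the data came from.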

Preprocessing

  • Preprocessing - concentrates all preprocessing steps that must be performed on the data before the model is trained.
  • Normalization - applies normalization and denormalization to the specified columns. The following normalization algorithms are already implemented: StandardScaler and MinMaxScaler.
  • TextVectorizer - transforms text into vectors. Implemented methods: Bag of Words, TF-IDF, and Embedding (mean, median, and indexing).
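To make the normalization/denormalization round trip concrete, here is a dependency-free sketch of what the StandardScaler and MinMaxScaler options compute per column. It is an illustration under assumptions (columns stored as a dict of lists; hypothetical `fit`/`transform`/`inverse_transform` names mirroring scikit-learn's), not the Normalization class Hermione ships:

```python
from statistics import mean, pstdev
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]


class Normalization:
    """Hypothetical sketch: per-column scaling that can be reversed later.

    method="standard" mimics StandardScaler: z = (x - mean) / std (population std).
    method="minmax" mimics MinMaxScaler:     z = (x - min) / (max - min).
    """

    def __init__(self, columns: List[str], method: str = "standard"):
        if method not in ("standard", "minmax"):
            raise ValueError(f"Unknown method: {method}")
        self.columns = columns
        self.method = method
        self.params: Dict[str, Tuple[float, float]] = {}

    def fit(self, data: Table) -> "Normalization":
        for col in self.columns:
            values = data[col]
            if self.method == "standard":
                shift, scale = mean(values), pstdev(values)
            else:
                shift, scale = min(values), max(values) - min(values)
            # Constant columns would divide by zero; use a scale of 1 instead
            self.params[col] = (shift, scale or 1.0)
        return self

    def transform(self, data: Table) -> Table:
        out = dict(data)  # columns not listed are passed through unchanged
        for col in self.columns:
            shift, scale = self.params[col]
            out[col] = [(x - shift) / scale for x in data[col]]
        return out

    def inverse_transform(self, data: Table) -> Table:
        """Denormalize: map scaled values back to the original units."""
        out = dict(data)
        for col in self.columns:
            shift, scale = self.params[col]
            out[col] = [x * scale + shift for x in data[col]]
        return out
```

Storing the fitted parameters per column is what makes denormalization possible: values scaled for training can later be mapped back to their original units with `inverse_transform`.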

Visualization

  • Visualization - methods for data visualization. There are methods to make static and interactive plots.
  • App Streamlit - Streamlit example consuming the Titanic dataset, including pandas profiling.

Model

  • Trainer - module that centralizes training algorithms classes. Algorithms from scikit-learn library, for instance, can be easily used with the TrainerSklearn implemented class.
  • Wrapper - centralizes the trained model with its metrics. This class has built-in integration with MLflow.
  • Metrics - contains key metrics that are calculated when models are trained. Classification, regression, and clustering metrics are already implemented.
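The relationship between Wrapper and Metrics can be sketched as follows: metrics are computed once at training time and stored alongside the model. This is a standard-library sketch for binary classification only, with hypothetical names (`classification_metrics`, `Wrapper.predict`), and it omits the MLflow integration the real Wrapper provides:

```python
from typing import Any, Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


class Wrapper:
    """Hypothetical sketch: a trained model bundled with its training metrics."""

    def __init__(self, model: Any, metrics: Dict[str, float]):
        self.model = model
        self.metrics = dict(metrics)

    def predict(self, X):
        # Delegates to the wrapped model (assumes a scikit-learn-style predict())
        return self.model.predict(X)
```

Keeping the metrics on the same object as the model means whoever loads the model for serving also knows how well it performed when it was trained.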

Tests

  • test_project - module for unit testing.

Contributing

Make a pull request with your implementation.

For suggestions, contact us: [email protected]

Licence

Hermione is open source and released under the Apache 2.0 License.
