BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

Stars: ✭ 156 (+22.83%)

Mutual labels: data-science, pipeline

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+1089.76%)

Mutual labels: data-science, data-engineering

Targets

Function-oriented Make-like declarative workflows for R

Stars: ✭ 293 (+130.71%)

Mutual labels: data-science, pipeline

Applied Ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Stars: ✭ 17,824 (+13934.65%)

Mutual labels: data-science, data-engineering

Mlbox

MLBox is a powerful Automated Machine Learning Python library.

Stars: ✭ 1,199 (+844.09%)

Mutual labels: data-science, pipeline

Auptimizer

An automatic ML model optimization tool.

Stars: ✭ 166 (+30.71%)

Mutual labels: data-science, data-engineering

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (-37.8%)

Mutual labels: data-science, data-engineering

Steppy

Lightweight Python library for fast and reproducible experimentation 🔬

Stars: ✭ 119 (-6.3%)

Mutual labels: data-science, pipeline

Learn Something Every Day

📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.

Stars: ✭ 362 (+185.04%)

Mutual labels: data-science, data-engineering

Pdpipe

Easy pipelines for pandas DataFrames.

Stars: ✭ 590 (+364.57%)

Mutual labels: data-science, pipeline

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+580.31%)

Mutual labels: data-science, data-engineering

Steppy Toolkit

Curated set of transformers that make your work with steppy faster and more effective 🔭

Stars: ✭ 21 (-83.46%)

Mutual labels: data-science, pipeline

D6t Python

Accelerate data science

Stars: ✭ 118 (-7.09%)

Mutual labels: data-science, data-engineering

Chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

Stars: ✭ 118 (-7.09%)

Mutual labels: data-science, pipeline

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+19.69%)

Mutual labels: data-science, data-engineering

Gspread Pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Stars: ✭ 226 (+77.95%)

Mutual labels: data-science, data-engineering

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (+7.87%)

Mutual labels: data-science, data-engineering

Drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

Stars: ✭ 1,301 (+924.41%)

Mutual labels: data-science, pipeline

Blurr

Data transformations for the ML era

Stars: ✭ 96 (-24.41%)

Mutual labels: data-science, pipeline

Open Solution Mapping Challenge

Open solution to the Mapping Challenge 🌎

Stars: ✭ 291 (+129.13%)

Mutual labels: data-science, pipeline

Prefect

The easiest way to automate your data

Stars: ✭ 7,956 (+6164.57%)

Mutual labels: data-science, data-engineering

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (-0.79%)

Mutual labels: data-science, data-engineering

Automlpipeline.jl

A package that makes it trivial to create and evaluate machine learning pipeline architectures.

Stars: ✭ 223 (+75.59%)

Mutual labels: data-science, pipeline

Drake Examples

Example workflows for the drake R package

Stars: ✭ 57 (-55.12%)

Mutual labels: data-science, pipeline

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Stars: ✭ 42,634 (+33470.08%)

Mutual labels: data-science, data-engineering

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+1777.95%)

Mutual labels: data-science, data-engineering

Ml Email Clustering

Email clustering with machine learning

Stars: ✭ 116 (-8.66%)

Mutual labels: data-science

Learn Machine Learning

Learn to Build a Machine Learning Application from Top Articles

Stars: ✭ 116 (-8.66%)

Mutual labels: data-science

Keras Contrib

Keras community contributions

Stars: ✭ 1,532 (+1106.3%)

Mutual labels: data-science

Rightmove webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

Stars: ✭ 125 (-1.57%)

Mutual labels: data-science

Wooey

A Django app that creates automatic web UIs for Python scripts.

Stars: ✭ 1,680 (+1222.83%)

Mutual labels: data-science

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+1093.7%)

Mutual labels: data-science

Modelchimp

Experiment tracking for machine and deep learning projects

Stars: ✭ 121 (-4.72%)

Mutual labels: data-science

Truvisory

This project provides resources to users who want to access good LinkedIn posts that contain resources for learning about Technology, Design, Self-Branding, Motivation, etc.

Stars: ✭ 116 (-8.66%)

Mutual labels: data-science

Europa

Puppet Container Registry

Stars: ✭ 114 (-10.24%)

Mutual labels: pipeline

Stock Prediction

Smart algorithms to predict the buying and selling of stocks on the basis of Mutual Funds Analysis, Stock Trends Analysis and Prediction, Portfolio Risk Factor, Stock and Finance Market News Sentiment Analysis, and Selling Profit Ratio. Project developed as part of the NSE-FutureTech-Hackathon 2018, Mumbai. Team: Semicolon

Stars: ✭ 125 (-1.57%)

Mutual labels: data-science

Sarek

Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing

Stars: ✭ 124 (-2.36%)

Mutual labels: pipeline

Pandas Videos

Jupyter notebook and datasets from the pandas Q&A video series

Stars: ✭ 1,716 (+1251.18%)

Mutual labels: data-science

Scipy 2017 Cython Tutorial

Material for the SciPy 2017 Cython tutorial

Stars: ✭ 114 (-10.24%)

Mutual labels: data-science

Seaborn Tutorial

This repository is my attempt to help Data Science aspirants gain the Data Visualization skills required to progress in their careers. It includes all the types of plots offered by Seaborn, applied to random datasets.

Stars: ✭ 114 (-10.24%)

Mutual labels: data-science

Variety

A schema analyzer for MongoDB

Stars: ✭ 1,592 (+1153.54%)

Mutual labels: data-science

Mlr

Machine Learning in R

Stars: ✭ 1,542 (+1114.17%)

Mutual labels: data-science

River

🌊 Online machine learning in Python