Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+40729.63%)

Mutual labels: spark, pandas

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1725.93%)

Mutual labels: spark, data-wrangling

Expand

DevExpress XAF extension framework. Follow linkedin.expandframework.com, youtube.expandframework.com, and Twitter @expandframework, or simply star/watch this repository to get notified by GitHub.

Stars: ✭ 158 (+192.59%)

Mutual labels: workflow, business-intelligence

Data-Wrangling-with-Python

Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

Stars: ✭ 90 (+66.67%)

Mutual labels: pandas, data-wrangling

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+1033.33%)

Mutual labels: pandas, feature-engineering

Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Stars: ✭ 31 (-42.59%)

Mutual labels: pandas, data-wrangling

Data-Science-101

Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.

Stars: ✭ 19 (-64.81%)

Mutual labels: pandas, data-wrangling

Mining

Business Intelligence (BI) in Python, OLAP

Stars: ✭ 1,128 (+1988.89%)

Mutual labels: business-intelligence, olap

OLAP-cube

A hypercube of data

Stars: ✭ 23 (-57.41%)

Mutual labels: business-intelligence, olap

The-Data-Visualization-Workshop

A New, Interactive Approach to Learning Data Visualization

Stars: ✭ 59 (+9.26%)

Mutual labels: pandas, data-wrangling

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+461.11%)

Mutual labels: spark, pandas

Guitar

A Simple and Efficient Distributed Multidimensional BI Analysis Engine.

Stars: ✭ 86 (+59.26%)

Mutual labels: business-intelligence, olap

Data Forge Ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 967 (+1690.74%)

Mutual labels: pandas, data-wrangling

Luigi Warehouse

A Luigi-powered analytics/warehouse stack

Stars: ✭ 72 (+33.33%)

Mutual labels: workflow, spark

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+2112.96%)

Mutual labels: workflow, spark

whyqd

data wrangling simplicity, complete audit transparency, and at speed

Stars: ✭ 16 (-70.37%)

Mutual labels: pandas, data-wrangling

optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Stars: ✭ 1,351 (+2401.85%)

Mutual labels: data-wrangling, data-preparation

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-68.52%)

Mutual labels: data-preprocessing, data-processing

Market-Mix-Modeling

Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales

Stars: ✭ 31 (-42.59%)

Mutual labels: feature-engineering, data-preparation

spark-druid-olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 286 (+429.63%)

Mutual labels: spark, business-intelligence

Transmogrifai

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+3759.26%)

Mutual labels: spark, feature-engineering

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+192.59%)

Mutual labels: spark, pandas

Machine Learning Workflow With Python

A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation.

Stars: ✭ 157 (+190.74%)

Mutual labels: workflow, feature-engineering

xplore

A Python package built for data scientists/analysts and AI/ML engineers for exploring the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

Stars: ✭ 21 (-61.11%)

Mutual labels: data-wrangling, data-preprocessing

veridical-flow

Making it easier to build stable, trustworthy data-science pipelines.

Stars: ✭ 28 (-48.15%)

Mutual labels: workflow, pandas

Retentioneering Tools

Retentioneering: product analytics, data-driven customer journey map optimization, marketing analytics, web analytics, transaction analytics, graph visualization, and behavioral segmentation with customer segments in Python. Open-source analytics, predictive analytics over clickstream, sentiment analysis, A/B tests, machine learning, and Monte Carlo Markov Chain simulations, extending Pandas, NetworkX and sklearn.

Stars: ✭ 291 (+438.89%)

Mutual labels: pandas, business-intelligence

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+5537.04%)

Mutual labels: spark, pandas

Spark Druid Olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 282 (+422.22%)

Mutual labels: spark, business-intelligence

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+2918.52%)

Mutual labels: spark, pandas

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+37209.26%)

Mutual labels: spark, business-intelligence

Data Forge Js

JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 139 (+157.41%)

Mutual labels: pandas, data-wrangling

visions

Type System for Data Analysis in Python

Stars: ✭ 136 (+151.85%)

Mutual labels: spark, pandas

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (+1.85%)

Mutual labels: spark, data-processing

Distributed Dataset

A distributed data processing framework in Haskell.

Stars: ✭ 108 (+100%)

Mutual labels: spark, data-processing

SumStatsRehab

GWAS summary statistics files QC tool

Stars: ✭ 19 (-64.81%)

Mutual labels: data-preprocessing, data-preparation

Datacompy

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (+172.22%)

Mutual labels: spark, pandas

pandas-workshop

An introductory workshop on pandas with notebooks and exercises for following along.

Stars: ✭ 161 (+198.15%)

Mutual labels: pandas, data-wrangling

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-7.41%)

Mutual labels: spark, data-processing

Data-Analyst-Nanodegree

Kai Sheng Teh - Udacity Data Analyst Nanodegree

Stars: ✭ 42 (-22.22%)

Mutual labels: pandas, data-wrangling

alfred-workflow

No description or website provided.

Stars: ✭ 26 (-51.85%)

Mutual labels: workflow

blog

blog entries

Stars: ✭ 39 (-27.78%)

Mutual labels: spark

alfred-latex-symbols-workflow

🔎 Alfred 3-4 workflow to search for latex symbol commands

Stars: ✭ 33 (-38.89%)

Mutual labels: workflow

mimir

Data-ish exploration through SQL+Uncertainty

Stars: ✭ 26 (-51.85%)

Mutual labels: data-wrangling

my curd

An ultra-lightweight rapid-development scaffold and workflow platform.

Stars: ✭ 38 (-29.63%)

Mutual labels: workflow

SparkV

🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.

Stars: ✭ 24 (-55.56%)

Mutual labels: spark

ibis

IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.

Stars: ✭ 48 (-11.11%)

Mutual labels: workflow

zen-do-r

A book about programming for non-programmers.

Stars: ✭ 24 (-55.56%)

Mutual labels: workflow

baleen3

Baleen 3 is a data processing tool based on the Annot8 framework

Stars: ✭ 15 (-72.22%)

Mutual labels: data-processing

action-sync-node-meta

GitHub Action that syncs package.json with the repository metadata.

Stars: ✭ 25 (-53.7%)

Mutual labels: workflow

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-74.07%)

Mutual labels: spark

tukio

Tukio is an event-based workflow generator library

Stars: ✭ 27 (-50%)

Mutual labels: workflow

Covid19Tracker

A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.

Stars: ✭ 65 (+20.37%)

Mutual labels: spark

Tesseract

A set of libraries for rapidly developing pipeline-driven micro/macroservices.

Stars: ✭ 20 (-62.96%)

Mutual labels: workflow

1-60 of 1470 similar projects
