Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+10.95%)
Mutual labels: data-science, big-data, data-engineering, high-performance-computing
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-27.01%)
Mutual labels: data-science, big-data, data-mining
Targets
Function-oriented Make-like declarative workflows for R
Stars: ✭ 293 (+113.87%)
Mutual labels: data-science, reproducibility, high-performance-computing
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1002.92%)
Mutual labels: data-science, big-data, data-engineering
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-42.34%)
Mutual labels: data-science, big-data, data-engineering
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+523.36%)
Mutual labels: data-science, big-data, data-mining
Drake Examples
Example workflows for the drake R package
Stars: ✭ 57 (-58.39%)
Mutual labels: data-science, reproducibility, high-performance-computing
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+849.64%)
Mutual labels: data-science, reproducibility, high-performance-computing
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-21.9%)
Mutual labels: data-science, big-data
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+31019.71%)
Mutual labels: data-science, data-engineering
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-17.52%)
Mutual labels: data-science, big-data
D6t Python
Accelerate data science
Stars: ✭ 118 (-13.87%)
Mutual labels: data-science, data-engineering
Steppy
Lightweight Python library for fast and reproducible experimentation 🔬
Stars: ✭ 119 (-13.14%)
Mutual labels: data-science, reproducibility
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-10.95%)
Mutual labels: data-science, data-engineering
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (-27.74%)
Mutual labels: big-data, data-mining
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-20.44%)
Mutual labels: data-science, big-data
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (+878.83%)
Mutual labels: data-science, data-mining
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+876.64%)
Mutual labels: data-science, big-data
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-8.03%)
Mutual labels: data-science, data-engineering
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1640.88%)
Mutual labels: data-science, data-engineering