VizukaExplore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-34.21%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-29.61%)
Datascience Ai Machinelearning ResourcesAlex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+172.37%)
Learn Something Every Day📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in general.
Stars: ✭ 362 (+138.16%)
CoursesQuizzes & assignments from Coursera
Stars: ✭ 454 (+198.68%)
Dataframe GoDataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (+220.39%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+138.16%)
Spark DariaEssential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (+263.82%)
ThrillThrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (+247.37%)
TaskflowA General-purpose Parallel and Heterogeneous Task Programming System
Stars: ✭ 6,128 (+3931.58%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (+233.55%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+3526.97%)
DatasheetsRead data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+290.13%)
PandasvaultAdvanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (+107.89%)
MfemLightweight, general, scalable C++ library for finite element methods
Stars: ✭ 667 (+338.82%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+390.13%)
PyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+325.66%)
ArraymancerA fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+421.71%)
Goodreads etl pipelineAn end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+421.71%)
BoltzmanncleanFill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-84.87%)
DataflowjavasdkGoogle Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+461.84%)
AutodlAutomated Deep Learning without ANY human intervention. 1st-place solution for AutoDL.
Stars: ✭ 854 (+461.84%)
SparkApache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+20701.32%)
PretzelJavascript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-82.89%)
AttacaRobust, distributed version control for large files.
Stars: ✭ 41 (-73.03%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+556.58%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-63.82%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+2467.76%)
VerticapyVerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-61.18%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-61.84%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (-26.32%)
CookbookThe Data Engineering Cookbook
Stars: ✭ 9,829 (+6366.45%)
PwrakeParallel Workflow extension for Rake, runs on multicores, clusters, clouds.
Stars: ✭ 57 (-62.5%)
DrakeAn R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+755.92%)
OpencoarraysA parallel application binary interface for Fortran 2018 compilers.
Stars: ✭ 151 (-0.66%)
Pythondatarepo for code published on pythondata.com
Stars: ✭ 113 (-25.66%)
D6t PythonAccelerate data science
Stars: ✭ 118 (-22.37%)
LogislandScalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-36.18%)
ParapetA purely functional library to build distributed and event-driven systems
Stars: ✭ 106 (-30.26%)
ElephasDistributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+900.66%)
BigdataclassTwo-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-27.63%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-28.29%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1469.08%)
Pyhpc BenchmarksA suite of benchmarks to test the sequential CPU and GPU performance of most popular high-performance libraries for Python.
Stars: ✭ 119 (-21.71%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-17.76%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (-17.11%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-28.95%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (-15.79%)
BatchtoolsTools for computation on batch systems
Stars: ✭ 127 (-16.45%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+1107.24%)
TargetsFunction-oriented Make-like declarative workflows for R
Stars: ✭ 293 (+92.76%)
Python SeminarPython for Data Science (Seminar Course at UC Berkeley; AY 250)
Stars: ✭ 302 (+98.68%)
Drake ExamplesExample workflows for the drake R package
Stars: ✭ 57 (-62.5%)