2024 open-source projects that are alternatives to or similar to Geni

Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-34.21%)
Mutual labels:  data-science, big-data
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-29.61%)
Mutual labels:  data-science, big-data
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+172.37%)
Mutual labels:  data-science, big-data
Learn Something Every Day
📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (+138.16%)
Mutual labels:  data-science, data-engineering
Courses
Quizzes and assignments from Coursera courses
Stars: ✭ 454 (+198.68%)
Mutual labels:  data-science, big-data
Dataframe Go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (+220.39%)
Mutual labels:  dataframe, data-science
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+138.16%)
Mutual labels:  spark, big-data
Spark Daria
Essential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (+263.82%)
Mutual labels:  dataframe, spark
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (+247.37%)
Mutual labels:  big-data, distributed-computing
Taskflow
A General-purpose Parallel and Heterogeneous Task Programming System
Stars: ✭ 6,128 (+3931.58%)
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+233.55%)
Mutual labels:  spark, big-data
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+3526.97%)
Mutual labels:  spark, big-data
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+290.13%)
Mutual labels:  dataframe, data-science
Pandasvault
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (+107.89%)
Mutual labels:  dataframe, data-science
Sciblog support
Support content for my blog
Stars: ✭ 694 (+356.58%)
Mutual labels:  data-science, big-data
Mfem
Lightweight, general, scalable C++ library for finite element methods
Stars: ✭ 667 (+338.82%)
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+390.13%)
Mutual labels:  spark, big-data
Pyjanitor
Clean APIs for data cleaning. Python implementation of the R package janitor
Stars: ✭ 647 (+325.66%)
Mutual labels:  dataframe, data-engineering
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+421.71%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+421.71%)
Mutual labels:  spark, data-engineering
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-84.87%)
Mutual labels:  dataframe, data-science
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+461.84%)
Mutual labels:  data-science, big-data
Autodl
Automated Deep Learning without ANY human intervention. 1st-place solution for AutoDL.
Stars: ✭ 854 (+461.84%)
Mutual labels:  data-science, big-data
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+20701.32%)
Mutual labels:  spark, big-data
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-82.89%)
Mutual labels:  data-science, big-data
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-73.03%)
Mutual labels:  data-science, big-data
Pixiedust
Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+556.58%)
Mutual labels:  data-science, spark
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-63.82%)
Mutual labels:  data-science, spark
Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+2467.76%)
Mutual labels:  spark, big-data
Data Science Cookbook
🎓 Jupyter notebooks from UFC data science course
Stars: ✭ 60 (-60.53%)
Mutual labels:  data-science, spark
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-61.18%)
Mutual labels:  data-science, big-data
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-61.84%)
Mutual labels:  data-science, spark
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-26.32%)
Mutual labels:  data-science, spark
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+673.03%)
Mutual labels:  data-science, big-data
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-53.29%)
Mutual labels:  spark, big-data
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+6366.45%)
Mutual labels:  big-data, data-engineering
Pwrake
Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
Stars: ✭ 57 (-62.5%)
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+755.92%)
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-46.05%)
Mutual labels:  big-data, data-engineering
Opencoarrays
A parallel application binary interface for Fortran 2018 compilers.
Stars: ✭ 151 (-0.66%)
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-25.66%)
Mutual labels:  data-science, big-data
D6t Python
Accelerate data science
Stars: ✭ 118 (-22.37%)
Mutual labels:  data-science, data-engineering
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-36.18%)
Mutual labels:  spark, big-data
Parapet
A purely functional library to build distributed and event-driven systems
Stars: ✭ 106 (-30.26%)
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-62.5%)
Mutual labels:  spark, big-data
Elephas
Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+900.66%)
Mutual labels:  spark, distributed-computing
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-27.63%)
Mutual labels:  spark, big-data
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-9.87%)
Mutual labels:  spark, big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-28.29%)
Mutual labels:  data-science, big-data
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1469.08%)
Mutual labels:  data-science, data-engineering
Pyhpc Benchmarks
A suite of benchmarks to test the sequential CPU and GPU performance of the most popular high-performance libraries for Python.
Stars: ✭ 119 (-21.71%)
Cape Python
Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-17.76%)
Mutual labels:  data-science, spark
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-17.11%)
Mutual labels:  data-science, data-engineering
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-28.95%)
Mutual labels:  data-science, spark
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-15.79%)
Mutual labels:  data-science, big-data
Batchtools
Tools for computation on batch systems
Stars: ✭ 127 (-16.45%)
Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+1107.24%)
Mutual labels:  data-science, spark
Targets
Function-oriented Make-like declarative workflows for R
Stars: ✭ 293 (+92.76%)
Python Seminar
Python for Data Science (Seminar Course at UC Berkeley; AY 250)
Stars: ✭ 302 (+98.68%)
Drake Examples
Example workflows for the drake R package
Stars: ✭ 57 (-62.5%)
61-120 of 2024 similar projects