2024 Open source projects that are alternatives of or similar to Geni

Vizuka

Explore high-dimensional datasets and how your algo handles specific regions.

Stars: ✭ 100 (-34.21%)

Mutual labels: data-science, big-data

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (-29.61%)

Mutual labels: data-science, big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+172.37%)

Mutual labels: data-science, big-data

Learn Something Every Day

📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.

Stars: ✭ 362 (+138.16%)

Mutual labels: data-science, data-engineering

Courses

Quiz & Assignment of Coursera

Stars: ✭ 454 (+198.68%)

Mutual labels: data-science, big-data

Dataframe Go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Stars: ✭ 487 (+220.39%)

Mutual labels: dataframe, data-science

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+138.16%)

Mutual labels: spark, big-data

Spark Daria

Essential Spark extensions and helper methods ✨😲

Stars: ✭ 553 (+263.82%)

Mutual labels: dataframe, spark

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (+247.37%)

Mutual labels: big-data, distributed-computing

Taskflow

A General-purpose Parallel and Heterogeneous Task Programming System

Stars: ✭ 6,128 (+3931.58%)

Mutual labels: parallel-computing, high-performance-computing

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+233.55%)

Mutual labels: spark, big-data

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+3526.97%)

Mutual labels: spark, big-data

Datasheets

Read data from, write data to, and modify the formatting of Google Sheets

Stars: ✭ 593 (+290.13%)

Mutual labels: dataframe, data-science

Pandasvault

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

Stars: ✭ 316 (+107.89%)

Mutual labels: dataframe, data-science

Sciblog support

Support content for my blog

Stars: ✭ 694 (+356.58%)

Mutual labels: data-science, big-data

Mfem

Lightweight, general, scalable C++ library for finite element methods

Stars: ✭ 667 (+338.82%)

Mutual labels: parallel-computing, high-performance-computing

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+390.13%)

Mutual labels: spark, big-data

Pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 647 (+325.66%)

Mutual labels: dataframe, data-engineering

Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

Stars: ✭ 793 (+421.71%)

Mutual labels: parallel-computing, high-performance-computing

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+421.71%)

Mutual labels: spark, data-engineering

Boltzmannclean

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

Stars: ✭ 23 (-84.87%)

Mutual labels: dataframe, data-science

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (+461.84%)

Mutual labels: data-science, big-data

Autodl

Automated Deep Learning without ANY human intervention. 1st solution for AutoDL [email protected]

Stars: ✭ 854 (+461.84%)

Mutual labels: data-science, big-data

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+20701.32%)

Mutual labels: spark, big-data

Pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

Stars: ✭ 26 (-82.89%)

Mutual labels: data-science, big-data

Attaca

Robust, distributed version control for large files.

Stars: ✭ 41 (-73.03%)

Mutual labels: data-science, big-data

Pixiedust

Python Helper library for Jupyter Notebooks

Stars: ✭ 998 (+556.58%)

Mutual labels: data-science, spark

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-63.82%)

Mutual labels: data-science, spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+2467.76%)

Mutual labels: spark, big-data

Data Science Cookbook

🎓 Jupyter notebooks from UFC data science course

Stars: ✭ 60 (-60.53%)

Mutual labels: data-science, spark

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-61.18%)

Mutual labels: data-science, big-data

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-61.84%)

Mutual labels: data-science, spark

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (-26.32%)

Mutual labels: data-science, spark

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (+673.03%)

Mutual labels: data-science, big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-53.29%)

Mutual labels: spark, big-data

Cookbook

The Data Engineering Cookbook

Stars: ✭ 9,829 (+6366.45%)

Mutual labels: big-data, data-engineering

Pwrake

Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

Stars: ✭ 57 (-62.5%)

Mutual labels: parallel-computing, distributed-computing

Drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

Stars: ✭ 1,301 (+755.92%)

Mutual labels: data-science, high-performance-computing

Dataengineeringproject

Example end to end data engineering project.

Stars: ✭ 82 (-46.05%)

Mutual labels: big-data, data-engineering

Opencoarrays

A parallel application binary interface for Fortran 2018 compilers.

Stars: ✭ 151 (-0.66%)

Mutual labels: parallel-computing, high-performance-computing

Pythondata

Repo for code published on pythondata.com

Stars: ✭ 113 (-25.66%)

Mutual labels: data-science, big-data

D6t Python

Accelerate data science

Stars: ✭ 118 (-22.37%)

Mutual labels: data-science, data-engineering

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.

Stars: ✭ 97 (-36.18%)

Mutual labels: spark, big-data

Parapet

A purely functional library to build distributed and event-driven systems

Stars: ✭ 106 (-30.26%)

Mutual labels: parallel-computing, distributed-computing

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-62.5%)

Mutual labels: spark, big-data

Elephas

Distributed Deep learning with Keras & Spark

Stars: ✭ 1,521 (+900.66%)

Mutual labels: spark, distributed-computing

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-27.63%)

Mutual labels: spark, big-data

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-9.87%)

Mutual labels: spark, big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-28.29%)

Mutual labels: data-science, big-data

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+1469.08%)

Mutual labels: data-science, data-engineering

Pyhpc Benchmarks

A suite of benchmarks to test the sequential CPU and GPU performance of most popular high-performance libraries for Python.

Stars: ✭ 119 (-21.71%)

Mutual labels: parallel-computing, high-performance-computing

Cape Python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Stars: ✭ 125 (-17.76%)

Mutual labels: data-science, spark

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (-17.11%)

Mutual labels: data-science, data-engineering

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-28.95%)

Mutual labels: data-science, spark

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-15.79%)

Mutual labels: data-science, big-data

Batchtools

Tools for computation on batch systems

Stars: ✭ 127 (-16.45%)

Mutual labels: parallel-computing, high-performance-computing

Benchm Ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+1107.24%)

Mutual labels: data-science, spark

Targets

Function-oriented Make-like declarative workflows for R

Stars: ✭ 293 (+92.76%)

Mutual labels: data-science, high-performance-computing

Python Seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

Stars: ✭ 302 (+98.68%)

Mutual labels: data-science, distributed-computing

Drake Examples

Example workflows for the drake R package

Stars: ✭ 57 (-62.5%)

Mutual labels: data-science, high-performance-computing
