2024 open source projects that are alternatives to or similar to Geni

Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-1.32%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+1902.63%)
Mutual labels:  dataframe, data-science, spark, big-data
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-9.87%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-48.03%)
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+1594.74%)
Mutual labels:  spark, big-data, data-engineering
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+780.26%)
Mutual labels:  data-science, spark, big-data
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-26.97%)
Mutual labels:  big-data, spark, dataframe
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-52.63%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-57.24%)
Mutual labels:  data-science, spark, big-data
ParallelUtilities.jl
Fast and easy parallel mapreduce on HPC clusters
Stars: ✭ 28 (-81.58%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+894.08%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-19.74%)
Mutual labels:  data-science, spark, data-engineering
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+137.5%)
Mutual labels:  spark, big-data, distributed-computing
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+3621.05%)
Mutual labels:  data-science, spark, big-data
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+14405.26%)
Mutual labels:  data-science, spark, big-data
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+316.45%)
Mutual labels:  data-science, spark, data-engineering
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-61.18%)
Mutual labels:  data-science, big-data
Data Science Cookbook
🎓 Jupyter notebooks from UFC data science course
Stars: ✭ 60 (-60.53%)
Mutual labels:  data-science, spark
Spark.jl
Julia binding for Apache Spark
Stars: ✭ 153 (+0.66%)
Mutual labels:  spark, big-data
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+0.66%)
Mutual labels:  data-science, big-data
Spark Doc Zh
Chinese translation of the official Apache Spark documentation
Stars: ✭ 1,126 (+640.79%)
Mutual labels:  spark, big-data
Ensae teaching cs
Teaching materials in Python at @ENSAE
Stars: ✭ 69 (-54.61%)
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+673.03%)
Mutual labels:  data-science, big-data
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-48.03%)
Mutual labels:  data-science, data-engineering
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-46.05%)
Mutual labels:  big-data, data-engineering
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+11626.32%)
Mutual labels:  data-science, data-engineering
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-36.18%)
Mutual labels:  spark, big-data
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-61.84%)
Mutual labels:  data-science, spark
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-60.53%)
Mutual labels:  spark, data-engineering
Pwrake
Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
Stars: ✭ 57 (-62.5%)
Labs
Research on distributed systems
Stars: ✭ 73 (-51.97%)
Mutual labels:  spark, big-data
Danfojs
danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (+757.89%)
Mutual labels:  dataframe, data-science
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+7130.92%)
Mutual labels:  spark, big-data
Parapet
A purely functional library to build distributed and event-driven systems
Stars: ✭ 106 (-30.26%)
W2v
Word2Vec models with Twitter data using Spark.
Stars: ✭ 64 (-57.89%)
Mutual labels:  data-science, spark
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-53.29%)
Mutual labels:  spark, big-data
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-62.5%)
Mutual labels:  spark, big-data
Spark Website
Apache Spark Website
Stars: ✭ 75 (-50.66%)
Mutual labels:  spark, big-data
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+755.92%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+6366.45%)
Mutual labels:  big-data, data-engineering
Boinc
Open-source software for volunteer computing and grid computing.
Stars: ✭ 1,320 (+768.42%)
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-34.21%)
Mutual labels:  data-science, big-data
Drake Examples
Example workflows for the drake R package
Stars: ✭ 57 (-62.5%)
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-27.63%)
Mutual labels:  spark, big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-28.29%)
Mutual labels:  data-science, big-data
Elephas
Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+900.66%)
Mutual labels:  spark, distributed-computing
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-3.29%)
Mutual labels:  data-science, spark
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-28.95%)
Mutual labels:  data-science, spark
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-26.32%)
Mutual labels:  data-science, spark
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-25.66%)
Mutual labels:  data-science, big-data
Pyhpc Benchmarks
A suite of benchmarks to test the sequential CPU and GPU performance of most popular high-performance libraries for Python.
Stars: ✭ 119 (-21.71%)
D6t Python
Accelerate data science
Stars: ✭ 118 (-22.37%)
Mutual labels:  data-science, data-engineering
Opencoarrays
A parallel application binary interface for Fortran 2018 compilers.
Stars: ✭ 151 (-0.66%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+27948.68%)
Mutual labels:  data-science, data-engineering
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1469.08%)
Mutual labels:  data-science, data-engineering
Cape Python
Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-17.76%)
Mutual labels:  data-science, spark
Batchtools
Tools for computation on batch systems
Stars: ✭ 127 (-16.45%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-15.79%)
Mutual labels:  data-science, big-data
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-16.45%)
Mutual labels:  data-science, data-engineering
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+980.26%)
Mutual labels:  spark, big-data