401 open source projects that are alternatives to or similar to optimus

bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Stars: ✭ 120 (-91.12%)
foofah
Foofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (-98.22%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-27.02%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (-68.99%)
Mutual labels:  dask, data-exploration, data-profiling
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (-94.3%)
Mutual labels:  bigdata, pyspark
allie
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (-93.12%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+37.01%)
Mutual labels:  data-exploration, data-profiling
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-97.48%)
Mutual labels:  data-transformation, pyspark
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+516.51%)
Mutual labels:  data-exploration, data-profiling
prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-96%)
Mutual labels:  data-wrangling, data-preparation
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-95.34%)
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-97.71%)
Mutual labels:  data-wrangling, data-cleaning
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (-28.42%)
Mutual labels:  data-wrangling, data-cleaning
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-0.96%)
Mutual labels:  bigdata, pyspark
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-97.48%)
Mutual labels:  bigdata, pyspark
bamboolib binder template
bamboolib - template for creating your own binder notebook
Stars: ✭ 19 (-98.59%)
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-96.3%)
Mutual labels:  bigdata, pyspark
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final project for the IoT: Big Data Processing and Analytics class. Analyzes U.S. nationwide temperature data from IoT sensors in real time
Stars: ✭ 42 (-96.89%)
Mutual labels:  bigdata, pyspark
Big Data Study
🐳 big data study
Stars: ✭ 141 (-89.56%)
Mutual labels:  bigdata
Node Hbase
Asynchronous HBase client for NodeJs using REST
Stars: ✭ 226 (-83.27%)
Mutual labels:  bigdata
Ecommercerecommendsystem
A real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Stars: ✭ 139 (-89.71%)
Mutual labels:  bigdata
Tipdm
TipDM modeling platform, an open-source data mining tool.
Stars: ✭ 130 (-90.38%)
Mutual labels:  bigdata
bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (-97.48%)
Mutual labels:  bigdata
Flink Boot
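The Lazy Squirrel Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can write distributed stream-processing programs in the style of Java web development. It aims to let developers write business code quickly at a lower onboarding cost, without needing to understand distributed computing theory or the details of the Flink framework. To further improve agility on large projects, the scaffold integrates the Spring framework for bean management by default and also bundles frameworks commonly used in microservice and web development, such as the MyBatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry framework; see the scaffold features below for details.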
Stars: ✭ 209 (-84.53%)
Mutual labels:  bigdata
Fpart
Sort files and pack them into partitions
Stars: ✭ 127 (-90.6%)
Mutual labels:  bigdata
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-90.67%)
Mutual labels:  bigdata
Poli
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Stars: ✭ 1,850 (+36.94%)
Mutual labels:  bigdata
Tdengine
An open-source big data platform designed and optimized for the Internet of Things (IoT).
Stars: ✭ 17,434 (+1190.45%)
Mutual labels:  bigdata
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-89.64%)
Mutual labels:  bigdata
spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Stars: ✭ 36 (-97.34%)
Mutual labels:  pyspark
Twitwork
Monitor twitter stream
Stars: ✭ 133 (-90.16%)
Mutual labels:  bigdata
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-84.09%)
Mutual labels:  bigdata
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+27.39%)
Mutual labels:  bigdata
popmon
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Stars: ✭ 434 (-67.88%)
Mutual labels:  data-profiling
Volcano
A Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+56.48%)
Mutual labels:  bigdata
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-84.68%)
Mutual labels:  bigdata
docker-kaggle-ko
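A Docker image dedicated to machine learning and deep learning (PyTorch, TensorFlow). It adds Korean fonts, a Korean natural language processing package (konlpy), a morphological analyzer, time zone settings, and more.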
Stars: ✭ 46 (-96.6%)
Mutual labels:  cudf
Liteflow
liteflow is a distributed task-flow scheduling system built on task versions
Stars: ✭ 112 (-91.71%)
Mutual labels:  bigdata
Javaorbigdata Interview
A compilation of interview knowledge points for Java developers and big data developers
Stars: ✭ 203 (-84.97%)
Mutual labels:  bigdata
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+14.29%)
Mutual labels:  bigdata
Lambda Arch
Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-91.78%)
Mutual labels:  bigdata
Awesome Learning
Practice source code repository: https://github.com/jast90/bigdata. Search for Jast on WeChat and follow the official account for the latest technical posts 😯.
Stars: ✭ 197 (-85.42%)
Mutual labels:  bigdata
Books
Technical books and more
Stars: ✭ 110 (-91.86%)
Mutual labels:  bigdata
Flinkstreamsql
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Stars: ✭ 1,682 (+24.5%)
Mutual labels:  bigdata
LDWizard
A generic framework for simplifying the creation of linked data.
Stars: ✭ 17 (-98.74%)
Mutual labels:  data-transformation
workflUX
An open-source, cloud-ready web application for simplified deployment of big data workflows.
Stars: ✭ 26 (-98.08%)
Mutual labels:  bigdata
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-81.57%)
Mutual labels:  bigdata
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-86.45%)
Mutual labels:  bigdata
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-91.93%)
Mutual labels:  bigdata
Daudit
🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Stars: ✭ 108 (-92.01%)
Mutual labels:  bigdata
Flinkx
Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+96.23%)
Mutual labels:  bigdata
Awesome Bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+675.57%)
Mutual labels:  bigdata
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-92.08%)
Mutual labels:  bigdata
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (-81.87%)
Mutual labels:  bigdata
Bigdata practice
Big data analysis and visualization practice
Stars: ✭ 166 (-87.71%)
Mutual labels:  bigdata
Griddb
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Stars: ✭ 1,587 (+17.47%)
Mutual labels:  bigdata
Flink Notes
Flink study notes
Stars: ✭ 106 (-92.15%)
Mutual labels:  bigdata
Java Notes
☕️ Java basics 👫 Object-oriented thinking ✏️ Algorithms 📝 Operating systems ☁️ Networking 💾 Databases 🙊 Spring 💡 System architecture 🐘 Big data
Stars: ✭ 160 (-88.16%)
Mutual labels:  bigdata
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-92.23%)
Mutual labels:  bigdata
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-92.23%)
Mutual labels:  bigdata