bumblebee🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Stars: ✭ 120 (-91.12%)
foofahFoofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (-98.22%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-27.02%)
datatileA library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (-68.99%)
anovosAnovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (-94.3%)
allie🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (-93.12%)
SweetvizVisualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+37.01%)
Pandas ProfilingCreate HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+516.51%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-96%)
wranglerWrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-95.34%)
Data Forge TsThe JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (-28.42%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-0.96%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-97.48%)
Node HbaseAsynchronous HBase client for NodeJs using REST
Stars: ✭ 226 (-83.27%)
TipdmTipDM建模平台,开源的数据挖掘工具。
Stars: ✭ 130 (-90.38%)
Flink Boot懒松鼠Flink-Boot 脚手架让Flink全面拥抱Spring生态体系,使得开发者可以以Java WEB开发模式开发出分布式运行的流处理程序,懒松鼠让跨界变得更加简单。懒松鼠旨在让开发者以更底上手成本(不需要理解分布式计算的理论知识和Flink框架的细节)便可以快速编写业务代码实现。为了进一步提升开发者使用懒松鼠脚手架开发大型项目的敏捷的度,该脚手架默认集成Spring框架进行Bean管理,同时将微服务以及WEB开发领域中经常用到的框架集成进来,进一步提升开发速度。比如集成Mybatis ORM框架,Hibernate Validator校验框架,Spring Retry重试框架等,具体见下面的脚手架特性。
Stars: ✭ 209 (-84.53%)
FpartSort files and pack them into partitions
Stars: ✭ 127 (-90.6%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-90.67%)
PoliAn easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Stars: ✭ 1,850 (+36.94%)
TdengineAn open-source big data platform designed and optimized for the Internet of Things (IoT).
Stars: ✭ 17,434 (+1190.45%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-89.64%)
TwitworkMonitor twitter stream
Stars: ✭ 133 (-90.16%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-84.09%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+27.39%)
popmonMonitor the stability of a Pandas or Spark dataframe ⚙︎
Stars: ✭ 434 (-67.88%)
VolcanoA Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+56.48%)
ShifuAn end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-84.68%)
docker-kaggle-ko머신러닝/딥러닝(PyTorch, TensorFlow) 전용 도커입니다. 한글 폰트, 한글 자연어처리 패키지(konlpy), 형태소 분석기, Timezone 등의 설정 등을 추가 하였습니다.
Stars: ✭ 46 (-96.6%)
Liteflowliteflow是一个基于任务版本来实现的分布式任务流调度系统
Stars: ✭ 112 (-91.71%)
GenieDistributed Big Data Orchestration Service
Stars: ✭ 1,544 (+14.29%)
Lambda ArchApplying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-91.78%)
Awesome Learning实践源码库:https://github.com/jast90/bigdata 。 微信搜索Jast关注公众号,获取最新技术分享😯。
Stars: ✭ 197 (-85.42%)
Books技术书籍等
Stars: ✭ 110 (-91.86%)
Flinkstreamsql基于开源的flink,对其实时sql进行扩展;主要实现了流与维表的join,支持原生flink SQL所有的语法
Stars: ✭ 1,682 (+24.5%)
LDWizardA generic framework for simplifying the creation of linked data.
Stars: ✭ 17 (-98.74%)
workflUXAn open-source, cloud-ready web application for simplified deployment of big data workflows.
Stars: ✭ 26 (-98.08%)
Every Single Day I TldrA daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-81.57%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-86.45%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-91.93%)
Daudit🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Stars: ✭ 108 (-92.01%)
FlinkxBased on Apache Flink. support data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+96.23%)
Awesome BigdataA curated list of awesome big data frameworks, ressources and other awesomeness.
Stars: ✭ 10,478 (+675.57%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-92.08%)
Aws Etl OrchestratorA serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (-81.87%)
GriddbGridDB is a next-generation open source database that makes time series IoT and big data fast,and easy.
Stars: ✭ 1,587 (+17.47%)
Java Notes☕️ Java 基础 👫 面向对象思想✏️ 算法 📝 操作系统 ☁️ 网络 💾 数据库 🙊 Spring 💡 系统架构🐘大数据
Stars: ✭ 160 (-88.16%)
SparktutorialSource code for James Lee's Aparch Spark with Java course
Stars: ✭ 105 (-92.23%)
SplashSplash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-92.23%)