Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (-1.98%)
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange.
Stars: ✭ 105 (-94.22%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive.
Stars: ✭ 122 (-93.28%)
Azure Event Hubs Spark: Enabling continuous data processing with Apache Spark and Azure Event Hubs.
Stars: ✭ 140 (-92.29%)
Zparkio: Boilerplate framework for using Spark and ZIO together.
Stars: ✭ 121 (-93.34%)
Kamu Cli: Next-generation tool for decentralized exchange and transformation of semi-structured data.
Stars: ✭ 69 (-96.2%)
Schemer: Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (-94.66%)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for big data analysis and machine learning as IPython / Jupyter notebooks.
Stars: ✭ 1,338 (-26.32%)
Spark Authorizer: A Spark SQL extension that provides SQL Standard Authorization for Apache Spark.
Stars: ✭ 141 (-92.24%)
Cqlkit: CLI tool to export Cassandra query results in CSV and JSON format.
Stars: ✭ 94 (-94.82%)
Kinesis Sql: Kinesis connector for Structured Streaming.
Stars: ✭ 120 (-93.39%)
Opaque: An encrypted data analytics platform.
Stars: ✭ 129 (-92.9%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark.
Stars: ✭ 110 (-93.94%)
Ibis: A pandas-like deferred expression system, with first-class SQL support.
Stars: ✭ 1,630 (-10.24%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-92.35%)
Spark Nlp Models: Models and pipelines for the Spark NLP library.
Stars: ✭ 88 (-95.15%)
Dcos Cassandra Service: DEPRECATED. Open source Apache Cassandra running on DC/OS has been replaced by mesosphere/dcos-commons/frameworks/cassandra. This repository will be deleted at the end of 2017.
Stars: ✭ 116 (-93.61%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (-5.23%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way.
Stars: ✭ 86 (-95.26%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities.
Stars: ✭ 114 (-93.72%)
Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops.
Stars: ✭ 84 (-95.37%)
Sstable Tools: Tools for parsing, creating and doing other fun stuff with sstables.
Stars: ✭ 145 (-92.02%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef.
Stars: ✭ 82 (-95.48%)
Spring Shiro Spark: Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs, and more.
Stars: ✭ 114 (-93.72%)
Mleap: Deploy ML pipelines to production.
Stars: ✭ 1,232 (-32.16%)
Airflow Pipeline: An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR.
Stars: ✭ 128 (-92.95%)
Spark Gbtlr: Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark.
Stars: ✭ 81 (-95.54%)
Xlearning Xdml: Extremely distributed machine learning.
Stars: ✭ 113 (-93.78%)
Docker Spark: 🚢 Docker image for Apache Spark.
Stars: ✭ 78 (-95.7%)
Quicksql: A flexible, fast, federated (3F) SQL analysis middleware for multiple data sources.
Stars: ✭ 1,821 (+0.28%)
Archivespark: An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-93.89%)
Ds Cheatsheets: List of Data Science Cheatsheets to rule the world.
Stars: ✭ 9,452 (+420.48%)
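Spring Boot Quick: 🌿 Quick-learning examples based on Spring Boot, integrating open-source frameworks the author has encountered, such as RabbitMQ (delay queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, Spring Batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawling, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more. 📌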
Stars: ✭ 1,819 (+0.17%)
Apache Spark Hands On: Educational notes and hands-on problems with solutions for the Hadoop ecosystem.
Stars: ✭ 74 (-95.93%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-93.89%)
My Moments: Instagram clone, built for learning purposes.
Stars: ✭ 140 (-92.29%)
Labs: Research on distributed systems.
Stars: ✭ 73 (-95.98%)
Luigi Warehouse: A Luigi-powered analytics / warehouse stack.
Stars: ✭ 72 (-96.04%)
Lift: The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
Stars: ✭ 127 (-93.01%)
Parquet Index: Spark SQL index for Parquet tables.
Stars: ✭ 109 (-94%)
Vos backend: vangav open source (backend), a backend generator that generates more than 90% of the code needed for large-scale backend services.
Stars: ✭ 71 (-96.09%)
Apache Spark Node: Node.js bindings for Apache Spark DataFrame APIs.
Stars: ✭ 136 (-92.51%)
Cape Python: Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark.
Stars: ✭ 125 (-93.12%)
Fast Mrmr: An improved implementation of the classical feature selection method minimum Redundancy Maximum Relevance (mRMR).
Stars: ✭ 67 (-96.31%)
Kontextfrei: Writing application logic for Spark jobs that can be unit-tested without a SparkContext.
Stars: ✭ 67 (-96.31%)
Hadoopcryptoledger: Hadoop Crypto Ledger, for analyzing crypto ledgers such as the Bitcoin blockchain on big data platforms such as Hadoop/Spark/Flink/Hive.
Stars: ✭ 126 (-93.06%)
Pyspark Cheatsheet: 🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-94.05%)