WaimakWaimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-53.12%)
AlmondA Scala kernel for Jupyter
Stars: ✭ 1,354 (+957.81%)
Spring Shiro SparkSpring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-10.94%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-24.22%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (-24.22%)
DeequDeequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+1478.13%)
Parquet GoGo package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-10.94%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+945.31%)
ZparkioBoilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-5.47%)
WifiA big data query and analysis system based on information captured over Wi-Fi
Stars: ✭ 93 (-27.34%)
Big Data🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-27.34%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (-12.5%)
ArchivesparkAn Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-13.28%)
ElephasDistributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+1088.28%)
Ammonite SparkRun Spark calculations from Ammonite
Stars: ✭ 88 (-31.25%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (+0%)
Parquet4sRead and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-2.34%)
Lambda ArchApplying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-13.28%)
Spark Nlp ModelsModels and Pipelines for the Spark NLP library
Stars: ✭ 88 (-31.25%)
WhirlFast iterative local development and testing of Apache Airflow workflows
Stars: ✭ 111 (-13.28%)
TeddySpark Streaming monitoring platform that supports job deployment, alerting, and automatic startup
Stars: ✭ 120 (-6.25%)
Avro Hadoop StarterExample MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-14.06%)
CuesheetA framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-32.81%)
FlintWebex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-33.59%)
Hops ExamplesExamples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-34.37%)
Spark Bigquery ConnectorBigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-1.56%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-6.25%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (-35.16%)
ElassandraElassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+1157.81%)
BigdataclassTwo-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-14.06%)
CamusMirror of Linkedin's Camus
Stars: ✭ 81 (-36.72%)
MleapMLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+862.5%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-14.84%)
LeharVisualize data using relative ordering
Stars: ✭ 81 (-36.72%)
Spark GbtlrHybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-36.72%)
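Spring Boot Quick🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as rabbitmq (delayed queues), Kafka, jpa, redis, oauth2, swagger, jsp, docker, spring-batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawlers, jwt, GraphQL, dubbo, zookeeper, Async, and more 📌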
Stars: ✭ 1,819 (+1321.09%)
OpenubaA robust and flexible open-source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-0.78%)
Scala SamplesPieces of Scala code that explain Scala syntax and related topics, such as what you can do with the language
Stars: ✭ 125 (-2.34%)
Hdfs ShellHDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Stars: ✭ 117 (-8.59%)
Aws Ecs AirflowRun Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-16.41%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-38.28%)