incubator-linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+13561.11%)
yuzhouwan: Code Library for My Blog
Stars: ✭ 39 (+116.67%)
Neo4j Spark Connector: Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+1261.11%)
kafka-shell: ⚡ A supercharged, interactive Kafka shell built on top of the existing Kafka CLI tools.
Stars: ✭ 107 (+494.44%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-22.22%)
kafkacli: CLI and Go Clients to manage Kafka components (Kafka Connect & SchemaRegistry)
Stars: ✭ 28 (+55.56%)
spark-util: low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-11.11%)
php-kafka-lib: PHP Kafka producer / consumer library with PHP Avro support, based on php-rdkafka
Stars: ✭ 38 (+111.11%)
AvroConvert: Apache Avro serializer for .NET
Stars: ✭ 44 (+144.44%)
sentry-spark: Apache Spark Sentry Integration
Stars: ✭ 14 (-22.22%)
spark-stringmetric: Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+183.33%)
Recommendationsystem: Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+1255.56%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (+427.78%)
Hadoop Docker: A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+1222.22%)
parquet-extra: A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+66.67%)
Insulator: A client UI to inspect Kafka topics, consume, produce and much more
Stars: ✭ 53 (+194.44%)
smolder: HL7 Apache Spark Datasource
Stars: ✭ 33 (+83.33%)
Mydatascienceportfolio: Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+1161.11%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+405.56%)
avro ex: An Avro Library that emphasizes testability and ease of use.
Stars: ✭ 47 (+161.11%)
Spark Workshop: Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+1144.44%)
sparkar-volts: An extensive non-reactive TypeScript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-16.67%)
Spark Jobserver: REST job server for Apache Spark
Stars: ✭ 2,748 (+15166.67%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (+16.67%)
Spark Fast Tests: Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Stars: ✭ 249 (+1283.33%)
docker-spark: Apache Spark docker container image (Standalone mode)
Stars: ✭ 34 (+88.89%)
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+1266.67%)
splink: Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+905.56%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+14722.22%)
spark-demos: Collection of different demo applications using Apache Spark
Stars: ✭ 15 (-16.67%)
visualize-data-with-python: A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+233.33%)
Azure Event Hubs: ☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+1194.44%)
goavro: Goavro translates between Go native types and binary and textual Avro data
Stars: ✭ 32 (+77.78%)
trembita: Model complex data transformation pipelines easily
Stars: ✭ 44 (+144.44%)
jasvorno: A library for strong, schema based conversion between 'natural' JSON documents and Avro
Stars: ✭ 18 (+0%)
Ruby Spark: Ruby wrapper for Apache Spark
Stars: ✭ 221 (+1127.78%)
Sagemaker Spark: A Spark library for Amazon SageMaker.
Stars: ✭ 219 (+1116.67%)
Spark Excel: A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (+1100%)
avro-typescript: TypeScript Code Generator for Apache Avro Schema Types
Stars: ✭ 19 (+5.56%)
kafka-application4s: https://medium.com/xebia-france/getting-started-with-scala-and-apache-kafka-62bb1ca6a77f
Stars: ✭ 21 (+16.67%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+1100%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+1094.44%)
Example Spark: Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (+1038.89%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+516.67%)
wrangler: Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+250%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+1038.89%)