fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-98.9%)
Freestyle: A cohesive & pragmatic framework of FP-centric Scala libraries
Stars: ✭ 627 (-65.47%)
trembita: Model complex data transformation pipelines easily
Stars: ✭ 44 (-97.58%)
Quill: Compile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+10.02%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (-53.36%)
Elassandra: Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (-11.34%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-88.11%)
Vagrant Projects: Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (-98.13%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-94.66%)
Spark Infotheoretic Feature Selection: This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-93.23%)
Cassandra exporter: Apache Cassandra® metrics exporter for Prometheus
Stars: ✭ 133 (-92.68%)
Deequ: Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+11.23%)
Abris: Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-92.84%)
Teddy: Spark Streaming monitoring platform, supporting job deployment, alerting, and auto-start
Stars: ✭ 120 (-93.39%)
Kinesis Sql: Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-93.39%)
Spylon Kernel: Jupyter kernel for Scala and Spark
Stars: ✭ 129 (-92.9%)
Cube.js: 📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+559.86%)
Cassandra Sharp: High-performance .NET driver for Apache Cassandra
Stars: ✭ 114 (-93.72%)
Rasterframes: Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-92.18%)
Isolation Forest: A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-92.35%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-9.58%)
Cassandra Migrate: Simple Cassandra schema migration tool written in Python
Stars: ✭ 114 (-93.72%)
Scala Samples: Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with the language.
Stars: ✭ 125 (-93.12%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-93.28%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-92.29%)
Zparkio: Boilerplate framework for using Spark and ZIO together.
Stars: ✭ 121 (-93.34%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-92.24%)
Opaque: An encrypted data analytics platform
Stars: ✭ 129 (-92.9%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (-10.24%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features for processing large-scale graphs with Spark and GraphX.
Stars: ✭ 139 (-92.35%)
Dcos Cassandra Service: DEPRECATED. Open source Apache Cassandra running on DC/OS has been replaced by mesosphere/dcos-commons/frameworks/cassandra. This repository will be deleted at the end of 2017.
Stars: ✭ 116 (-93.61%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (-5.23%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-93.72%)
Sstable Tools: Tools for parsing, creating and doing other fun stuff with sstables
Stars: ✭ 145 (-92.02%)
Spring Shiro Spark: Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-93.72%)
Airflow Pipeline: An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-92.95%)
Quicksql: A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+0.28%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-93.78%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+41.85%)
Python Bigdata: Data science and Big Data with Python
Stars: ✭ 112 (-93.83%)
Archivespark: An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Stars: ✭ 111 (-93.89%)
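Spring Boot Quick: 🌿 Quick-learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as rabbitmq (delay queues), Kafka, jpa, redis, oauth2, swagger, jsp, docker, spring-batch, exception handling, logging, multi-module development, multi-environment packaging, caching (cache), web crawling, jwt, GraphQL, dubbo, zookeeper, Async, and more📌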
Stars: ✭ 1,819 (+0.17%)
Elephas: Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (-16.24%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-93.89%)
My Moments: Instagram clone, built for learning purposes
Stars: ✭ 140 (-92.29%)
Openuba: A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-93.01%)
Waterdrop: Production-ready data integration product, documentation:
Stars: ✭ 1,856 (+2.2%)
Lift: The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-93.01%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-93.94%)
Parquet Index: Spark SQL index for Parquet tables
Stars: ✭ 109 (-94%)