TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.

Stars: ✭ 2,084 (+507.58%)

Mutual labels: spark

Gatk

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (+192.13%)

Mutual labels: spark

Search Ads Web Service

Online search advertising platform and real-time campaign monitoring [maybe deprecated]

Stars: ✭ 30 (-91.25%)

Mutual labels: spark

Pixiedust

Python helper library for Jupyter notebooks

Stars: ✭ 998 (+190.96%)

Mutual labels: spark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (-51.6%)

Mutual labels: spark

Snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Stars: ✭ 995 (+190.09%)

Mutual labels: spark

kafka-spark-streaming-zeppelin-docker

One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

Stars: ✭ 82 (-76.09%)

Mutual labels: spark

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

Stars: ✭ 37 (-89.21%)

Mutual labels: spark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (-51.9%)

Mutual labels: spark

Learning Spark

Learn Spark from scratch; big data learning

Stars: ✭ 37 (-89.21%)

Mutual labels: spark

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (-88.63%)

Mutual labels: spark

Spark Summit East 2017

Stars: ✭ 33 (-90.38%)

Mutual labels: spark

Whylogs Java

Profile and monitor your ML data pipeline end-to-end

Stars: ✭ 164 (-52.19%)

Mutual labels: spark

Spark Hbase Connector

Connect Spark to HBase for reading and writing data with ease

Stars: ✭ 299 (-12.83%)

Mutual labels: spark

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+176.68%)

Mutual labels: spark

Linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB, ...) and exposes various interfaces (REST, JDBC, Java, ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+577.26%)

Mutual labels: spark

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+9118.08%)

Mutual labels: spark

openverse-catalog

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

Stars: ✭ 27 (-92.13%)

Mutual labels: spark

Flint

A Time Series Library for Apache Spark

Stars: ✭ 878 (+155.98%)

Mutual labels: spark

Glow

An open-source toolkit for large-scale genomic analysis

Stars: ✭ 159 (-53.64%)

Mutual labels: spark

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.

Stars: ✭ 14 (-95.92%)

Mutual labels: spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-92.71%)

Mutual labels: spark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-96.5%)

Mutual labels: spark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (-53.94%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-96.79%)

Mutual labels: spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libraries for AI, deep learning, machine learning, computer vision, data science, data analytics and cognitive computing that are built to run natively on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.

Stars: ✭ 95 (-72.3%)

Mutual labels: spark

Bigdata Interview

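🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered online, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.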

Stars: ✭ 857 (+149.85%)

Mutual labels: spark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (-54.81%)

Mutual labels: spark

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-92.42%)

Mutual labels: spark

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (-9.62%)

Mutual labels: spark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+170.85%)

Mutual labels: spark

Quill

Compile-time Language Integrated Queries for Scala

Stars: ✭ 1,998 (+482.51%)

Mutual labels: spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-93.29%)

Mutual labels: spark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-92.42%)

Mutual labels: spark

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+167.06%)

Mutual labels: spark

Powderkeg

Live-coding the cluster!

Stars: ✭ 152 (-55.69%)

Mutual labels: spark

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-90.67%)

Mutual labels: spark

Almond

A Scala kernel for Jupyter

Stars: ✭ 1,354 (+294.75%)

Mutual labels: spark

Recommendationsystem

Book recommender system using collaborative filtering based on Spark

Stars: ✭ 244 (-28.86%)

Mutual labels: spark

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

Stars: ✭ 97 (-71.72%)

Mutual labels: spark

Spark Ml Source Analysis

An analysis of how Spark ML algorithms work and how they are concretely implemented in the source code

Stars: ✭ 1,873 (+446.06%)

Mutual labels: spark

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (+0.58%)

Mutual labels: spark

Iql

An ad hoc query service based on the Spark SQL engine.