The purpose of this tiny project is to put things together with the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to play with Apache Spark Streaming, Kafka, Mongo, and Spark machine learning algorithms.

Stars: ✭ 47 (-97.41%)

Mutual labels: spark

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (-94.66%)

Mutual labels: spark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.

Stars: ✭ 45 (-97.52%)

Mutual labels: spark

Example Spark Kafka

Apache Spark and Apache Kafka integration example

Stars: ✭ 120 (-93.39%)

Mutual labels: spark

Spark Examples

Spark examples

Stars: ✭ 41 (-97.74%)

Mutual labels: spark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (-26.32%)

Mutual labels: spark

Gatk

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (-44.82%)

Mutual labels: spark

Spark Authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark

Stars: ✭ 141 (-92.24%)

Mutual labels: spark

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera, etc.

Stars: ✭ 1,000 (-44.93%)

Mutual labels: cassandra

Cqlkit

CLI tool to export Cassandra query results in CSV and JSON formats.

Stars: ✭ 94 (-94.82%)

Mutual labels: cassandra

Data Ingestion Platform

Stars: ✭ 39 (-97.85%)

Mutual labels: spark

Kinesis Sql

Kinesis Connector for Structured Streaming

Stars: ✭ 120 (-93.39%)

Mutual labels: spark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (-45.7%)

Mutual labels: spark

Spark Summit 2017 Sanfrancisco

Spark Summit 2017 San Francisco

Stars: ✭ 93 (-94.88%)

Mutual labels: spark

Opaque

An encrypted data analytics platform

Stars: ✭ 129 (-92.9%)

Mutual labels: spark

Haproxy Configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher, etc.

Stars: ✭ 106 (-94.16%)

Mutual labels: cassandra

Thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Stars: ✭ 10,526 (+479.63%)

Mutual labels: spark

Spark On Kubernetes Helm

Spark on Kubernetes infrastructure Helm charts repo

Stars: ✭ 92 (-94.93%)

Mutual labels: spark

Spark Summit East 2017

Stars: ✭ 33 (-98.18%)

Mutual labels: spark

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (-10.24%)

Mutual labels: spark

Spark Flamegraph

Easy CPU Profiling for Apache Spark applications

Stars: ✭ 30 (-98.35%)

Mutual labels: spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-96.42%)

Mutual labels: spark

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (-98.4%)

Mutual labels: spark

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-92.35%)

Mutual labels: spark

Heracles

High performance HBase / Spark SQL engine

Stars: ✭ 27 (-98.51%)

Mutual labels: spark

Spark Nlp Models

Models and Pipelines for the Spark NLP library

Stars: ✭ 88 (-95.15%)

Mutual labels: spark

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+1641.08%)

Mutual labels: spark

Dcos Cassandra Service

DEPRECATED: Open-source Apache Cassandra running on DC/OS has been replaced by mesosphere/dcos-commons/frameworks/cassandra. This repository will be deleted at the end of 2017.

Stars: ✭ 116 (-93.61%)

Mutual labels: cassandra

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (-93.06%)

Mutual labels: spark

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-94.05%)

Mutual labels: spark

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.