Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+2812%)

Mutual labels: spark

spark-word2vec

A parallel implementation of word2vec based on Spark

Stars: ✭ 24 (-4%)

Mutual labels: spark

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (+2724%)

Mutual labels: spark

Opaque

An encrypted data analytics platform

Stars: ✭ 129 (+416%)

Mutual labels: spark

pyspark-ML-in-Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (+28%)

Mutual labels: pyspark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (+572%)

Mutual labels: spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+184%)

Mutual labels: spark

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (+896%)

Mutual labels: spark

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+6784%)

Mutual labels: spark

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+22260%)

Mutual labels: spark

visualize-data-with-python

A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.

Stars: ✭ 60 (+140%)

Mutual labels: spark

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+2344%)

Mutual labels: spark

Airflow Pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Stars: ✭ 128 (+412%)

Mutual labels: spark

Mongo Spark

The MongoDB Spark Connector

Stars: ✭ 588 (+2252%)

Mutual labels: spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+888%)

Mutual labels: spark

Sparklearning

Learning Apache Spark, including code and data. Most parts can run locally.

Stars: ✭ 558 (+2132%)

Mutual labels: spark

Openuba

A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]

Stars: ✭ 127 (+408%)

Mutual labels: spark

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+2052%)

Mutual labels: spark

frovedis

Framework of vectorized and distributed data analytics

Stars: ✭ 59 (+136%)

Mutual labels: spark

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+1952%)

Mutual labels: spark

Cape Python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Stars: ✭ 125 (+400%)

Mutual labels: spark

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+1928%)

Mutual labels: spark

Neo4j Spark Connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Stars: ✭ 245 (+880%)

Mutual labels: spark

Pdf

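Programming e-books and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and more categories.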

Stars: ✭ 12,009 (+47936%)

Mutual labels: spark

Spark Bigquery Connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Stars: ✭ 126 (+404%)

Mutual labels: spark

Bdp Dataplatform

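Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.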

Stars: ✭ 456 (+1724%)

Mutual labels: spark

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (+36%)

Mutual labels: pyspark

Usersessionbehaviorofflineanalysis

Sichuan University 拓思爱诺 offline analysis project for user session behavior data

Stars: ✭ 69 (+176%)

Mutual labels: spark

Fast Mrmr

An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).

Stars: ✭ 67 (+168%)

Mutual labels: spark

Spark Infotheoretic Feature Selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (+392%)

Mutual labels: spark

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-44%)

Mutual labels: spark

pyspark-asyncactions

Asynchronous actions for PySpark

Stars: ✭ 30 (+20%)

Mutual labels: pyspark

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (+168%)

Mutual labels: spark

docker-spark

Apache Spark docker container image (Standalone mode)

Stars: ✭ 34 (+36%)

Mutual labels: spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+280%)

Mutual labels: spark
