Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+461.96%)

Mutual labels: spark, hadoop

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (-54.6%)

Mutual labels: spark, hadoop

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (-48.47%)

Mutual labels: spark, flink

Spark Bigquery Connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Stars: ✭ 126 (-22.7%)

Mutual labels: spark

Spark Authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark

Stars: ✭ 141 (-13.5%)

Mutual labels: spark

Pulsar Flink

Elastic data processing with Apache Pulsar and Apache Flink

Stars: ✭ 126 (-22.7%)

Mutual labels: flink

Scala Samples

Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with it

Stars: ✭ 125 (-23.31%)

Mutual labels: spark

Hadoop Hdfs

Mirror of Apache Hadoop HDFS

Stars: ✭ 152 (-6.75%)

Mutual labels: hadoop

Rasterframes

Geospatial Raster support for Spark DataFrames

Stars: ✭ 142 (-12.88%)

Mutual labels: spark

Spark Infotheoretic Feature Selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (-24.54%)

Mutual labels: spark

Dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Stars: ✭ 122 (-25.15%)

Mutual labels: hadoop

Data science blogs

A repository to keep track of all the code that I end up writing for my blog posts.

Stars: ✭ 139 (-14.72%)

Mutual labels: spark

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-25.15%)

Mutual labels: spark

Deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Stars: ✭ 2,020 (+1139.26%)

Mutual labels: spark

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+7849.08%)

Mutual labels: hadoop

Powderkeg

Live-coding the cluster!

Stars: ✭ 152 (-6.75%)

Mutual labels: spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-14.11%)

Mutual labels: spark

Zparkio

Boilerplate framework to use Spark and ZIO together.

Stars: ✭ 121 (-25.77%)

Mutual labels: spark

Eat pyspark in 10 days

pyspark🍒🥭 is delicious, just eat it!😋😋

Stars: ✭ 116 (-28.83%)

Mutual labels: spark

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-14.11%)

Mutual labels: hadoop

Example Spark Kafka

Apache Spark and Apache Kafka integration example

Stars: ✭ 120 (-26.38%)

Mutual labels: spark

Streamline

StreamLine - Streaming Analytics

Stars: ✭ 151 (-7.36%)

Mutual labels: flink

Teddy

Spark Streaming monitoring platform with support for job deployment, alerting, and automatic startup

Stars: ✭ 120 (-26.38%)

Mutual labels: spark

Kinesis Sql

Kinesis Connector for Structured Streaming

Stars: ✭ 120 (-26.38%)

Mutual labels: spark

Elassandra

Elassandra = Elasticsearch + Apache Cassandra

Stars: ✭ 1,610 (+887.73%)

Mutual labels: spark

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-14.72%)

Mutual labels: spark

Flink Docker

Docker packaging for Apache Flink

Stars: ✭ 118 (-27.61%)

Mutual labels: flink

Hdfs Shell

HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

Stars: ✭ 117 (-28.22%)

Mutual labels: hadoop

Vue Info Card

Simple and beautiful card component with an elegant spark line, for VueJS.

Stars: ✭ 159 (-2.45%)

Mutual labels: spark

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-6.75%)

Mutual labels: spark

Spark Tsne

Distributed t-SNE via Apache Spark

Stars: ✭ 151 (-7.36%)

Mutual labels: spark

Isolation Forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Stars: ✭ 139 (-14.72%)

Mutual labels: spark

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+893.25%)

Mutual labels: hadoop

Xlearning

AI on Hadoop

Stars: ✭ 1,709 (+948.47%)

Mutual labels: hadoop

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps

Stars: ✭ 11,983 (+7251.53%)

Mutual labels: spark

Spark Ml Source Analysis

Analysis of Spark ML algorithm principles and of their concrete source code implementations

Stars: ✭ 1,873 (+1049.08%)

Mutual labels: spark

Datax

DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server

Stars: ✭ 116 (-28.83%)

Mutual labels: hadoop

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-30.06%)

Mutual labels: hadoop

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-30.06%)

Mutual labels: spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-15.95%)

Mutual labels: spark

Tensorflowonyarn

Support TensorFlow on YARN

Stars: ✭ 114 (-30.06%)

Mutual labels: hadoop

Spring Shiro Spark

Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...

Stars: ✭ 114 (-30.06%)

Mutual labels: spark

Hadoop Common

Mirror of Apache Hadoop common

Stars: ✭ 155 (-4.91%)

Mutual labels: hadoop

Benchm Ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+1025.77%)

Mutual labels: spark

Apache Spark Node

Node.js bindings for Apache Spark DataFrame APIs

Stars: ✭ 136 (-16.56%)

Mutual labels: spark

Parquet Go

Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Stars: ✭ 114 (-30.06%)

Mutual labels: hadoop

Hbaseclient

HBase client data management software

Stars: ✭ 135 (-17.18%)

Mutual labels: hadoop

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (-30.67%)

Mutual labels: spark

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (-31.29%)

Mutual labels: spark

61-120 of 651 similar projects
