Kuwala is the no-code data platform for BI analysts and engineers that enables you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…

Stars: ✭ 474 (+1028.57%)

Mutual labels: pyspark

jhdf

A pure Java HDF5 library

Stars: ✭ 83 (+97.62%)

Mutual labels: bigdata

cassandra.realtime

Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink

Stars: ✭ 25 (-40.48%)

Mutual labels: spark-streaming

OSCI

Open Source Contributor Index

Stars: ✭ 107 (+154.76%)

Mutual labels: pyspark

ai-deployment

Focused on taking AI models to production and model deployment

Stars: ✭ 149 (+254.76%)

Mutual labels: pyspark

163-bigdate-note

Big data notes

Stars: ✭ 38 (-9.52%)

Mutual labels: bigdata

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.

Stars: ✭ 29 (-30.95%)

Mutual labels: bigdata

interview-refresh-java-bigdata

A one-stop repo for looking up code snippets of core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

Stars: ✭ 25 (-40.48%)

Mutual labels: spark-streaming

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (+23.81%)

Mutual labels: pyspark

2019 egu workshop jupyter notebooks

Short course on interactive analysis of Big Earth Data with Jupyter Notebooks

Stars: ✭ 29 (-30.95%)

Mutual labels: bigdata

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Stars: ✭ 11,093 (+26311.9%)

Mutual labels: bigdata

T-Watch

Real Time Twitter Sentiment Analysis Product

Stars: ✭ 20 (-52.38%)

Mutual labels: spark-streaming

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+173.81%)

Mutual labels: pyspark

architect big data solutions with spark

Code, labs, and lectures for the course

Stars: ✭ 40 (-4.76%)

Mutual labels: spark-streaming

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

Stars: ✭ 20 (-52.38%)

Mutual labels: bigdata

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-59.52%)

Mutual labels: pyspark

TiBigData

TiDB connectors for Flink/Hive/Presto

Stars: ✭ 192 (+357.14%)

Mutual labels: bigdata

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used in languages other than English.

Stars: ✭ 115 (+173.81%)

Mutual labels: pyspark

vor

The new IoT Office Experience.

Stars: ✭ 44 (+4.76%)

Mutual labels: iot-sensors

learn-by-examples

Real-world Spark pipelines examples

Stars: ✭ 84 (+100%)

Mutual labels: pyspark

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-54.76%)

Mutual labels: spark-streaming

soda-spark

Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

Stars: ✭ 58 (+38.1%)

Mutual labels: pyspark

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.

Stars: ✭ 21 (-50%)

Mutual labels: bigdata

PersonNotes

A central collection of personal notes, recording technical notes in a quick-and-dirty style .. 📚☕️⌨️🎧

Stars: ✭ 61 (+45.24%)

Mutual labels: bigdata

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-54.76%)

Mutual labels: pyspark

intersect

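Reflections on an interview question: computing the intersection of a 60-million-item data set and a 3-million-item data set with only 50 MB of memory available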

Stars: ✭ 54 (+28.57%)

Mutual labels: bigdata

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-40.48%)

Mutual labels: pyspark

young-examples

Sample code for some typical application scenarios encountered while learning Java and in Java projects

Stars: ✭ 21 (-50%)

Mutual labels: bigdata

coolplayflink

Flink: Stateful Computations over Data Streams

Stars: ✭ 14 (-66.67%)

Mutual labels: bigdata

fdp-modelserver

An umbrella project for multiple implementations of model serving

Stars: ✭ 47 (+11.9%)

Mutual labels: spark-streaming

bqv

The simplest tool to manage views of BigQuery.

Stars: ✭ 22 (-47.62%)

Mutual labels: bigdata

jgit-spark-connector

jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.

Stars: ✭ 71 (+69.05%)

Mutual labels: pyspark

python mozetl

ETL jobs for Firefox Telemetry

Stars: ✭ 25 (-40.48%)

Mutual labels: pyspark

Spark-MLlib-Tutorial

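A comprehensive explanation of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files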

Stars: ✭ 32 (-23.81%)

Mutual labels: bigdata

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-38.1%)

Mutual labels: pyspark

xxhadoop

Data analysis using Hadoop/Spark/Storm/ElasticSearch/machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!

Stars: ✭ 37 (-11.9%)

Mutual labels: spark-streaming

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (-45.24%)

Mutual labels: pyspark

ExDeMon

A general-purpose metrics monitor implemented with Apache Spark. Kafka source, Elastic sink, aggregated metrics, various analyses, notifications, actions, live configuration updates, missing-metric detection, ...

Stars: ✭ 19 (-54.76%)

Mutual labels: spark-streaming

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-40.48%)

Mutual labels: pyspark

ceja

PySpark phonetic and string matching algorithms

Stars: ✭ 24 (-42.86%)

Mutual labels: pyspark

codefoundry

Examples for gauravbytes.com