Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+50%)

Mutual labels: pyspark

coursera-ai-for-medicine-specialization

Programming assignments, labs and quizzes from all courses in the Coursera AI for Medicine Specialization offered by deeplearning.ai

Stars: ✭ 80 (+207.69%)

Mutual labels: deeplearning

Js Spark

Realtime calculation distributed system. AKA distributed lodash

Stars: ✭ 187 (+619.23%)

Mutual labels: spark

Elephas

Distributed Deep learning with Keras & Spark

Stars: ✭ 1,521 (+5750%)

Mutual labels: spark

anovos

Anovos - An open source library for scalable feature engineering using Apache Spark

Stars: ✭ 77 (+196.15%)

Mutual labels: pyspark

AI-for-Security-Testing

My AI security testing projects

Stars: ✭ 34 (+30.77%)

Mutual labels: deeplearning

nyc-2019-scikit-sprint

NYC WiMLDS scikit-learn open source sprint (Aug 24, 2019)

Stars: ✭ 28 (+7.69%)

Mutual labels: datascience

Azuredatabricksbestpractices

Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs

Stars: ✭ 186 (+615.38%)

Mutual labels: spark

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (+323.08%)

Mutual labels: spark

flask-spark-docker

Just a boilerplate for PySpark and Flask

Stars: ✭ 32 (+23.08%)

Mutual labels: pyspark

focalloss

Focal loss for multi-class classification in TensorFlow

Stars: ✭ 75 (+188.46%)

Mutual labels: deeplearning

Kotlin Spark Api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

Stars: ✭ 183 (+603.85%)

Mutual labels: spark

primrose

Primrose modeling framework for simple production models

Stars: ✭ 33 (+26.92%)

Mutual labels: datascience

Seldon Server

Machine Learning Platform and Recommendation Engine built on Kubernetes

Stars: ✭ 1,435 (+5419.23%)

Mutual labels: spark

machine-learning-notebook-series

Jupyter notebook series for machine learning and deep learning.

Stars: ✭ 14 (-46.15%)

Mutual labels: deeplearning

Spark On K8s Operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Stars: ✭ 1,780 (+6746.15%)

Mutual labels: spark

xgboost-smote-detect-fraud

Can we predict accurately on skewed data? Which sampling techniques can be used? Which models and techniques can be used in this scenario? Find the answers in this code pattern!

Stars: ✭ 59 (+126.92%)

Mutual labels: datascience

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (+303.85%)

Mutual labels: spark

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-3.85%)

Mutual labels: pyspark

Spark Ffm

FFM (Field-Aware Factorization Machine) on Spark

Stars: ✭ 101 (+288.46%)

Mutual labels: spark

fedora-prime

Simple program to switch between Intel and NVIDIA GPUs

Stars: ✭ 24 (-7.69%)

Mutual labels: optimus

Bigdata Notes

Getting-started guide to big data ⭐

Stars: ✭ 10,991 (+42173.08%)

Mutual labels: spark

DeepPixel

An open-source Python package for making computer vision and image processing simpler

Stars: ✭ 21 (-19.23%)

Mutual labels: deeplearning

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (+273.08%)

Mutual labels: spark

Free-Courses-on-Data-Science

No description or website provided.

Stars: ✭ 24 (-7.69%)

Mutual labels: datascience

hmac-timing-attacks

HMAC timing attacks with statistical analysis

Stars: ✭ 22 (-15.38%)

Mutual labels: datascience

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…

Stars: ✭ 474 (+1723.08%)

Mutual labels: pyspark

SpeechEnhancement

Combining Weighted Multi-resolution STFT Loss and Distance Fusion to Optimize Speech Enhancement Generative Adversarial Networks

Stars: ✭ 49 (+88.46%)

Mutual labels: deeplearning

RFDA-PyTorch

Official code for 'Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction', an accepted paper at ACM Multimedia 2021 (ACMMM2021). Task: Video Quality Enhancement / Video Compression Artifact Reduction

Stars: ✭ 44 (+69.23%)

Mutual labels: deeplearning

Roaringbitmap

A better compressed bitset in Java

Stars: ✭ 2,460 (+9361.54%)

Mutual labels: spark

Repository

Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (+253.85%)

Mutual labels: spark

2018-datascience-lectures

Lecture content for Intro to Data Science 2018

Stars: ✭ 32 (+23.08%)

Mutual labels: datascience

Big Data

🔧 Use dplyr to analyze Big Data 🐘

Stars: ✭ 93 (+257.69%)

Mutual labels: spark

awesome-open-mlops

The Fuzzy Labs guide to the universe of open source MLOps

Stars: ✭ 304 (+1069.23%)

Mutual labels: datascience

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (+242.31%)

Mutual labels: spark

kafka-twitter-spark-streaming

Counting Tweets Per User in Real-Time

Stars: ✭ 38 (+46.15%)

Mutual labels: pyspark

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters

Stars: ✭ 181 (+596.15%)

Mutual labels: spark

Laravel Spark Google2fa

Google Authenticator support for Laravel Spark

Stars: ✭ 86 (+230.77%)

Mutual labels: spark

DataScienceTutorials.jl

A set of tutorials to show how to use Julia for data science (DataFrames, MLJ, ...)

Stars: ✭ 94 (+261.54%)

Mutual labels: datascience

Flint

Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)

Stars: ✭ 85 (+226.92%)

Mutual labels: spark

Groundbreaking-Papers

ML Research paper summaries, annotated papers and implementation walkthroughs

Stars: ✭ 90 (+246.15%)

Mutual labels: deeplearning

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (+219.23%)

Mutual labels: spark

wildebeest

File processing pipelines

Stars: ✭ 86 (+230.77%)

Mutual labels: datascience

Spark Dependencies

Spark job for dependency links

Stars: ✭ 82 (+215.38%)

Mutual labels: spark

datascience-environment

Docker Environment for data science

Stars: ✭ 18 (-30.77%)

Mutual labels: datascience

Spark Streaming With Kafka

Self-contained examples of Apache Spark streaming integrated with Apache Kafka.

Stars: ✭ 180 (+592.31%)

Mutual labels: spark

Articles-Bookmarked

No description or website provided.

Stars: ✭ 30 (+15.38%)

Mutual labels: deeplearning

Sparkstreaming

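💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.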

Stars: ✭ 179 (+588.46%)

Mutual labels: spark

Xsql

Unified SQL Analytics Engine Based on SparkSQL

Stars: ✭ 176 (+576.92%)

Mutual labels: spark

Forecasting-Solar-Energy

Forecasting Solar Power: Analysis of using an LSTM Neural Network

Stars: ✭ 23 (-11.54%)

Mutual labels: deeplearning

fastapi-template

Completely scalable FastAPI-based template for machine learning, deep learning, and any other software project that wants to use FastAPI as an API framework.

Stars: ✭ 156 (+500%)

Mutual labels: deeplearning

Spark Kafka Writer

Write your Spark data to Kafka seamlessly

Stars: ✭ 175 (+573.08%)

Mutual labels: spark
