sentry-spark
Apache Spark Sentry Integration
Stars: ✭ 14 (-44%)
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+10572%)
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+188%)
tpch-spark
TPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (+152%)
sparkar-volts
An extensive non-reactive TypeScript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-40%)
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+132%)
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+264%)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+12%)
experiments
Code examples for my blog posts
Stars: ✭ 21 (-16%)
smolder
HL7 Apache Spark Datasource
Stars: ✭ 33 (+32%)
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (+0%)
workshop-spark
Code for Spark workshops with a Docker-based development environment
Stars: ✭ 27 (+8%)
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including an EM algorithm to estimate parameters
Stars: ✭ 181 (+624%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+12076%)
spark-word2vec
A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-4%)
Every Single Day I Tldr
A daily digest of the articles and videos I've found interesting and want to share with you.
Stars: ✭ 249 (+896%)
visualize-data-with-python
A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+140%)
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing, and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+888%)
frovedis
Framework of vectorized and distributed data analytics
Stars: ✭ 59 (+136%)
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+880%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+36%)
Recommendationsystem
Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+876%)
shamash
Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (+24%)
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+852%)
Azure Event Hubs
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+832%)
Casper
A compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (+80%)
visions
Type System for Data Analysis in Python
Stars: ✭ 136 (+444%)
Spark-PMoF
Spark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (+12%)
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-36%)
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+808%)
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (+88%)
Search Ads Web Service
Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (+20%)
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+1796%)
Ruby Spark
Ruby wrapper for Apache Spark
Stars: ✭ 221 (+784%)
Spark Excel
A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (+764%)
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (+56%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-32%)
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+760%)
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+0%)
Example Spark
Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (+720%)
Spark Knn
k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+720%)
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (+16%)
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+8996%)
spark-demos
Collection of different demo applications using Apache Spark
Stars: ✭ 15 (-40%)