datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (-31.58%)
Mutual labels: big-data, apache-spark, datalake, spark-sql
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-40.35%)
Mutual labels: big-data, bigdata, spark-sql
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-17.54%)
Mutual labels: big-data, spark-sql, spark-scala
gan deeplearning4j
Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-66.67%)
Mutual labels: big-data, apache-spark, bigdata
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+2919.3%)
Mutual labels: apache-spark, bigdata, spark-sql
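leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved offline. The data is first computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .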
Stars: ✭ 13 (-77.19%)
Mutual labels: big-data, apache-spark, bigdata
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+277.19%)
Mutual labels: big-data, apache-spark, bigdata
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-3.51%)
Mutual labels: apache-spark, spark-sql
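awesome-coder-resources
A pit stop on the programming road! ------ [Continuously updated... stars welcome, come back and visit often...] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]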
Stars: ✭ 54 (-5.26%)
Mutual labels: big-data, bigdata
geospark
Bring sf to Spark in production
Stars: ✭ 53 (-7.02%)
Mutual labels: apache-spark, spark-sql
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+136.84%)
Mutual labels: bigdata, spark-sql
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+17.54%)
Mutual labels: big-data, apache-spark
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-64.91%)
Mutual labels: big-data, apache-spark
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+121.05%)
Mutual labels: big-data, bigdata
awesome-tools
Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-45.61%)
Mutual labels: big-data, apache-spark
twitter-archive-reader
Full-featured TypeScript Twitter archive reader and browser
Stars: ✭ 43 (-24.56%)
Mutual labels: big-data, bigdata
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to share, decentralize, and spread knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and seedbed programs …
Stars: ✭ 60 (+5.26%)
Mutual labels: big-data, bigdata
Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Stars: ✭ 52 (-8.77%)
Mutual labels: datalake, spark-sql
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+5785.96%)
Mutual labels: big-data, apache-spark