LearningJournal / SparkProgrammingInScala

Licence: MIT license
Apache Spark Course Material

Programming Languages

scala

Projects that are alternatives to or similar to SparkProgrammingInScala

datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-31.58%)
Mutual labels:  big-data, apache-spark, datalake, spark-sql
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-40.35%)
Mutual labels:  big-data, bigdata, spark-sql
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-17.54%)
Mutual labels:  big-data, spark-sql, spark-scala
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-66.67%)
Mutual labels:  big-data, apache-spark, bigdata
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+2919.3%)
Mutual labels:  apache-spark, bigdata, spark-sql
leaflet heatmap
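A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .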
Stars: ✭ 13 (-77.19%)
Mutual labels:  big-data, apache-spark, bigdata
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+277.19%)
Mutual labels:  big-data, apache-spark, bigdata
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-3.51%)
Mutual labels:  apache-spark, spark-sql
awesome-coder-resources
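A refueling station on the programming road! ------ [Continuously updated... stars welcome, come back and visit often......] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]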
Stars: ✭ 54 (-5.26%)
Mutual labels:  big-data, bigdata
geospark
bring sf to spark in production
Stars: ✭ 53 (-7.02%)
Mutual labels:  apache-spark, spark-sql
Detecting-Malicious-URL-Machine-Learning
No description or website provided.
Stars: ✭ 47 (-17.54%)
Mutual labels:  big-data, apache-spark
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+136.84%)
Mutual labels:  bigdata, spark-sql
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+17.54%)
Mutual labels:  big-data, apache-spark
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-64.91%)
Mutual labels:  big-data, apache-spark
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+121.05%)
Mutual labels:  big-data, bigdata
awesome-tools
curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-45.61%)
Mutual labels:  big-data, apache-spark
twitter-archive-reader
Full featured TypeScript Twitter archive reader and browser
Stars: ✭ 43 (-24.56%)
Mutual labels:  big-data, bigdata
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (+5.26%)
Mutual labels:  big-data, bigdata
Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Stars: ✭ 52 (-8.77%)
Mutual labels:  datalake, spark-sql
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+5785.96%)
Mutual labels:  big-data, apache-spark

Apache Spark 3 - Spark Programming in Scala for Beginners

This is the central repository for all the materials related to the Apache Spark 3 - Spark Programming in Scala for Beginners course by Prashant Pandey.
You can get the full course at Apache Spark Course @ Udemy.

Description

I am creating the Apache Spark 3 - Spark Programming in Scala for Beginners course to help you understand Spark programming and apply that knowledge to build data engineering solutions. The course is example-driven and follows a working-session approach: we code live and explain the needed concepts along the way.

Who should take this Course?

I designed this course for software engineers who want to develop data engineering pipelines and applications using Apache Spark. I am also creating this course for data architects and data engineers who are responsible for designing and building their organization's data-centric infrastructure. Another audience is managers and architects who do not work directly on Spark implementation but work with the people who implement Apache Spark at the ground level.

Spark and source code version

This course uses Apache Spark 3.x. I have tested all the source code and examples used in this course on the Apache Spark 3.0.0 open-source distribution.
