
spider-123-eng / Spark

Licence: other
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast, iterative access to datasets. This project contains sample Spark programs written in Scala.

Programming Languages

scala
5932 projects

Projects that are alternatives to or similar to Spark

Anotherkafkamonitor Akm
Another app used to monitor the progress of Kafka Producer and Consumer
Stars: ✭ 36 (-34.55%)
Mutual labels:  consumer, kafka-producer
Qbusbridge
The Apache Kafka Client SDK
Stars: ✭ 272 (+394.55%)
Mutual labels:  consumer, kafka-producer
Librdkafka
The Apache Kafka C/C++ library
Stars: ✭ 5,617 (+10112.73%)
Mutual labels:  consumer, kafka-producer
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+3029.09%)
Mutual labels:  streaming, spark-sql
Flogo
Project Flogo is an open source ecosystem of opinionated event-driven capabilities to simplify building efficient & modern serverless functions, microservices & edge apps.
Stars: ✭ 1,891 (+3338.18%)
Mutual labels:  streaming, kafka-producer
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-14.55%)
Mutual labels:  spark-sql, spark-dataframes
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-63.64%)
Mutual labels:  parquet, spark-sql
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-65.45%)
Mutual labels:  parquet, spark-sql
sqs-quooler
A complete queue consumer for SQS
Stars: ✭ 23 (-58.18%)
Mutual labels:  consumer
uvc-streamer
MJPEG webcam network streamer for Linux
Stars: ✭ 25 (-54.55%)
Mutual labels:  streaming
spark2-etl-examples
A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
Stars: ✭ 23 (-58.18%)
Mutual labels:  spark-sql
wow-spark
🔆 A self-study Spark handbook covering Spark Core, Spark SQL, Spark Streaming, Spark-Kafka, and Delta Lake, along with basic Scala exercises, plus source-code analysis of topics such as master and shuffle, summaries, and translations.
Stars: ✭ 20 (-63.64%)
Mutual labels:  spark-sql
matrixone
Hyperconverged cloud-edge native database
Stars: ✭ 1,057 (+1821.82%)
Mutual labels:  streaming
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-65.45%)
Mutual labels:  parquet
LazyMan-iOS
A simple app that lets you stream every live and archived NHL and MLB game from any of your iOS devices.
Stars: ✭ 73 (+32.73%)
Mutual labels:  streaming
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-69.09%)
Mutual labels:  parquet
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Stars: ✭ 95 (+72.73%)
Mutual labels:  parquet
libdvbtee
dvbtee: a digital television streamer / parser / service information aggregator supporting various interfaces including telnet CLI & http control
Stars: ✭ 65 (+18.18%)
Mutual labels:  streaming
live-cryptocurrency-streaming-flutter
A Flutter app with live cryptocurrency updates, powered by Ably
Stars: ✭ 26 (-52.73%)
Mutual labels:  streaming
DaFlow
Apache Spark based data flow (ETL) framework that supports multiple read/write destinations of different types and multiple categories of transformation rules.
Stars: ✭ 24 (-56.36%)
Mutual labels:  parquet

Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can now create applications to exploit Spark’s power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop.

This project contains sample programs for Spark written in the Scala language.
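
As a rough orientation, a minimal Spark 2.x entry point in Scala might look like the sketch below; the application name, master setting, and sample data are illustrative assumptions, not taken from this repository.

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionExample {
  def main(args: Array[String]): Unit = {
    // Build a local SparkSession; in a real job the master would usually
    // come from spark-submit rather than being hard-coded.
    val spark = SparkSession.builder()
      .appName("spark-scala-samples")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A tiny Dataset just to confirm the session works.
    val ds = Seq(("spark", 2), ("scala", 1)).toDS()
    ds.show()

    spark.stop()
  }
}
```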

Topics Covered in Spark 2.1

Implementing custom UDF, UDAF, and Partitioner using Spark 2.1 (see the UDF/Parquet sketch after this list)
Working with DataFrames (ComplexSchema, DropDuplicates, DatasetConversion, GroupingAndAggregation)
Working with Datasets
Working with Parquet files
Partitioning data by a specific column and storing it partition-wise
Loading data from a Cassandra table using Spark
Working with the Spark Catalog API to access Hive tables
Inserting data into Hive tables (managed and external) from Spark
Inserting data into Hive partitioned tables in Parquet format (managed and external) from Spark
Adding and listing partitions of a Hive table using Spark
CRUD operations on Cassandra using Spark
Reading from and writing to S3 buckets using Spark
Spark MongoDB integration
Adding Hive partitions by fetching data from Cassandra
Exporting/backing up Cassandra table data using Spark
Reading and writing data to Elasticsearch using Spark 2.x
Querying Elasticsearch data from Spark 2.x
Deleting data from Elasticsearch using a Spark DataFrame
Pushing Spark accumulator values as metrics to the Datadog API
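
As a rough illustration of the custom UDF and partitioned Parquet topics above, the following is a minimal sketch; the column names, sample rows, and output path are hypothetical and not taken from this repository.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAndPartitionedParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-and-partitioned-parquet")  // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A simple UDF that upper-cases a string column.
    val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

    // Hypothetical sample data.
    val df = Seq(
      ("alice", "IN", 100),
      ("bob",   "US", 200),
      ("carol", "IN", 300)
    ).toDF("name", "country", "amount")

    val withUpper = df.withColumn("name_upper", toUpper($"name"))

    // Write the result as Parquet, partitioned by the country column.
    withUpper.write
      .mode("overwrite")
      .partitionBy("country")
      .parquet("/tmp/sample_output/partitioned_by_country")  // placeholder path

    spark.stop()
  }
}
```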

Topics Covered in Spark 1.5

Spark transformations.
Spark to Cassandra connection and storage.
Spark to Cassandra CRUD operations.
Reading data from Cassandra using Spark Streaming (Cassandra as source).
Spark Kafka integration.
Spark Streaming with Kafka (see the sketch after this list).
Storing Spark Streaming data into HDFS.
Storing Spark Streaming data into Cassandra.
Spark DataFrames API (joining two DataFrames, sorting, wildcard search, orderBy, aggregations).
Spark SQL.
Spark HiveContext (loading ORC, txt, and Parquet data from Hive tables).
Kafka producer.
Kafka consumer via Spark integration with Kafka.
Spark file streaming.
Spark socket streaming.
Spark JDBC connection.
Overcoming Scala case class limitations by using StructType.
Working with CSV, JSON, XML, ORC, and Parquet data files in Spark.
Working with Avro and SequenceFiles in Spark.
Spark joins.
Spark window vs. sliding interval.
Spark aggregations using the DataFrame API.
Writing custom UDFs and UDAFs in Spark.
Storing data as text and Parquet files in HDFS.
Integrating Spark with MongoDB.
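
For the Spark 1.5 streaming topics, a minimal Spark Streaming plus Kafka sketch might look like the following. It assumes the spark-streaming-kafka 0.8 connector is on the classpath; the broker address, topic name, and HDFS path are placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaStreamToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-streaming-sample")  // hypothetical app name
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Broker list and topic are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("sample-topic")

    // Direct (receiver-less) stream over Kafka using the 0.8 connector API.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Persist the message values of each non-empty batch to HDFS as text files.
    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///tmp/kafka_stream/batch_${System.currentTimeMillis()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```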


Feel free to share any insights or constructive criticism. Cheers!!
Happy Sparking!!!
