
spider-123-eng / Spark

Licence: other
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast, iterative access to datasets. This project contains sample Spark programs written in Scala.

Programming Languages

scala
5932 projects

Projects that are alternatives to or similar to Spark

Anotherkafkamonitor Akm
Another app used to monitor the progress of Kafka Producer and Consumer
Stars: ✭ 36 (-34.55%)
Mutual labels:  consumer, kafka-producer
Qbusbridge
The Apache Kafka Client SDK
Stars: ✭ 272 (+394.55%)
Mutual labels:  consumer, kafka-producer
Librdkafka
The Apache Kafka C/C++ library
Stars: ✭ 5,617 (+10112.73%)
Mutual labels:  consumer, kafka-producer
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+3029.09%)
Mutual labels:  streaming, spark-sql
Flogo
Project Flogo is an open source ecosystem of opinionated event-driven capabilities to simplify building efficient & modern serverless functions, microservices & edge apps.
Stars: ✭ 1,891 (+3338.18%)
Mutual labels:  streaming, kafka-producer
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-14.55%)
Mutual labels:  spark-sql, spark-dataframes
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-63.64%)
Mutual labels:  parquet, spark-sql
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-65.45%)
Mutual labels:  parquet, spark-sql
sqs-quooler
A complete queue consumer for SQS
Stars: ✭ 23 (-58.18%)
Mutual labels:  consumer
uvc-streamer
MJPEG webcam network streamer for Linux
Stars: ✭ 25 (-54.55%)
Mutual labels:  streaming
spark2-etl-examples
A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
Stars: ✭ 23 (-58.18%)
Mutual labels:  spark-sql
wow-spark
🔆 A self-study Spark handbook covering Spark Core, Spark SQL, Spark Streaming, Spark-Kafka, and Delta Lake, along with basic Scala exercises, plus source-code analysis of topics such as master and shuffle, summaries, and translations.
Stars: ✭ 20 (-63.64%)
Mutual labels:  spark-sql
matrixone
Hyperconverged cloud-edge native database
Stars: ✭ 1,057 (+1821.82%)
Mutual labels:  streaming
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-65.45%)
Mutual labels:  parquet
LazyMan-iOS
A simple app that lets you stream every live and archived NHL and MLB game from any of your iOS devices.
Stars: ✭ 73 (+32.73%)
Mutual labels:  streaming
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-69.09%)
Mutual labels:  parquet
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Stars: ✭ 95 (+72.73%)
Mutual labels:  parquet
libdvbtee
dvbtee: a digital television streamer / parser / service information aggregator supporting various interfaces including telnet CLI & http control
Stars: ✭ 65 (+18.18%)
Mutual labels:  streaming
live-cryptocurrency-streaming-flutter
A Flutter app with live cryptocurrency updates, powered by Ably
Stars: ✭ 26 (-52.73%)
Mutual labels:  streaming
DaFlow
Apache Spark based data flow (ETL) framework that supports multiple read/write destinations of different types and multiple categories of transformation rules.
Stars: ✭ 24 (-56.36%)
Mutual labels:  parquet

Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can now create applications to exploit Spark’s power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop.

This project contains sample programs for Spark written in the Scala language.
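
As a rough orientation, a minimal Spark 2.x entry point in Scala might look like the sketch below; the application name, master setting, and sample data are illustrative assumptions, not taken from this repository.

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionExample {
  def main(args: Array[String]): Unit = {
    // Build a local SparkSession; in a real job the master would usually
    // come from spark-submit rather than being hard-coded.
    val spark = SparkSession.builder()
      .appName("spark-scala-samples")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A tiny Dataset just to confirm the session works.
    val ds = Seq(("spark", 2), ("scala", 1)).toDS()
    ds.show()

    spark.stop()
  }
}
```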

Topics Covered in Spark 2.1

Implementing custom UDF, UDAF, and Partitioner using Spark 2.1 (see the UDF/Parquet sketch after this list)
Working with DataFrames (ComplexSchema, DropDuplicates, DatasetConversion, GroupingAndAggregation)
Working with Datasets
Working with Parquet files
Partitioning data by a specific column and storing it partition-wise
Loading data from a Cassandra table using Spark
Working with the Spark Catalog API to access Hive tables
Inserting data into Hive tables (managed and external) from Spark
Inserting data into Hive partitioned tables in Parquet format (managed and external) from Spark
Adding and listing partitions of a Hive table using Spark
CRUD operations on Cassandra using Spark
Reading from and writing to S3 buckets using Spark
Spark MongoDB integration
Adding Hive partitions by fetching data from Cassandra
Exporting/backing up Cassandra table data using Spark
Reading and writing data to Elasticsearch using Spark 2.x
Querying Elasticsearch data from Spark 2.x
Deleting data from Elasticsearch using a Spark DataFrame
Pushing Spark accumulator values as metrics to the Datadog API
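
As a rough illustration of the custom UDF and partitioned Parquet topics above, the following is a minimal sketch; the column names, sample rows, and output path are hypothetical and not taken from this repository.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAndPartitionedParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-and-partitioned-parquet")  // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A simple UDF that upper-cases a string column.
    val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

    // Hypothetical sample data.
    val df = Seq(
      ("alice", "IN", 100),
      ("bob",   "US", 200),
      ("carol", "IN", 300)
    ).toDF("name", "country", "amount")

    val withUpper = df.withColumn("name_upper", toUpper($"name"))

    // Write the result as Parquet, partitioned by the country column.
    withUpper.write
      .mode("overwrite")
      .partitionBy("country")
      .parquet("/tmp/sample_output/partitioned_by_country")  // placeholder path

    spark.stop()
  }
}
```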

Topics Covered in Spark 1.5

Spark transformations.
Spark to Cassandra connection and storage.
Spark to Cassandra CRUD operations.
Reading data from Cassandra using Spark Streaming (Cassandra as source).
Spark Kafka integration.
Spark Streaming with Kafka (see the sketch after this list).
Storing Spark Streaming data into HDFS.
Storing Spark Streaming data into Cassandra.
Spark DataFrames API (joining two DataFrames, sorting, wildcard search, orderBy, aggregations).
Spark SQL.
Spark HiveContext (loading ORC, txt, and Parquet data from Hive tables).
Kafka producer.
Kafka consumer via Spark integration with Kafka.
Spark file streaming.
Spark socket streaming.
Spark JDBC connection.
Overcoming Scala case class limitations by using StructType.
Working with CSV, JSON, XML, ORC, and Parquet data files in Spark.
Working with Avro and SequenceFiles in Spark.
Spark joins.
Spark window vs. sliding interval.
Spark aggregations using the DataFrame API.
Writing custom UDFs and UDAFs in Spark.
Storing data as text and Parquet files in HDFS.
Integrating Spark with MongoDB.
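
For the Spark 1.5 streaming topics, a minimal Spark Streaming plus Kafka sketch might look like the following. It assumes the spark-streaming-kafka 0.8 connector is on the classpath; the broker address, topic name, and HDFS path are placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaStreamToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-streaming-sample")  // hypothetical app name
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Broker list and topic are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("sample-topic")

    // Direct (receiver-less) stream over Kafka using the 0.8 connector API.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Persist the message values of each non-empty batch to HDFS as text files.
    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///tmp/kafka_stream/batch_${System.currentTimeMillis()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```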


Feel free to share any insights or constructive criticism. Cheers!!
Happy Sparking!!!
