471 open-source projects that are alternatives to, or similar to, Scala Spark Tutorial

Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+5.79%)
Mutual labels:  big-data, apache-spark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+2672.73%)
Mutual labels:  big-data, apache-spark
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-73.55%)
Mutual labels:  big-data, apache-spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-8.26%)
Mutual labels:  big-data, apache-spark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (-67.77%)
Mutual labels:  big-data, apache-spark
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (-52.89%)
Mutual labels:  big-data, apache-spark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+23.97%)
Mutual labels:  big-data, apache-spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+104.13%)
Mutual labels:  big-data, apache-spark
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+46.28%)
Mutual labels:  big-data, apache-spark
awesome-tools
Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-74.38%)
Mutual labels:  big-data, apache-spark
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (+150.41%)
Mutual labels:  big-data, apache-spark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-58.68%)
Mutual labels:  big-data, apache-spark
leaflet heatmap
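A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heat map directly in the browser, the heat-map rendering step is moved offline for computation and analysis. Apache Spark computes the data in parallel and then renders the heat map, after which leaflet.js loads an OpenStreetMap layer and the heat-map layer to give good interactivity. With rendering currently implemented in Apache Spark, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heat-map rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .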
Stars: ✭ 13 (-89.26%)
Mutual labels:  big-data, apache-spark
Detecting-Malicious-URL-Machine-Learning
No description or website provided.
Stars: ✭ 47 (-61.16%)
Mutual labels:  big-data, apache-spark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-4.96%)
Mutual labels:  big-data, apache-spark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+2295.87%)
Mutual labels:  big-data, apache-spark
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (+13.22%)
Mutual labels:  big-data, apache-spark
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks, built with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-84.3%)
Mutual labels:  big-data, apache-spark
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+77.69%)
Mutual labels:  big-data, apache-spark
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+19.01%)
Mutual labels:  big-data, apache-spark
Parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (+19.83%)
Mutual labels:  big-data, apache-spark
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-83.47%)
Mutual labels:  big-data, apache-spark
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-44.63%)
Mutual labels:  big-data, apache-spark
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (+128.1%)
Mutual labels:  big-data, apache-spark
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (+155.37%)
Mutual labels:  big-data, apache-spark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time-series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-24.79%)
Mutual labels:  big-data
Hazelcast Python Client
Hazelcast IMDG Python Client
Stars: ✭ 92 (-23.97%)
Mutual labels:  big-data
Spark On K8s Operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+1371.07%)
Mutual labels:  apache-spark
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-6.61%)
Mutual labels:  big-data
Smart Array To Tree
Quickly convert large data arrays to trees
Stars: ✭ 91 (-24.79%)
Mutual labels:  big-data
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-13.22%)
Mutual labels:  apache-spark
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+956.2%)
Mutual labels:  big-data
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-32.23%)
Mutual labels:  big-data
Docker Spark
Apache Spark docker image
Stars: ✭ 1,396 (+1053.72%)
Mutual labels:  apache-spark
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-28.93%)
Mutual labels:  apache-spark
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-31.4%)
Mutual labels:  apache-spark
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8614.05%)
Mutual labels:  big-data
Ambari
Mirror of Apache Ambari
Stars: ✭ 1,576 (+1202.48%)
Mutual labels:  big-data
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-16.53%)
Mutual labels:  big-data
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-33.88%)
Mutual labels:  big-data
Uproot4
ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-33.88%)
Mutual labels:  big-data
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-17.36%)
Mutual labels:  big-data
Iotdb
Apache IoTDB
Stars: ✭ 1,221 (+909.09%)
Mutual labels:  big-data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-34.71%)
Mutual labels:  big-data
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1176.03%)
Mutual labels:  big-data
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph at different sample sizes.
Stars: ✭ 99 (-18.18%)
Mutual labels:  big-data
Attic Predictionio Template Recommender
PredictionIO Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 78 (-35.54%)
Mutual labels:  big-data
Spark Website
Apache Spark Website
Stars: ✭ 75 (-38.02%)
Mutual labels:  big-data
Samza Hello Samza
Mirror of Apache Samza
Stars: ✭ 99 (-18.18%)
Mutual labels:  big-data
Mlflow
Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+8906.61%)
Mutual labels:  apache-spark
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+8023.14%)
Mutual labels:  big-data
Hdfs Shell
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Stars: ✭ 117 (-3.31%)
Mutual labels:  big-data
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution for handling data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-4.96%)
Mutual labels:  big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-9.09%)
Mutual labels:  big-data
Bigdata Notes
Introductory guide to big data ⭐
Stars: ✭ 10,991 (+8983.47%)
Mutual labels:  big-data
Labs
Research on distributed systems
Stars: ✭ 73 (-39.67%)
Mutual labels:  big-data
Bookkeeper
Apache Bookkeeper
Stars: ✭ 1,178 (+873.55%)
Mutual labels:  big-data
Kudu
Mirror of Apache Kudu
Stars: ✭ 1,360 (+1023.97%)
Mutual labels:  big-data
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+871.07%)
Mutual labels:  big-data
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-41.32%)
Mutual labels:  big-data