dfdx / Spark.jl

Licence: other
Julia binding for Apache Spark

Projects that are alternatives to, or similar to, Spark.jl

Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-53.59%)
Mutual labels:  spark, big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+774.51%)
Mutual labels:  spark, big-data
Labs
Research on distributed system
Stars: ✭ 73 (-52.29%)
Mutual labels:  spark, big-data
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-1.96%)
Mutual labels:  spark, big-data
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+1583.66%)
Mutual labels:  spark, big-data
Spark Doc Zh
Chinese edition of the official Apache Spark documentation
Stars: ✭ 1,126 (+635.95%)
Mutual labels:  spark, big-data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-48.37%)
Mutual labels:  spark, big-data
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+386.93%)
Mutual labels:  spark, big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-28.1%)
Mutual labels:  spark, big-data
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+7083.66%)
Mutual labels:  spark, big-data
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-62.75%)
Mutual labels:  spark, big-data
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-10.46%)
Mutual labels:  spark, big-data
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+20565.36%)
Mutual labels:  spark, big-data
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-57.52%)
Mutual labels:  spark, big-data
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-92.81%)
Mutual labels:  spark, big-data
Spark Website
Apache Spark Website
Stars: ✭ 75 (-50.98%)
Mutual labels:  spark, big-data
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+3503.27%)
Mutual labels:  spark, big-data
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+3596.73%)
Mutual labels:  spark, big-data
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-36.6%)
Mutual labels:  spark, big-data
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+973.2%)
Mutual labels:  spark, big-data

Spark.jl

A Julia interface to Apache Spark™

Spark.jl is a package for running Julia programs on the Apache Spark platform. It supports running pure Julia scripts on Julia data structures while using the data and code distribution capabilities of Apache Spark. It supports multiple cluster types (in client mode) and can be considered an analogue to PySpark or SparkR within the Julia ecosystem.

Installation

Spark.jl requires at least Java 7 and Maven to be installed and available in PATH.
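
Before installing, it can help to confirm that both tools are actually visible to Julia. The following is a minimal sketch, not part of Spark.jl: the names REQUIRED_TOOLS and missing_tools are made up for this illustration, and it assumes the executables are called java and mvn. It only checks that they can be found on PATH, not which Java version is installed.

```julia
# Command-line tools named in the installation requirement above
# (the executable names are an assumption for this sketch).
const REQUIRED_TOOLS = ["java", "mvn"]

# Return the entries of `tools` that cannot be found on PATH.
# `Sys.which` returns `nothing` when no matching executable exists.
missing_tools(tools) = [t for t in tools if Sys.which(t) === nothing]

not_found = missing_tools(REQUIRED_TOOLS)
if isempty(not_found)
    println("java and mvn were found on PATH")
else
    println("Not found on PATH: ", join(not_found, ", "))
end
```

If anything is reported as not found, install it or adjust PATH before running the package installation step.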

using Pkg
Pkg.add("Spark")

This will download and build all Julia and Java dependencies. To use Spark.jl, type:

using Spark
Spark.init()                        # initialise the JVM once per session
sc = SparkContext(master="local")   # context for a local, single-machine Spark instance

Documentation

  • LATEST: in-development version of the documentation.

Project Status

The package is tested against Julia 1.0 and 1.4, and against Java 8 and 11. It has also been tested on Amazon EMR and Azure HDInsight. While large cluster modes have been tested primarily on Linux, OS X and Windows do work for local development. See the roadmap for current status.

Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems.

Trademarks

Apache®, Apache Spark and Spark are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.