astrolabsoftware / spark3D

Licence: Apache-2.0
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Latest News

  • [05/2018] GSoC 2018: spark3D has been selected for the Google Summer of Code (GSoC) 2018. Congratulations to @mayurdb, who will work on the project this year!
  • [06/2018] Release: version 0.1.0, 0.1.1
  • [07/2018] New location: spark3D is an official project of AstroLab Software!
  • [07/2018] Release: version 0.1.3, 0.1.4, 0.1.5
  • [08/2018] Release: version 0.2.0, 0.2.1 (pyspark3d)
  • [09/2018] Release: version 0.2.2
  • [11/2018] Release: version 0.3.0, 0.3.1 (new DataFrame API)

Rationale

spark3D should be viewed as an extension of the Apache Spark framework, and more specifically the Spark SQL module, focusing on the manipulation of three*-dimensional data sets.

Why would you use spark3D? If you often need to repartition large spatial 3D data sets, or perform spatial queries (neighbour search, window queries, cross-match, clustering, ...), spark3D is for you. It contains optimised classes and methods to do so, and it spares you the implementation time! In addition, these extensions let you visualise large data sets efficiently by quickly building a representation of your data set (see more here).
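To make the terminology concrete, here is a minimal, brute-force sketch in plain Python of two of the spatial queries mentioned above: a neighbour search and a window query. It is for illustration only and is not the spark3D or pyspark3d API; the function names `knn` and `window_query` are hypothetical, and the code runs on a single machine over an in-memory list rather than on a distributed data set.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(points: List[Point], target: Point, k: int) -> List[Point]:
    """Neighbour search: the k points closest to target, nearest first.

    Hypothetical helper for illustration; brute force over all points.
    """
    return heapq.nsmallest(k, points, key=lambda p: squared_distance(p, target))


def window_query(points: List[Point], lower: Point, upper: Point) -> List[Point]:
    """Window query: points inside the axis-aligned box [lower, upper], bounds included.

    Hypothetical helper for illustration; brute force over all points.
    """
    return [
        p for p in points
        if all(lo <= c <= hi for lo, c, hi in zip(lower, p, upper))
    ]


points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (5.0, 5.0, 5.0)]
nearest = knn(points, (0.9, 0.9, 0.9), k=2)
inside = window_query(points, (0.5, 0.5, 0.5), (2.0, 2.0, 2.0))
```

A brute-force scan like this costs one pass over the data per query, which is the kind of work that partitioning a large 3D data set is meant to reduce.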

spark3D exposes two APIs: Scala (spark3D) and Python (pyspark3d). The core developments are done in Scala, and interfaced with Python using the great py4j package. This means pyspark3d might not contain all the features present in spark3D. In addition, due to differences between Scala and Python, there might be subtle differences between the two APIs.

While we try to stick to the latest Apache Spark developments, spark3D started with the RDD API and slowly migrated to the DataFrame API. This process left a huge imprint on the code structure, and low-level layers in spark3D often still use RDDs to manipulate the data. Do not be surprised if things move: the package is under active development, but we try to keep the user interface as stable as possible!

Last but not least: spark3D is by no means complete, and you are welcome to suggest changes, report bugs or inconsistent implementations, and contribute directly to the package!

Cheers, Julien

* Why 3? Because there are already plenty of very good packages dealing with 2D data sets (e.g. geospark, geomesa, magellan, GeoTrellis, and others), but they are not suitable for many applications, such as those in astronomy!

Installation and tutorials

Scala

You can link spark3D to your project (either spark-shell or spark-submit) by specifying the coordinates:

spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.3.1"

Python

Just run

pip install pyspark3d

Note that we release the assembly JAR with it.
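After installing, you may want to confirm that Python can find the package. The sketch below uses only the standard library; it assumes the importable module has the same name as the pip package (`pyspark3d`), which is not stated above, and the helper `is_installed` is a hypothetical name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the module name matches the pip package name.
print("pyspark3d available:", is_installed("pyspark3d"))
```

This only checks that the module can be located; it does not import it or start a Spark session.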

More information

See the website!

Contributors

  • Julien Peloton (peloton at lal.in2p3.fr)
  • Christian Arnault (arnault at lal.in2p3.fr)
  • Mayur Bhosale (mayurdb31 at gmail.com) -- GSoC 2018.

Contributing to spark3D: see CONTRIBUTING.
