All Projects → h2oai → Sparkling Water

h2oai / Sparkling Water

Licence: apache-2.0
Sparkling Water provides H2O functionality inside Spark cluster

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Sparkling Water

Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+106.88%)
Mutual labels:  spark, h2o
Smudge
Control the Spotify app from within Emacs.
Stars: ✭ 186 (-79.03%)
Mutual labels:  api, integration
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-92.67%)
Mutual labels:  spark, h2o
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (-74.41%)
Mutual labels:  api, spark
Datafire
A framework for building integrations and APIs
Stars: ✭ 487 (-45.1%)
Mutual labels:  api, integration
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+537.66%)
Mutual labels:  spark, h2o
Adrestia
APIs & SDK for interacting with Cardano.
Stars: ✭ 56 (-93.69%)
Mutual labels:  api, integration
Product Ei
An open source, a high-performance hybrid integration platform that allows developers quick integration with any application, data, or system.
Stars: ✭ 277 (-68.77%)
Mutual labels:  api, integration
Askql
AskQL is a query language that can express any data request
Stars: ✭ 352 (-60.32%)
Mutual labels:  api, integration
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (-42.62%)
Mutual labels:  spark, integration
Kafka Storm Starter
Code examples that show to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (-17.93%)
Mutual labels:  spark, integration
Reactjs Tmdb App
Responsive React 'The Movie Database' (TMDb) App
Stars: ✭ 830 (-6.43%)
Mutual labels:  api
Api
📖「一个」、「Time 时光」、「开眼」、「一席」、「梨视频」、「微软必应词典」、「金山词典」、「豆瓣电影」、「中央天气」、「魅族天气」、「每日一文」、「12306」、「途牛」、「快递100」、「快递」应用 Api。仅供学习,禁止商业使用,侵权请联系删除。
Stars: ✭ 6,544 (+637.77%)
Mutual labels:  api
Themoviedb
A node.js module with support for both callbacks and promises to provide access to the TMDb API
Stars: ✭ 5 (-99.44%)
Mutual labels:  api
Bigdataguide
大数据学习,从零开始学习大数据,包含大数据学习各阶段学习视频、面试资料
Stars: ✭ 817 (-7.89%)
Mutual labels:  spark
Apis Made In Iran
A list of APIs from Iran
Stars: ✭ 835 (-5.86%)
Mutual labels:  api
Github Create Token
Create a Github OAuth access token.
Stars: ✭ 6 (-99.32%)
Mutual labels:  api
Goyave
🍐 Elegant Golang REST API Framework
Stars: ✭ 811 (-8.57%)
Mutual labels:  api
Notion Api Worker
Notion as CMS with easy API access
Stars: ✭ 806 (-9.13%)
Mutual labels:  api
Xxl Api
A api management platform.(API管理平台XXL-API)
Stars: ✭ 803 (-9.47%)
Mutual labels:  api

Sparkling Water

|Join the chat at https://gitter.im/h2oai/sparkling-water| |image2| |image3| |Powered by H2O.ai|

Sparkling Water integrates |H2O|'s fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames, Datasets) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

User Documentation

The documentation contains also documentation for our clients, PySparkling and RSparkling.

- `Sparkling Water For Spark 3.0 <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/index.html>`__
- `Sparkling Water For Spark 2.4 <http://docs.h2o.ai/sparkling-water/2.4/latest-stable/doc/index.html>`__
- `Sparkling Water For Spark 2.3 <http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/index.html>`__
- `Sparkling Water For Spark 2.2 <http://docs.h2o.ai/sparkling-water/2.2/latest-stable/doc/index.html>`__
- `Sparkling Water For Spark 2.1 <http://docs.h2o.ai/sparkling-water/2.1/latest-stable/doc/index.html>`__

Download Binaries
~~~~~~~~~~~~~~~~~

- `Latest version for Spark 3.0 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.0/latest.html>`__
- `Latest version for Spark 2.4 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/latest.html>`__
- `Latest version for Spark 2.3 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.3/latest.html>`__
- `Latest version for Spark 2.2 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.2/latest.html>`__
- `Latest version for Spark 2.1 <http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.1/latest.html>`__


Maven
~~~~~

Each Sparkling Water release is published into Maven central with following coordinates:

- ``ai.h2o:sparkling-water-core_{{scala_version}}:{{version}}`` - Includes core of Sparkling Water
- ``ai.h2o:sparkling-water-examples_{{scala_version}}:{{version}}`` - Includes example applications
- ``ai.h2o:sparkling-water-repl_{{scala_version}}:{{version}}`` - Spark REPL integration into H2O Flow UI
- ``ai.h2o:sparkling-water-ml_{{scala_version}}:{{version}}`` - Extends Spark ML package by H2O-based transformations
- ``ai.h2o:sparkling-water-scoring_{{scala_version}}:{{version}}`` - A library containing scoring logic and definition of Sparkling Water MOJO models.
- ``ai.h2o:sparkling-water-scoring-package_{{scala_version}}:{{version}}`` - Lightweight Sparkling Water package including all dependencies required just for scoring with H2O-3 and DAI MOJO models.
- ``ai.h2o:sparkling-water-package_{{scala_version}}:{{version}}`` - Sparkling Water package containing all dependencies required for model training and scoring. This is designed to use as Spark package via ``--packages`` option.

   **Note:** The ``{{version}}`` references to a release version of Sparkling Water, the ``{{scala_version}}``
   references to Scala base version.

The full list of published packages is available
`here <http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22ai.h2o%22%20AND%20a%3Asparkling-water*>`__.

Sparkling Water Requirements for Spark 3.0
  • Linux/OS X/Windows
  • Java 8+
  • Python 2.7+ For Python version of Sparkling Water (PySparkling)
  • Spark 3.0 <https://spark.apache.org/downloads.html>__ and SPARK_HOME shell variable must point to your local Spark installation

To see requirements for older Spark version, please visit relevant documentation.


Use Sparkling Water

Sparkling Water is distributed as a Spark application library which can be used by any Spark application. Furthermore, we provide also zip distribution which bundles the library and shell scripts.

There are several ways of using Sparkling Water:

  • Sparkling Shell
  • Sparkling Water driver
  • Spark Shell and include Sparkling Water library via --jars or --packages option
  • Spark Submit and include Sparkling Water library via --jars or --packages option
  • PySpark with PySparkling

Run Sparkling shell


The Sparkling shell encapsulates a regular Spark shell and append Sparkling Water library on the classpath via ``--jars`` option.
The Sparkling Shell supports creation of an |H2O| cloud and execution of |H2O| algorithms.

1. Either download or build Sparkling Water
2. Configure the location of Spark cluster:

   .. code:: bash

      export SPARK_HOME="/path/to/spark/installation"
      export MASTER="local[*]"


   In this case, ``local[*]`` points to an embedded single node cluster.

3. Run Sparkling Shell:

   .. code:: bash

      bin/sparkling-shell

   Sparkling Shell accepts common Spark Shell arguments. For example, to increase memory allocated by each executor, use the ``spark.executor.memory`` parameter: ``bin/sparkling-shell --conf "spark.executor.memory=4g"``

4. Initialize H2OContext

   .. code:: scala

      import ai.h2o.sparkling._
      val hc = H2OContext.getOrCreate()

   ``H2OContext`` starts H2O services on top of Spark cluster and provides primitives for transformations between |H2O| and Spark data structures.


Use Sparkling Water with PySpark

Sparkling Water can be also used directly from PySpark and the integration is called PySparkling.

See PySparkling README <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/pysparkling.html>__ to learn about PySparkling.

Use Sparkling Water via Spark Packages


To see how Sparkling Water can be used as Spark package, please see `Use as Spark Package <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/tutorials/use_as_spark_package.html>`__.

Use Sparkling Water in Windows environments

See Windows Tutorial <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/tutorials/run_on_windows.html>__ to learn how to use Sparkling Water in Windows environments.

Sparkling Water examples

To see how to run examples for Sparkling Water, please see `Running Examples <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/devel/running_examples.html>`__.

--------------

Sparkling Water Backends
------------------------

Sparkling water supports two backend/deployment modes - internal and
external. Sparkling Water applications are independent on the selected
backend. The backend can be specified before creation of the
``H2OContext``.

For more details regarding the internal or external backend, please see
`Backends <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/deployment/backends.html>`__.

--------------

FAQ
---

List of all Frequently Asked Questions is available at `FAQ <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/FAQ.html>`__.

--------------

Development
-----------

Complete development documentation is available at `Development Documentation <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/devel/devel.html>`__.

Build Sparkling Water
~~~~~~~~~~~~~~~~~~~~~

To see how to build Sparkling Water, please see `Build Sparkling Water <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/devel/build.html>`__.

Develop applications with Sparkling Water

An application using Sparkling Water is regular Spark application which bundling Sparkling Water library. See Sparkling Water Droplet providing an example application here <https://github.com/h2oai/h2o-droplets/tree/master/sparkling-water-droplet>__.

Contributing


Look at our `list of JIRA
tasks <https://0xdata.atlassian.net/projects/SW/issues>`__ or send your idea to [email protected]

Filing Bug Reports and Feature Requests

You can file a bug report of feature request directly in the Sparkling Water JIRA page at http://jira.h2o.ai/ <https://0xdata.atlassian.net/projects/SW/issues>__.

  1. Log in to the Sparkling Water JIRA tracking system. (Create an account if necessary.)

  2. Once inside the home page, click the Create button.

    .. figure:: /doc/src/site/sphinx/images/jira_create.png :alt: center

  3. A form will display allowing you to enter information about the bug or feature request.

    .. figure:: /doc/src/site/sphinx/images/jira_new_issue.png :alt: center

    Enter the following on the form:

    • Select the Project that you want to file the issue under. For example, if this is an open source public bug, you should file it under SW (SW).

    • Specify the Issue Type. For example, if you believe you've found a bug, then select Bug, or if you want to request a new feature, then select New Feature.

    • Provide a short but concise summary about the issue. The summary will be shown when engineers organize, filter, and search for Jira tickets.

    • Specify the urgency of the issue using the Priority dropdown menu.

    • If there is a due date specify it with the Due Date.

    • The Components drop down refers to the API or language that the issue relates to. (See the drop down menu for available options.)

    • You can leave Affects Version/s, Fix Version\s, and Assignee fields blank. Our engineering team will fill this in.

    • Add a detailed description of your bug in the Description section. Best practice for descriptions include:

    • A summary of what the issue is

    • What you think is causing the issue

    • Reproducible code that can be run end to end without requiring an engineer to edit your code. Use {code} {code} around your code to make it appear in code format.

    • Any scripts or necessary documents. Add by dragging and dropping your files into the create issue dialogue box.

    You can be able to leave the rest of the ticket blank.

  4. When you are done with your ticket, simply click on the Create button at the bottom of the page.

    .. figure:: /doc/src/site/sphinx/images/jira_finished_create.png :alt: center

After you click Create, a pop up will appear on the right side of your screen with a link to your Jira ticket. It will have the form https://0xdata.atlassian.net/browse/SW-####. You can use this link to later edit your ticket.

Please note that your Jira ticket number along with its summary will appear in one of the Jira ticket slack channels, and anytime you update the ticket anyone associated with that ticket, whether as the assignee or a watcher will receive an email with your changes.

Have Questions?


We also respond to questions tagged with sparkling-water and h2o tags on the `Stack Overflow <https://stackoverflow.com/questions/tagged/sparkling-water>`__.

Change Logs
~~~~~~~~~~~

Change logs are available at `Change Logs <http://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/CHANGELOG.html>`__.

---------------

.. |Join the chat at https://gitter.im/h2oai/sparkling-water| image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/h2oai/sparkling-water?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |image2| image:: https://maven-badges.herokuapp.com/maven-central/ai.h2o/sparkling-water-core_2.12/badge.svg
   :target: http://search.maven.org/#search%7Cgav%7C1%7Cg:%22ai.h2o%22%20AND%20a:%22sparkling-water-core_2.12%22
.. |image3| image:: https://img.shields.io/badge/License-Apache%202-blue.svg
   :target: LICENSE
.. |Powered by H2O.ai| image:: https://img.shields.io/badge/powered%20by-h2oai-yellow.svg
   :target: https://github.com/h2oai/
.. |H2O| replace:: H\ :sub:`2`\ O

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].