intel-analytics / Analytics Zoo

License: Apache-2.0
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Programming Languages

Python
139335 projects - #7 most used programming language
Scala
5932 projects
Jupyter Notebook
11667 projects
Shell
77523 projects
Java
68154 projects - #9 most used programming language
Dockerfile
14818 projects

Projects that are alternatives of or similar to Analytics Zoo

Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+55.76%)
Mutual labels:  bigdl, distributed-deep-learning, analytics-zoo
Unet Segmentation In Keras Tensorflow
UNet is a fully convolutional network (FCN) for image segmentation; its goal is to predict the class of each pixel. It builds on the FCN architecture, modified to yield better segmentation in medical imaging.
Stars: ✭ 105 (-95.71%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Unet
Generic U-Net Tensorflow 2 implementation for semantic segmentation
Stars: ✭ 100 (-95.92%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Inferpy
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
Stars: ✭ 130 (-94.69%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Text Detection Using Yolo Algorithm In Keras Tensorflow
Implemented the YOLO algorithm for scene text detection in keras-tensorflow (no object detection API used). The code can be tweaked to train for a different object detection task using YOLO.
Stars: ✭ 87 (-96.45%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Deep Residual Unet
ResUNet, a semantic segmentation model inspired by deep residual learning and UNet; an architecture that takes advantage of both (residual and UNet) models.
Stars: ✭ 97 (-96.04%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-94.77%)
Mutual labels:  jupyter-notebook, apache-spark
Sru Deeplearning Workshop
A 12-hour deep learning course using the Keras framework
Stars: ✭ 66 (-97.3%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Spark Tpc Ds Performance Test
Use the TPC-DS benchmark to test Spark SQL performance
Stars: ✭ 133 (-94.57%)
Mutual labels:  jupyter-notebook, apache-spark
Stock Price Predictor
This project uses deep learning models, specifically the Long Short-Term Memory (LSTM) neural network, to predict stock prices.
Stars: ✭ 146 (-94.04%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-93.87%)
Mutual labels:  jupyter-notebook, apache-spark
Automatic Image Captioning
Generating Captions for images using Deep Learning
Stars: ✭ 84 (-96.57%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Unet Tgs
Applying UNET Model on TGS Salt Identification Challenge hosted on Kaggle
Stars: ✭ 81 (-96.69%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Deepeeg
Deep Learning with Tensor Flow for EEG MNE Epoch Objects
Stars: ✭ 100 (-95.92%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Tf Serving K8s Tutorial
A Tutorial for Serving Tensorflow Models using Kubernetes
Stars: ✭ 78 (-96.81%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Self Driving Car
An end-to-end CNN model that predicts the steering wheel angle from video/images
Stars: ✭ 106 (-95.67%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (-93.26%)
Mutual labels:  jupyter-notebook, apache-spark
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-97.71%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Pneumonia Detection From Chest X Ray Images With Deep Learning
Detecting Pneumonia in Chest X-ray Images using Convolutional Neural Network and Pretrained Models
Stars: ✭ 64 (-97.39%)
Mutual labels:  jupyter-notebook, keras-tensorflow
Repo 2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (-94.57%)
Mutual labels:  jupyter-notebook, keras-tensorflow


Distributed TensorFlow, PyTorch, Keras and BigDL on Apache Spark & Ray


Analytics Zoo is an open source Big Data AI platform that includes the following features for scaling end-to-end AI to distributed Big Data:

  • Orca: seamlessly scale out TensorFlow and PyTorch for Big Data (using Spark & Ray)

  • RayOnSpark: run Ray programs directly on Big Data clusters

  • BigDL Extensions: high-level Spark ML pipeline and Keras-like APIs for BigDL

  • Chronos: scalable time series analysis using AutoML

  • PPML: privacy preserving big data analysis and machine learning (experimental)

For more information, see the docs.


Installing

You can use Analytics Zoo on Google Colab without any installation. Analytics Zoo also includes a set of notebooks that you can directly open and run in Colab.

To install Analytics Zoo, we recommend using conda environments.

conda create -n my_env 
conda activate my_env
pip install analytics-zoo 

To install the latest nightly build, use pip install --pre --upgrade analytics-zoo; see the Python and Scala user guides for more details.
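
As a quick sanity check after installation, you can initialize Orca in local mode (a minimal sketch; the core count below is an arbitrary choice):

from zoo.orca import init_orca_context, stop_orca_context

# start a local SparkContext to verify the installation
sc = init_orca_context(cluster_mode="local", cores=2)
print(sc.version)    # the underlying Spark version
stop_orca_context()  # shut everything down when done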

Getting Started with Orca

Most AI projects start with a Python notebook running on a single laptop; however, one usually has to go through a great deal of pain to scale it to handle larger data sets in a distributed fashion. The Orca library seamlessly scales out your single-node TensorFlow or PyTorch notebook across large clusters (so as to process distributed Big Data).

First, initialize Orca Context:

from zoo.orca import init_orca_context

# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2) 

Next, perform data-parallel processing in Orca (supporting standard Spark DataFrames, TensorFlow Datasets, PyTorch DataLoaders, Pandas, etc.):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array

# `spark` is a SparkSession backed by the SparkContext created above;
# `file_path` points to your input parquet dataset
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet(file_path)
df = df.withColumn('user', array('user')) \
       .withColumn('item', array('item'))

Finally, use sklearn-style Estimator APIs in Orca to perform distributed TensorFlow, PyTorch or Keras training and inference:

from tensorflow import keras
from zoo.orca.learn.tf.estimator import Estimator

user = keras.layers.Input(shape=[1])  
item = keras.layers.Input(shape=[1])  
feat = keras.layers.concatenate([user, item], axis=1)  
predictions = keras.layers.Dense(2, activation='softmax')(feat)  
model = keras.models.Model(inputs=[user, item], outputs=predictions)  
model.compile(optimizer='rmsprop',  
              loss='sparse_categorical_crossentropy',  
              metrics=['accuracy'])

est = Estimator.from_keras(keras_model=model)  
est.fit(data=df,  
        batch_size=64,  
        epochs=4,  
        feature_cols=['user', 'item'],  
        label_cols=['label'])
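
After training, distributed inference follows the same pattern (a minimal sketch, assuming the Estimator's predict call mirrors the feature_cols argument used in fit):

# run distributed inference on the same Spark DataFrame
predictions_df = est.predict(data=df,
                             feature_cols=['user', 'item'])
predictions_df.show()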

See TensorFlow and PyTorch quickstart, as well as the document website, for more details.

Getting Started with RayOnSpark

Ray is an open source distributed framework for emerging AI applications. RayOnSpark allows users to directly run Ray programs on existing Big Data clusters, and directly write Ray code inline with their Spark code (so as to process the in-memory Spark RDDs or DataFrames).

from zoo.orca import init_orca_context

# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True) 

import ray

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

counters = [Counter.remote() for i in range(5)]
print(ray.get([c.increment.remote() for c in counters]))
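
When the Ray program finishes, you can tear down Ray and the underlying SparkContext (a sketch; stop_orca_context is the counterpart of init_orca_context):

from zoo.orca import stop_orca_context

# stops Ray (if started) and the SparkContext created by init_orca_context
stop_orca_context()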

See the RayOnSpark user guide and quickstart for more details.

Getting Started with BigDL Extensions

Analytics Zoo makes it easier to develop large-scale deep learning applications on Apache Spark, by providing high-level Spark ML pipeline and Keras-like APIs on top of BigDL (a distributed deep learning framework for Spark).

First, call initNNContext at the beginning of the code:

import com.intel.analytics.zoo.common.NNContext
val sc = NNContext.initNNContext()

Then, define the BigDL model using Keras-style API:

val input = Input[Float](inputShape = Shape(10))  
val dense = Dense[Float](12).inputs(input)  
val output = Activation[Float]("softmax").inputs(dense)  
val model = Model(input, output)

After that, use NNEstimator to train/predict/evaluate the model using Spark Dataframes and ML pipelines:

val trainingDF = spark.read.parquet("train_data")
val validationDF = spark.read.parquet("val_data")
val scaler = new MinMaxScaler().setInputCol("in").setOutputCol("value")
// `size` and `epoch` are the desired batch size and number of training epochs
val estimator = NNEstimator(model, CrossEntropyCriterion())
        .setBatchSize(size).setOptimMethod(new Adam()).setMaxEpoch(epoch)
val pipeline = new Pipeline().setStages(Array(scaler, estimator))

val pipelineModel = pipeline.fit(trainingDF)
val predictions = pipelineModel.transform(validationDF)

See the Scala, NNframes and Keras API user guides for more details.

Getting Started with Chronos

Time series prediction takes observations from previous time steps as input and predicts the values at future time steps. The Chronos library makes it easy to build end-to-end time series analysis by applying AutoML to extremely large-scale time series prediction.

To train a time series model with AutoML, first initialize Orca Context:

from zoo.orca import init_orca_context

# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True)

Next, create an AutoTSTrainer.

from zoo.chronos.autots.deprecated.forecast import AutoTSTrainer

trainer = AutoTSTrainer(dt_col="datetime", target_col="value")

Finally, call fit on AutoTSTrainer, which applies AutoML to find the best model and hyper-parameters; it returns a TSPipeline which can be used for prediction or evaluation.

# train a pipeline with AutoML support
ts_pipeline = trainer.fit(train_df, validation_df)

# predict
ts_pipeline.predict(test_df)
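
The returned TSPipeline can also be evaluated and persisted for later use (a sketch; the metric names and file path below are illustrative assumptions):

# evaluate on held-out data (assumed example metrics)
ts_pipeline.evaluate(test_df, metrics=["mse", "smape"])

# save the fitted pipeline to an illustrative path
ts_pipeline.save("/tmp/saved_ts_pipeline")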

See the Chronos user guide and example for more details.

PPML (Privacy Preserving Machine Learning)

Analytics Zoo PPML provides a Trusted Cluster Environment for protecting the end-to-end Big Data AI pipeline. It combines various low-level hardware and software security technologies (e.g., Intel SGX, LibOS such as Graphene and Occlum, Federated Learning, etc.), and allows users to run unmodified Big Data analysis and ML/DL programs (such as Apache Spark, Apache Flink, TensorFlow, PyTorch, etc.) in a secure fashion on a (private or public) cloud.

See the PPML user guide for more details.

More information

Older Documents

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].