All Projects → salesforce → Transmogrifai

salesforce / Transmogrifai

Licence: bsd-3-clause
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Transmogrifai

Autodl
Automated Deep Learning without ANY human intervention. 1'st Solution for AutoDL [email protected]
Stars: ✭ 854 (-59.02%)
Mutual labels:  ai, automl, feature-engineering, automated-machine-learning
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+23.61%)
Mutual labels:  features, spark, ml, feature-engineering
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (-25.19%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
mindware
An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
Stars: ✭ 34 (-98.37%)
Mutual labels:  feature-engineering, automl, automated-machine-learning
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+88.1%)
Mutual labels:  automl, automated-machine-learning, structured-data
featuretoolsOnSpark
A simplified version of featuretools for Spark
Stars: ✭ 24 (-98.85%)
Mutual labels:  feature-engineering, automl, automated-machine-learning
AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-97.55%)
Mutual labels:  structured-data, feature-engineering, automl
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+42.32%)
Mutual labels:  ai, pipelines, ml
Autofeat
Linear Prediction Model with Automated Feature Engineering and Selection Capabilities
Stars: ✭ 178 (-91.46%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Featuretools
An open source python library for automated feature engineering
Stars: ✭ 5,891 (+182.68%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (-68.91%)
Mutual labels:  ai, ml, feature-engineering
EvolutionaryForest
An open source python library for automated feature engineering based on Genetic Programming
Stars: ✭ 56 (-97.31%)
Mutual labels:  feature-engineering, automl, automated-machine-learning
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+39.11%)
Mutual labels:  ai, spark, ml
Nni
An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+413.34%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (-90.6%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Remixautoml
R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (-92.37%)
Mutual labels:  transformations, feature-engineering, automated-machine-learning
Machinejs
[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (-80.23%)
Mutual labels:  ml, automl, automated-machine-learning
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (-53.89%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+302.02%)
Mutual labels:  automl, feature-engineering, automated-machine-learning
Xlearning Xdml
extremely distributed machine learning
Stars: ✭ 113 (-94.58%)
Mutual labels:  ai, spark

TransmogrifAI

Maven Central Javadocs Spark version Scala version License Chat

TravisCI Build Status CircleCI Build Status Documentation Status CII Best Practices CodeFactor

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.

Use TransmogrifAI if you need a machine learning library to:

  • Build production ready machine learning applications in hours, not months
  • Build machine learning models without getting a Ph.D. in machine learning
  • Build modular, reusable, strongly typed machine learning workflows

To understand the motivation behind TransmogrifAI check out these:

Skip to Quick Start and Documentation.

Predicting Titanic Survivors with TransmogrifAI

The Titanic dataset is an often-cited dataset in the machine learning community. The goal is to build a machine learnt model that will predict survivors from the Titanic passenger manifest. Here is how you would build the model using TransmogrifAI:

import com.salesforce.op._
import com.salesforce.op.readers._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

implicit val spark = SparkSession.builder.config(new SparkConf()).getOrCreate()
import spark.implicits._

// Read Titanic data as a DataFrame
val passengersData = DataReaders.Simple.csvCase[Passenger](path = pathToData).readDataset().toDF()

// Extract response and predictor Features
val (survived, predictors) = FeatureBuilder.fromDataFrame[RealNN](passengersData, response = "survived")

// Automated feature engineering
val featureVector = predictors.transmogrify()

// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)

// Automated model selection
val pred = BinaryClassificationModelSelector().setInput(survived, checkedFeatures).getOutput()

// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(pred).train()

println("Model summary:\n" + model.summaryPretty())

Model summary:

Evaluated Logistic Regression, Random Forest models with 3 folds and AuPR metric.
Evaluated 3 Logistic Regression models with AuPR between [0.6751930383321765, 0.7768725281794376]
Evaluated 16 Random Forest models with AuPR between [0.7781671467343991, 0.8104798040316159]

Selected model Random Forest classifier with parameters:
|-----------------------|--------------|
| Model Param           |     Value    |
|-----------------------|--------------|
| modelType             | RandomForest |
| featureSubsetStrategy |         auto |
| impurity              |         gini |
| maxBins               |           32 |
| maxDepth              |           12 |
| minInfoGain           |        0.001 |
| minInstancesPerNode   |           10 |
| numTrees              |           50 |
| subsamplingRate       |          1.0 |
|-----------------------|--------------|

Model evaluation metrics:
|-------------|--------------------|---------------------|
| Metric Name | Hold Out Set Value |  Training Set Value |
|-------------|--------------------|---------------------|
| Precision   |               0.85 |   0.773851590106007 |
| Recall      | 0.6538461538461539 |  0.6930379746835443 |
| F1          | 0.7391304347826088 |  0.7312186978297163 |
| AuROC       | 0.8821603927986905 |  0.8766642291593114 |
| AuPR        | 0.8225075757571668 |   0.850331080886535 |
| Error       | 0.1643835616438356 | 0.19682151589242053 |
| TP          |               17.0 |               219.0 |
| TN          |               44.0 |               438.0 |
| FP          |                3.0 |                64.0 |
| FN          |                9.0 |                97.0 |
|-------------|--------------------|---------------------|

Top model insights computed using correlation:
|-----------------------|----------------------|
| Top Positive Insights |      Correlation     |
|-----------------------|----------------------|
| sex = "female"        |   0.5177801026737666 |
| cabin = "OTHER"       |   0.3331391338844782 |
| pClass = 1            |   0.3059642953159715 |
|-----------------------|----------------------|
| Top Negative Insights |      Correlation     |
|-----------------------|----------------------|
| sex = "male"          |  -0.5100301587292186 |
| pClass = 3            |  -0.5075774968534326 |
| cabin = null          | -0.31463114463832633 |
|-----------------------|----------------------|

Top model insights computed using CramersV:
|-----------------------|----------------------|
|      Top Insights     |       CramersV       |
|-----------------------|----------------------|
| sex                   |    0.525557139885501 |
| embarked              |  0.31582347194683386 |
| age                   |  0.21582347194683386 |
|-----------------------|----------------------|

While this may seem a bit too magical, for those who want more control, TransmogrifAI also provides the flexibility to completely specify all the features being extracted and all the algorithms being applied in your ML pipeline. Visit our docs site for full documentation, getting started, examples, faq and other information.

Adding TransmogrifAI into your project

You can simply add TransmogrifAI as a regular dependency to an existing project. Start by picking TransmogrifAI version to match your project dependencies from the version matrix below (if not sure - take the stable version):

TransmogrifAI Version Spark Version Scala Version Java Version
0.7.1 (unreleased, master), 0.7.0 (stable) 2.4 2.11 1.8
0.6.1, 0.6.0, 0.5.3, 0.5.2, 0.5.1, 0.5.0 2.3 2.11 1.8
0.4.0, 0.3.4 2.2 2.11 1.8

For Gradle in build.gradle add:

repositories {
    jcenter()
    mavenCentral()
}
dependencies {
    // TransmogrifAI core dependency
    compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.7.0'

    // TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
    // compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.7.0'
}

For SBT in build.sbt add:

scalaVersion := "2.11.12"

resolvers += Resolver.jcenterRepo

// TransmogrifAI core dependency
libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0"

// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.7.0"

Then import TransmogrifAI into your code:

// TransmogrifAI functionality: feature types, feature builders, feature dsl, readers, aggregators etc.
import com.salesforce.op._
import com.salesforce.op.aggregators._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.readers._

// Spark enrichments (optional)
import com.salesforce.op.utils.spark.RichDataset._
import com.salesforce.op.utils.spark.RichRDD._
import com.salesforce.op.utils.spark.RichRow._
import com.salesforce.op.utils.spark.RichMetadata._
import com.salesforce.op.utils.spark.RichStructType._

Quick Start and Documentation

Visit our docs site for full documentation, getting started, examples, faq and other information.

See scaladoc for the programming API.

Authors

Internal Contributors (prior to release)

License

BSD 3-Clause © Salesforce.com, Inc.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].