Licence: apache-2.0
Mirror of Apache Mahout

Welcome to Apache Mahout!

The goal of the Apache Mahout™ project is to build an environment for quickly creating scalable, performant machine learning applications.

For additional information about Mahout, visit the Mahout Home Page.

Setting up your Environment

Whether you are using the Mahout shell, running command-line jobs, or using Mahout as a library to build apps, you will need to set up several environment variables. Edit your environment in ~/.bash_profile on a Mac or ~/.bashrc on many Linux distributions. Add the following:

export MAHOUT_HOME=/path/to/mahout
export MAHOUT_LOCAL=true # for running standalone on your dev machine, 
# unset MAHOUT_LOCAL for running on a cluster

You will need $JAVA_HOME, and if you are running on Spark, you will also need $SPARK_HOME.
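
Before launching Mahout, it can help to confirm that the variables above are actually set. The following is a minimal sketch; the function name check_mahout_env is hypothetical and is not part of Mahout.

```shell
# Hypothetical helper (not part of Mahout): report which of the named
# environment variables are unset or empty.
check_mahout_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing"
    return 1
  fi
  echo "OK"
}

# Usage (SPARK_HOME is only needed when running on Spark):
# check_mahout_env MAHOUT_HOME JAVA_HOME SPARK_HOME
```

The function returns a non-zero status when something is missing, so it can guard a launch script.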

Using Mahout as a Library

Running any application that uses Mahout will require installing a binary or source version and setting the environment. To compile from source:

  • To build, run mvn -DskipTests clean install
  • To run tests, run mvn test
  • To set up your IDE, run mvn eclipse:eclipse or mvn idea:idea

To use Maven, add the appropriate setting to your pom.xml or build.sbt following the template below.

To use the Samsara environment, you'll need to include both the engine-neutral math-scala dependency:

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-math-scala</artifactId>
    <version>${mahout.version}</version>
</dependency>

and a dependency for back-end engine translation, e.g.:

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-spark</artifactId>
    <version>${mahout.version}</version>
</dependency>

Building From Source

Prerequisites:

  • Linux environment (preferably Ubuntu 16.04.x). Note: currently, only the JVM-only build will work on a Mac.
  • gcc > 4.x
  • NVIDIA card (installed with OpenCL drivers alongside the usual GPU drivers)

Downloads

Install Java 1.7+ in an easily accessible directory (for this example, ~/java/). http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Create a directory ~/apache/.

Download Apache Maven 3.3.9 and un-tar/gunzip to ~/apache/apache-maven-3.3.9/ . https://maven.apache.org/download.cgi

Download and un-tar/gunzip Hadoop 2.4.1 to ~/apache/hadoop-2.4.1/ . https://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/

Download and un-tar/gunzip spark-1.6.3-bin-hadoop2.4 to ~/apache/ . http://spark.apache.org/downloads.html

  • Choose release: Spark-1.6.3 (Nov 07 2016)
  • Choose a package type: Pre-Built for Hadoop 2.4

Install ViennaCL 1.7.0+. If running Ubuntu 16.04+:

sudo apt-get install libviennacl-dev

Otherwise, if your distribution’s package manager does not have a libviennacl-dev package at version 1.7.0 or later, clone ViennaCL and copy its headers into a directory that will be on the include path when Mahout is compiled:

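mkdir -p ~/tmp
cd ~/tmp && git clone https://github.com/viennacl/viennacl-dev.git
# the header directories live inside the cloned repository
cd viennacl-dev
# writing to /usr/local may require root privileges (e.g. via sudo)
cp -r viennacl/ /usr/local/
cp -r CL/ /usr/local/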

Ensure that the OpenCL 1.2+ drivers are all installed (they are packaged with most consumer-grade NVIDIA drivers; this has not been verified for higher-end cards).

Clone the Mahout repository into ~/apache:

cd ~/apache
git clone https://github.com/apache/mahout.git

Configuration

When building Mahout for a Spark backend, we need four system environment variables set:

    export MAHOUT_HOME=/home/<user>/apache/mahout
    export HADOOP_HOME=/home/<user>/apache/hadoop-2.4.1
    export SPARK_HOME=/home/<user>/apache/spark-1.6.3-bin-hadoop2.4    
    export JAVA_HOME=/home/<user>/java/jdk-1.8.121

Mahout on Spark regularly uses one more environment variable, MASTER, which identifies the Spark cluster's master node (usually, the node hosting the session user).

To use four local cores (the Spark master need not be running):

export MASTER=local[4]

To use all available local cores (again, the Spark master need not be running):

export MASTER=local[*]

To point to a cluster with Spark running:

export MASTER=spark://master.ip.address:7077
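
The three MASTER forms above can be produced by one small function. This is only a sketch: spark_master_url is a hypothetical name, not part of Mahout or Spark, and port 7077 is taken from the example above.

```shell
# Hypothetical helper (not part of Mahout or Spark): print a MASTER value.
#   no argument     -> all local cores
#   a number N      -> N local cores
#   anything else   -> treated as the Spark master host (port 7077 assumed)
spark_master_url() {
  case "${1:-}" in
    "")       echo "local[*]" ;;
    *[!0-9]*) echo "spark://$1:7077" ;;
    *)        echo "local[$1]" ;;
  esac
}

# Example: run on four local cores.
export MASTER="$(spark_master_url 4)"
```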

We then add these to the path:

   PATH=$PATH:$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin

These get appended to the user's ~/.bashrc file.
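
If the setup is run more than once, plain appends leave duplicate lines in ~/.bashrc. A minimal sketch of an idempotent append follows; append_once is a hypothetical helper name, not something shipped with Mahout.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not duplicate entries.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (commented out so pasting this block does not modify ~/.bashrc):
# append_once ~/.bashrc 'export MASTER=local[*]'
```

Single quotes in the usage line keep values such as local[*] and $MAHOUT_HOME from being expanded before they are written to the file.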

Building Mahout with Apache Maven

Currently, Mahout has three builds. From the $MAHOUT_HOME directory, we may issue the commands to build each using mvn profiles.

JVM only:

mvn clean install -DskipTests

JVM with native OpenMP for level 2 and level 3 matrix/vector multiplication:

mvn clean install -Pviennacl-omp -Phadoop2 -DskipTests

JVM with native OpenMP and OpenCL for level 2 and level 3 matrix/vector multiplication (GPU errors fall back to OpenMP, and currently only a single GPU per node is supported):

mvn clean install -Pviennacl -Phadoop2 -DskipTests
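
To keep the three commands above in one place, a small wrapper can map a build flavor to its Maven invocation. This is a sketch only; the function name and the flavor labels jvm, openmp and opencl are hypothetical, while the printed commands are the ones listed above.

```shell
# Hypothetical wrapper: print the Maven command for a build flavor.
mahout_build_cmd() {
  case "$1" in
    jvm)    echo "mvn clean install -DskipTests" ;;
    openmp) echo "mvn clean install -Pviennacl-omp -Phadoop2 -DskipTests" ;;
    opencl) echo "mvn clean install -Pviennacl -Phadoop2 -DskipTests" ;;
    *)      echo "unknown build flavor: $1" >&2; return 1 ;;
  esac
}
```

The function only prints the command; to run it, change to $MAHOUT_HOME and execute the printed line.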

Testing the Mahout Environment

Mahout provides an extension to the spark-shell that is good for getting to know the language, testing partition loads, prototyping algorithms, etc.

To launch the shell in local mode with two threads, run the following:

$ MASTER=local[2] mahout spark-shell

After a very verbose startup, a Mahout welcome screen will appear:

Loading /home/andy/sandbox/apache-mahout-distribution-0.13.0/bin/load-shell.scala...
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._
sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@3ca1f0a4

                _                 _
_ __ ___   __ _| |__   ___  _   _| |_
 '_ ` _ \ / _` | '_ \ / _ \| | | | __|
 | | | | (_| | | | | (_) | |_| | |_
_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 0.13.0


That file does not exist


scala>

At the scala> prompt, enter:

scala> :load /home/<andy>/apache/mahout/examples/bin/SparseSparseDrmTimer.mscala

This loads a matrix multiplication timer function definition. To run the matrix timer:

        scala> timeSparseDRMMMul(1000,1000,1000,1,.02,1234L)
            {...} res3: Long = 16321

Note: the 14.1 release is missing a class required for this; it will be fixed in 14.2. We can see that the JVM-only version is slow, which is our motivation for GPU and native multithreading support.

To understand what the timer does under the hood, we may examine the .mscala (Mahout Scala) code, which is both fully functional Scala and the Mahout R-like DSL for tensor algebra:




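// Times the multiplication of two random sparse DRMs: A (m x s) times B (s x n).
// para is the requested parallelism; pctDense is the approximate fraction of non-zero entries.
// Returns the elapsed time in milliseconds.
def timeSparseDRMMMul(m: Int, n: Int, s: Int, para: Int, pctDense: Double = .20, seed: Long = 1234L): Long = {
  // Build A: fill each block of an empty m x s DRM with random values at roughly pctDense density.
  val drmA = drmParallelizeEmpty(m, s, para).mapBlock() {
    case (keys, block: Matrix) =>
      val R = scala.util.Random
      R.setSeed(seed)
      val blockA = new SparseRowMatrix(block.nrow, block.ncol)
      blockA := { x => if (R.nextDouble < pctDense) R.nextDouble else x }
      (keys -> blockA)
  }

  // Build B the same way (s x n), with a different seed.
  val drmB = drmParallelizeEmpty(s, n, para).mapBlock() {
    case (keys, block: Matrix) =>
      val R = scala.util.Random
      R.setSeed(seed + 1)
      val blockB = new SparseRowMatrix(block.nrow, block.ncol)
      blockB := { x => if (R.nextDouble < pctDense) R.nextDouble else x }
      (keys -> blockB)
  }

  var time = System.currentTimeMillis()

  // Distributed matrix multiplication (evaluated lazily).
  val drmC = drmA %*% drmB

  // trigger computation
  drmC.numRows()

  time = System.currentTimeMillis() - time

  time
}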

For more information, please see the following references:

http://mahout.apache.org/users/environment/in-core-reference.html

http://mahout.apache.org/users/environment/out-of-core-reference.html

http://mahout.apache.org/users/sparkbindings/play-with-shell.html

http://mahout.apache.org/users/environment/classify-a-doc-from-the-shell.html

Note that due to an intermittent out-of-memory bug in a Flink-based test, we have disabled it from the binary releases. To use Flink, please uncomment the line in the root pom.xml in the <modules> block, so it reads <module>flink</module>.

Examples

For examples of how to use Mahout, see the examples directory located in examples/bin.

For information on how to contribute, visit the How to Contribute Page

Legal

Please see the NOTICE.txt included in this directory for more information.
