
airoldilab / sgd

Licence: other
An R package for large-scale estimation with stochastic gradient descent



sgd

sgd is an R package for large-scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.

Features

At the core of the package is the function

sgd(formula, data, model, model.control, sgd.control)

It estimates the parameters of a given model on a given data set using stochastic gradient descent. The optional arguments model.control and sgd.control set options for the model and for the stochastic gradient method, respectively. By building on the bigmemory package, sgd also handles data sets that are too large to fit in RAM, as well as streaming data.

Example of large-scale linear regression:

library(sgd)

# Dimensions
N <- 1e5  # number of data points
d <- 1e2  # number of features

# Generate data.
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(5, d+1)
eps <- rnorm(N)
y <- cbind(1, X) %*% theta + eps
dat <- data.frame(y=y, x=X)

sgd.theta <- sgd(y ~ ., data=dat, model="lm")
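One way to sanity-check estimates like those in the example above is to compare them with an ordinary least squares fit on the same kind of simulated data. The following is a minimal sketch in base R only; the helper `mse` and the smaller dimensions are assumptions chosen for illustration and are not part of the package.

```r
# Baseline sketch in base R; `mse` is a hypothetical helper, not a package function.
set.seed(2024)
N <- 1e4  # fewer data points than the example above, to keep this quick
d <- 10   # fewer features, for the same reason

# Same data-generating process as the example above.
X <- matrix(rnorm(N * d), ncol = d)
theta <- rep(5, d + 1)
y <- as.vector(cbind(1, X) %*% theta + rnorm(N))
dat <- data.frame(y = y, x = X)

# The formula y ~ . adds an intercept, so d + 1 coefficients are estimated.
ols.theta <- coef(lm(y ~ ., data = dat))

# Mean squared error of an estimate against the generating parameters.
mse <- function(est, truth) mean((est - truth)^2)

print(length(ols.theta))
print(mse(ols.theta, theta))
```

The same `mse` helper can be applied to any coefficient vector of length d + 1, which gives a quick check on whether a stochastic gradient fit has landed near the exact least squares solution.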

Any loss function may be specified. For convenience, the following models are built in:

  • Linear models
  • Generalized linear models
  • Method of moments
  • Generalized method of moments
  • Cox proportional hazards model
  • M-estimation

The following stochastic gradient methods are available:

  • (Standard) stochastic gradient descent
  • Implicit stochastic gradient descent
  • Averaged stochastic gradient descent
  • Averaged implicit stochastic gradient descent
  • Classical momentum
  • Nesterov's accelerated gradient
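The difference between some of these update rules can be sketched for least squares in a few lines of base R. The function `sgd_variants_sketch`, its arguments, and the step-size schedule below are hypothetical and chosen only for illustration. The sketch covers the standard, implicit, and averaged variants and is not the package's implementation. For a linear model the implicit update has a closed form: the step size a is shrunk to a / (1 + a * ||x||^2).

```r
# Illustrative sketch only; this is not the package's implementation.
set.seed(42)
N <- 1e4
d <- 5
X <- cbind(1, matrix(rnorm(N * d), ncol = d))  # intercept plus d features
theta_true <- rep(5, d + 1)
y <- as.vector(X %*% theta_true + rnorm(N))

sgd_variants_sketch <- function(X, y, method = c("sgd", "implicit"),
                                average = FALSE, gamma = 1) {
  method <- match.arg(method)
  theta <- numeric(ncol(X))
  theta_bar <- theta  # running mean of the iterates
  for (i in seq_len(nrow(X))) {
    x_i <- X[i, ]
    a_i <- (20 + i)^(-gamma)  # decaying step size
    resid <- y[i] - sum(x_i * theta)
    if (method == "implicit") {
      # Closed-form implicit update for least squares: shrink the step.
      a_i <- a_i / (1 + a_i * sum(x_i^2))
    }
    theta <- theta + a_i * resid * x_i
    theta_bar <- theta_bar + (theta - theta_bar) / i
  }
  if (average) theta_bar else theta
}

est <- rbind(
  standard = sgd_variants_sketch(X, y, "sgd"),
  implicit = sgd_variants_sketch(X, y, "implicit"),
  # Averaging is usually paired with a more slowly decaying step size.
  averaged = sgd_variants_sketch(X, y, "sgd", average = TRUE, gamma = 2 / 3)
)
print(round(est, 2))
```

Each row of `est` should be close to the generating value of 5 in every coordinate. The momentum-based methods are left out of the sketch to keep it short.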

Check out the vignette in vignettes/ or examples in demo/. In R, the equivalent commands are vignette(package="sgd") and demo(package="sgd").

Installation

To install the latest version from CRAN:

install.packages("sgd")

To install the latest development version from GitHub:

# install.packages("devtools")
devtools::install_github("airoldilab/sgd")

Authors

sgd is written by Dustin Tran and Panos Toulis, and is under active development. Please feel free to contribute by submitting issues or requests, or by solving any current issues!

We thank all other members of the Airoldi Lab (led by Prof. Edo Airoldi) for their feedback and contributions.

Citation

@article{tran2015stochastic,
  author = {Tran, Dustin and Toulis, Panos and Airoldi, Edoardo M},
  title = {Stochastic gradient descent methods for estimation with large data sets},
  journal = {arXiv preprint arXiv:1509.06459},
  year = {2015}
}