
ksopyla / tensorflow-mnist-convnets

Licence: MIT license
Neural nets for MNIST classification, simple single layer NN, 5 layer FC NN and convolutional neural networks with different architectures

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to tensorflow-mnist-convnets

fast-tsetlin-machine-with-mnist-demo
A fast Tsetlin Machine implementation employing bit-wise operators, with MNIST demo.
Stars: ✭ 58 (+163.64%)
Mutual labels:  mnist
MNIST-TFLite
MNIST classifier built for TensorFlow Lite - Android, iOS and other "lite" platforms
Stars: ✭ 34 (+54.55%)
Mutual labels:  mnist
SimpNet-Tensorflow
A Tensorflow Implementation of the SimpNet Convolutional Neural Network Architecture
Stars: ✭ 16 (-27.27%)
Mutual labels:  mnist
playing with vae
Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST
Stars: ✭ 53 (+140.91%)
Mutual labels:  mnist
image-defect-detection-based-on-CNN
TensorBasicModel
Stars: ✭ 17 (-22.73%)
Mutual labels:  mnist
catseye
Neural network library written in C and Javascript
Stars: ✭ 29 (+31.82%)
Mutual labels:  mnist
BP-Network
Multi-Classification on dataset of MNIST
Stars: ✭ 72 (+227.27%)
Mutual labels:  mnist
deeplearning-mpo
Replace FC2, LeNet-5, VGG, Resnet, Densenet's full-connected layers with MPO
Stars: ✭ 26 (+18.18%)
Mutual labels:  mnist
PaperSynth
Handwritten text to synths!
Stars: ✭ 18 (-18.18%)
Mutual labels:  mnist
crohme-data-extractor
A modified extractor for the CROHME handwritten math symbols dataset.
Stars: ✭ 18 (-18.18%)
Mutual labels:  mnist
MNIST
Handwritten digit recognizer using a feed-forward neural network and the MNIST dataset of 70,000 human-labeled handwritten digits.
Stars: ✭ 28 (+27.27%)
Mutual labels:  mnist
MNIST-adversarial-images
Create adversarial images to fool a MNIST classifier in TensorFlow
Stars: ✭ 13 (-40.91%)
Mutual labels:  mnist
gan-vae-pretrained-pytorch
Pretrained GANs + VAEs + classifiers for MNIST/CIFAR in pytorch.
Stars: ✭ 134 (+509.09%)
Mutual labels:  mnist
digdet
A realtime digit OCR on the browser using Machine Learning
Stars: ✭ 22 (+0%)
Mutual labels:  mnist
digitrecognition ios
Deep Learning with Tensorflow/Keras: Digit recognition based on mnist-dataset and convolutional neural-network on iOS with CoreML
Stars: ✭ 23 (+4.55%)
Mutual labels:  mnist
LeNet-from-Scratch
Implementation of LeNet5 without any auto-differentiate tools or deep learning frameworks. Accuracy of 98.6% is achieved on MNIST dataset.
Stars: ✭ 22 (+0%)
Mutual labels:  mnist
numpy-neuralnet-exercise
Implementation of key concepts of neuralnetwork via numpy
Stars: ✭ 49 (+122.73%)
Mutual labels:  mnist
mnist-flask
A Flask web app for handwritten digit recognition using machine learning
Stars: ✭ 34 (+54.55%)
Mutual labels:  mnist
keras gpyopt
Using Bayesian Optimization to optimize hyper parameter in Keras-made neural network model.
Stars: ✭ 56 (+154.55%)
Mutual labels:  mnist
cuda-neural-network
Convolutional Neural Network with CUDA (MNIST 99.23%)
Stars: ✭ 118 (+436.36%)
Mutual labels:  mnist

Tensorflow MNIST Convolutional Network Tutorial

This project is another tutorial for teaching you about artificial neural networks. I hope that my way of presenting the material will help you in the long learning process. All the examples are presented in TensorFlow, which also serves as the runtime environment.

The project presents four different neural nets for MNIST digit classification. The first two are fully connected neural networks and the latter two are convolutional networks. Each network is built on top of the previous example, with gradually increasing difficulty, in order to learn more powerful models.

The project was implemented in Python 3 and TensorFlow v1.0.0.

Tensorflow neural network examples

  • simple single layer neural network (one fully-connected layer),
  • 5 layer fully-connected neural network (5 FC NN) in 3 variants,
  • convolutional neural network: 3x convNet + 1 FC + output, with sigmoid activation function,
  • convolutional neural network with dropout, ReLU, and better weight initialization: 3x convNet + 1 FC + output

Single layer neural network

This is the simplest architecture that we will consider. This feedforward neural network will be our baseline model for further, more powerful solutions. We start with a simple model in order to lay the TensorFlow foundations:

File: minist_1.0_single_layer_nn.py

Network architecture

This is a simple one-layer feedforward network with one input layer and one output layer:

  • Input layer: 28*28 = 784,
  • Output: 10-dimensional vector (10 digits, one-hot encoded)
input layer             - X[batch, 784]
Fully connected         - W[784,10] + b[10]
One-hot encoded labels  - Y[batch, 10]

Model

Y = softmax(X*W+b)
Matrix mul: X*W - [batch,784]x[784,10] -> [batch,10]

Training consists of finding good W and b values; this is handled automatically by the TensorFlow gradient descent optimizer.
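
A minimal sketch of this model and its training loop in TensorFlow 1.x is shown below; tensor names, the learning rate, and the batch size are illustrative and may differ from those used in the script.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# MNIST images flattened to 784-dim vectors, labels one-hot encoded
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

X  = tf.placeholder(tf.float32, [None, 784])   # input images,   [batch, 784]
Y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot labels, [batch, 10]

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Y = tf.nn.softmax(tf.matmul(X, W) + b)         # Y = softmax(X*W + b)

# cross-entropy loss, minimized with plain gradient descent
cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y_ * tf.log(Y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1)), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5000):
        batch_X, batch_Y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})
    print(sess.run(accuracy, feed_dict={X: mnist.test.images,
                                        Y_: mnist.test.labels}))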

Results

This simple model achieves an accuracy of 0.9237.

Tensorflow MNIST train/test loss and accuracy for one-layer neural network

Five layers fully-connected neural network

This is an upgraded version of the previous model: between the input and output we added five fully connected hidden layers. Adding more layers makes the network more expressive but at the same time harder to train. Three new problems can emerge: vanishing gradients, model overfitting, and increased computation time. In our case, where the dataset is rather small, we did not see these problems at full scale.

In order to deal with these problems, different training techniques were invented. Changing from the sigmoid to the ReLU activation function helps prevent vanishing gradients, choosing the Adam optimizer speeds up convergence and thereby shortens training time, and adding dropout helps with overfitting.
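
In TensorFlow 1.x each of these changes is essentially a one-line swap. The fragment below is only a hedged illustration: it assumes tensors X, W1, b1 and a loss tensor cross_entropy are already defined as in the sketches in this tutorial, and the learning rates are illustrative, not the repository's exact values.

# variant 1: sigmoid activation + plain gradient descent
Y1 = tf.nn.sigmoid(tf.matmul(X, W1) + b1)
train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)

# variants 2 and 3: ReLU activation + Adam optimizer
Y1 = tf.nn.relu(tf.matmul(X, W1) + b1)
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

# variant 3 only: dropout on the hidden activations
pkeep = tf.placeholder(tf.float32)   # keep probability fed at run time
Y1d = tf.nn.dropout(Y1, pkeep)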

This model was implemented in three variants, where each successive variant builds on the previous one and adds new features:

  • Variant 1 is a simple fully connected network with sigmoid activation function and a gradient descent optimizer,
  • Variant 2 uses the more powerful ReLU activation function instead of sigmoid and the better Adam optimizer,
  • Variant 3 adds dropout in order to prevent overfitting.

Network architecture

All variants share the same network architecture; all have five layers with the sizes given below:

input layer             - X[batch, 784]
1 layer                 - W1[784, 200] + b1[200]
                          Y1[batch, 200] 
2 layer                 - W2[200, 100] + b2[100]
                          Y2[batch, 100] 
3 layer                 - W3[100, 60]  + b3[60]
                          Y3[batch, 60] 
4 layer                 - W4[60, 30]   + b4[30]
                          Y4[batch, 30] 
5 layer                 - W5[30, 10]   + b5[10]
One-hot encoded labels    Y5[batch, 10]

Model

Y1 = act(X*W1 + b1)      (act = sigmoid in variant 1, ReLU in variants 2 and 3)
Y2 = act(Y1*W2 + b2)
Y3 = act(Y2*W3 + b3)
Y4 = act(Y3*W4 + b4)
Y  = softmax(Y4*W5 + b5)
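
A hedged TensorFlow 1.x sketch of this stack, with layer sizes taken from the table above; the weight initialization values and tensor names are illustrative, not necessarily those used in the repository.

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])

def fc_vars(n_in, n_out):
    # small random weights and slightly positive biases
    W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1))
    b = tf.Variable(tf.zeros([n_out]) + 0.1)
    return W, b

W1, b1 = fc_vars(784, 200)
W2, b2 = fc_vars(200, 100)
W3, b3 = fc_vars(100, 60)
W4, b4 = fc_vars(60, 30)
W5, b5 = fc_vars(30, 10)

act = tf.nn.sigmoid              # variant 1; variants 2 and 3 use tf.nn.relu
Y1 = act(tf.matmul(X,  W1) + b1)
Y2 = act(tf.matmul(Y1, W2) + b2)
Y3 = act(tf.matmul(Y2, W3) + b3)
Y4 = act(tf.matmul(Y3, W4) + b4)
Ylogits = tf.matmul(Y4, W5) + b5
Y = tf.nn.softmax(Ylogits)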

Results

All results are for 5k iterations.

  • five layer fully-connected: accuracy=0.9541
  • five layer fully-connected with ReLU activation function and Adam optimizer: accuracy=0.9817
  • five layer fully-connected with ReLU activation, Adam optimizer and dropout: accuracy=0.9761

Tensorflow MNIST train/test loss and accuracy for 5 layers fully connected network

Tensorflow MNIST train/test loss and accuracy for 5 layers fully connected network (RELU, Adam optimizer)

Tensorflow MNIST train/test loss and accuracy for 5 layers fully connected network (RELU, Adam optimizer, dropout)

As we can see, changing from sigmoid to ReLU activation and using the Adam optimizer increases accuracy by over 2.5%, which is significant for such a small change. Adding dropout decreases the final test accuracy slightly, but if we compare the test loss and accuracy graphs we can notice that the curves with dropout are much smoother.
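
Note that dropout is active only during training: the keep probability is fed through a placeholder, so the same graph can be evaluated with dropout disabled. The fragment below reuses names from the sketches above and assumes a keep probability of 0.75 during training, which is an illustrative value rather than the one in the repository.

pkeep = tf.placeholder(tf.float32)             # dropout keep probability

Y4  = tf.nn.relu(tf.matmul(Y3, W4) + b4)
Y4d = tf.nn.dropout(Y4, pkeep)                 # randomly zeros a fraction (1 - pkeep) of activations
Ylogits = tf.matmul(Y4d, W5) + b5

# training step: keep 75% of activations (illustrative value)
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})
# evaluation: disable dropout by keeping everything
sess.run(accuracy, feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0})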

Convolutional neural network

Network architecture

The network layout is as follows:

A 5-layer neural network with 3 convolutional layers: input layer 28*28 = 784, output 10 (10 digits).
Output labels use one-hot encoding.
input layer               - X[batch, 784]
1 conv. layer             - W1[5, 5, 1, C1] + b1[C1]
                            Y1[batch, 28, 28, C1]
 
2 conv. layer             - W2[3, 3, C1, C2] + b2[C2]
2.1 max pooling filter 2x2, stride 2 - down sample the input (rescale input by 2) 28x28-> 14x14
                            Y2[batch, 14,14,C2] 
3 conv. layer             - W3[3, 3, C2, C3]  + b3[C3]
3.1 max pooling filter 2x2, stride 2 - down sample the input (rescale input by 2) 14x14-> 7x7
                            Y3[batch, 7, 7, C3] 
4 fully connected layer   - W4[7*7*C3, FC4]   + b4[FC4]
                            Y4[batch, FC4] 
5 output layer            - W5[FC4, 10]   + b5[10]
One-hot encoded labels      Y5[batch, 10]

As the optimizer I chose AdamOptimizer, and all weights were randomly initialized from a Gaussian distribution with std=0.1. The activation function is ReLU, without dropout.
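
A hedged TensorFlow 1.x sketch of this architecture; the channel counts C1, C2, C3, the fully connected size FC4, and the learning rate below are illustrative placeholders, not necessarily the values used in the repository.

import tensorflow as tf

C1, C2, C3, FC4 = 4, 8, 16, 200                # illustrative channel/unit counts

X = tf.placeholder(tf.float32, [None, 784])
Ximg = tf.reshape(X, [-1, 28, 28, 1])          # reshape flat vectors back to 2D images

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias(size):
    return tf.Variable(tf.constant(0.1, shape=[size]))

# 1st conv layer: 5x5 kernel, stride 1, 'SAME' padding keeps 28x28
W1 = weight([5, 5, 1, C1]);  b1 = bias(C1)
Y1 = tf.nn.relu(tf.nn.conv2d(Ximg, W1, strides=[1, 1, 1, 1], padding='SAME') + b1)

# 2nd conv layer + 2x2 max pooling: 28x28 -> 14x14
W2 = weight([3, 3, C1, C2]); b2 = bias(C2)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 1, 1, 1], padding='SAME') + b2)
Y2 = tf.nn.max_pool(Y2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# 3rd conv layer + 2x2 max pooling: 14x14 -> 7x7
W3 = weight([3, 3, C2, C3]); b3 = bias(C3)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 1, 1, 1], padding='SAME') + b3)
Y3 = tf.nn.max_pool(Y3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# fully connected layer on the flattened 7x7xC3 feature maps
YY = tf.reshape(Y3, [-1, 7 * 7 * C3])
W4 = weight([7 * 7 * C3, FC4]); b4 = bias(FC4)
Y4 = tf.nn.relu(tf.matmul(YY, W4) + b4)

# output layer with 10 classes
W5 = weight([FC4, 10]); b5 = bias(10)
Ylogits = tf.matmul(Y4, W5) + b5
Y = tf.nn.softmax(Ylogits)

# Adam optimizer on softmax cross-entropy, as described above (learning rate illustrative)
Y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)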

Results

All results are for 5k iterations.

  • five-layer convolutional neural network with max pooling: accuracy=0.9890

Tensorflow MNIST train/test loss and accuracy for convolutional 5 layer network

Summary

  • single layer neural network: accuracy=0.9237
  • five layer fully-connected: accuracy=0.9541
  • five layer fully-connected with ReLU activation function and Adam optimizer: accuracy=0.9817
  • five layer fully-connected with ReLU activation, Adam optimizer and dropout: accuracy=0.9761
  • five layer convolutional neural network with max pooling: accuracy=0.9890

References and further reading
