🔮Getting started with TensorFlow: Classifying Text with Neural Networks


Classifying Text with Neural Networks and TensorFlow

Hi! I was having a hard time learning how to use TensorFlow, so I got angry and decided that I would learn how to use it and share the knowledge to help other people as well :)

You can check the blog post here (there are more details about Neural Networks there), or if you are in a hurry, just check the code below ;)


import pandas as pd
import numpy as np
import tensorflow as tf
from collections import Counter
from sklearn.datasets import fetch_20newsgroups

How TensorFlow works

import tensorflow as tf
my_graph = tf.Graph()
with tf.Session(graph=my_graph) as sess:
    x = tf.constant([1,3,6])  # define the values as constant tensors in the graph
    y = tf.constant([1,1,1])
    op = tf.add(x,y)               # define the operation; nothing is computed yet
    result = sess.run(fetches=op)  # run the graph: this is where the computation happens
    print(result)
[2 4 7]
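The snippet above uses the TensorFlow 1.x graph-and-session API, which this tutorial was written for. As a side note (my addition, assuming you are on TensorFlow 2.x), the same computation runs eagerly there, with no graph or session needed:

import tensorflow as tf

# TF 2.x executes eagerly by default: operations return concrete values
x = tf.constant([1, 3, 6])
y = tf.constant([1, 1, 1])
print(tf.add(x, y).numpy())  # [2 4 7]

The 1.x code still runs on 2.x through the compatibility layer (tf.compat.v1.Session, after calling tf.compat.v1.disable_eager_execution()).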

How to manipulate data and pass it to the Neural Network inputs

vocab = Counter()

text = "Hi from Brazil"

for word in text.split(' '):
    word_lowercase = word.lower()
    vocab[word_lowercase]+=1
        
def get_word_2_index(vocab):
    word2index = {}
    for i,word in enumerate(vocab):
        word2index[word] = i
        
    return word2index
word2index = get_word_2_index(vocab)

total_words = len(vocab)
matrix = np.zeros((total_words),dtype=float)

for word in text.split():
    matrix[word2index[word.lower()]] += 1
    
print("Hi from Brazil:", matrix)
Hi from Brazil: [ 1.  1.  1.]
matrix = np.zeros((total_words),dtype=float)
text = "Hi"
for word in text.split():
    matrix[word2index[word.lower()]] += 1
    
print("Hi:", matrix)
Hi: [ 1.  0.  0.]
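What we just built by hand is a bag-of-words encoding. As an aside (not part of the original tutorial), scikit-learn packages the same idea as CountVectorizer; a minimal sketch:

from sklearn.feature_extraction.text import CountVectorizer

# Learn the vocabulary and turn each text into a vector of word counts
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(["Hi from Brazil", "Hi"])
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(counts.toarray())                    # one count vector per text

(On older scikit-learn versions the last call is get_feature_names().)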

Building the neural network

categories = ["comp.graphics","sci.space","rec.sport.baseball"]
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
print('total texts in train:',len(newsgroups_train.data))
print('total texts in test:',len(newsgroups_test.data))
total texts in train: 1774
total texts in test: 1180
print('text',newsgroups_train.data[0])
print('category:',newsgroups_train.target[0])
text From: [email protected] (Kouhia Juhana)
Subject: Re: More gray levels out of the screen
Organization: Tampere University of Technology
Lines: 21
Distribution: inet
NNTP-Posting-Host: cc.tut.fi

In article <[email protected]> [email protected]
(Kenneth Sloan) writes:
>
>Why didn't you create 8 grey-level images, and display them for
>1,2,4,8,16,32,64,128... time slices?

By '8 grey level images' you mean 8 items of 1bit images?
It does work(!), but it doesn't work if you have more than 1bit
in your screen and if the screen intensity is non-linear.

With 2 bit per pixel; there could be 1*c_1 + 4*c_2 timing,
this gives 16 levels, but they are linear if screen intensity is
linear.
With 1*c_1 + 2*c_2 it works, but we have to find the best
compinations -- there's 10 levels, but 16 choises; best 10 must be
chosen. Different compinations for the same level, varies a bit, but
the levels keeps their order.

Readers should verify what I wrote... :-)

Juhana Kouhia

category: 0
vocab = Counter()

for text in newsgroups_train.data:
    for word in text.split(' '):
        vocab[word.lower()]+=1
        
for text in newsgroups_test.data:
    for word in text.split(' '):
        vocab[word.lower()]+=1
print("Total words:",len(vocab))
Total words: 119930
total_words = len(vocab)

def get_word_2_index(vocab):
    word2index = {}
    for i,word in enumerate(vocab):
        word2index[word.lower()] = i
        
    return word2index

word2index = get_word_2_index(vocab)

print("Index of the word 'the':",word2index['the'])
Index of the word 'the': 10
def text_to_vector(text):
    layer = np.zeros(total_words,dtype=float)
    for word in text.split(' '):
        layer[word2index[word.lower()]] += 1
        
    return layer

def category_to_vector(category):
    y = np.zeros((3),dtype=float)
    if category == 0:
        y[0] = 1.
    elif category == 1:
        y[1] = 1.
    else:
        y[2] = 1.
        
    return y
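As a shorthand (my alternative, not from the original), the same one-hot encoding can be written by indexing into an identity matrix:

def category_to_vector(category):
    # Row `category` of a 3x3 identity matrix is exactly the one-hot vector
    return np.eye(3, dtype=float)[category]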
def get_batch(df,i,batch_size):
    # 'df' is the sklearn Bunch returned by fetch_20newsgroups (not a DataFrame)
    batches = []
    results = []
    texts = df.data[i*batch_size:i*batch_size+batch_size]
    categories = df.target[i*batch_size:i*batch_size+batch_size]
    
    for text in texts:
        layer = text_to_vector(text) 
        batches.append(layer)
        
    for category in categories:
        y = category_to_vector(category)
        results.append(y)  
     
    return np.array(batches),np.array(results)
print("Each batch has 100 texts and each matrix has 119930 elements (words):",get_batch(newsgroups_train,1,100)[0].shape)
Each batch has 100 texts and each matrix has 119930 elements (words): (100, 119930)
print("Each batch has 100 labels and each matrix has 3 elements (3 categories):",get_batch(newsgroups_train,1,100)[1].shape)
Each batch has 100 labels and each matrix has 3 elements (3 categories): (100, 3)
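For a sense of scale (my arithmetic, not in the original): each batch is a dense float64 matrix, which is one reason the data is fed to the network in batches rather than all at once.

# Approximate memory for one batch of dense count vectors
batch_bytes = 100 * 119930 * 8         # texts per batch * vocab size * bytes per float64
print(round(batch_bytes / 1e6), "MB")  # ~96 MB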
# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 150
display_step = 1

# Network Parameters
n_hidden_1 = 100      # 1st layer number of features
n_hidden_2 = 100       # 2nd layer number of features
n_input = total_words # Words in vocab
n_classes = 3         # Categories: graphics, sci.space and baseball

input_tensor = tf.placeholder(tf.float32,[None, n_input],name="input")
output_tensor = tf.placeholder(tf.float32,[None, n_classes],name="output") 
def multilayer_perceptron(input_tensor, weights, biases):
    layer_1_multiplication = tf.matmul(input_tensor, weights['h1'])
    layer_1_addition = tf.add(layer_1_multiplication, biases['b1'])
    layer_1 = tf.nn.relu(layer_1_addition)
    
    # Hidden layer with RELU activation
    layer_2_multiplication = tf.matmul(layer_1, weights['h2'])
    layer_2_addition = tf.add(layer_2_multiplication, biases['b2'])
    layer_2 = tf.nn.relu(layer_2_addition)
    
    # Output layer 
    out_layer_multiplication = tf.matmul(layer_2, weights['out'])
    out_layer_addition = out_layer_multiplication + biases['out']
    
    return out_layer_addition
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
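# [Note, not in the original] tf.random_normal draws weights with stddev 1.0,
# which is quite large for a 119,930-dimensional input layer; a smaller-scale
# initializer often trains more stably, e.g.:
#   'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1))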

# Construct model
prediction = multilayer_perceptron(input_tensor, weights, biases)

# Define loss and optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=output_tensor))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

# Initializing the variables
init = tf.global_variables_initializer()

# [NEW] Add ops to save and restore all the variables
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(len(newsgroups_train.data)/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x,batch_y = get_batch(newsgroups_train,i,batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            c,_ = sess.run([loss,optimizer], feed_dict={input_tensor: batch_x,output_tensor:batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "loss=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(output_tensor, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    total_test_data = len(newsgroups_test.target)
    batch_x_test,batch_y_test = get_batch(newsgroups_test,0,total_test_data)
    print("Accuracy:", accuracy.eval({input_tensor: batch_x_test, output_tensor: batch_y_test}))
    
    # [NEW] Save the variables to disk
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in path: %s" % save_path)
Epoch: 0001 loss= 1289.632795854
Epoch: 0002 loss= 270.930032210
Epoch: 0003 loss= 220.176122665
Epoch: 0004 loss= 119.421511043
Epoch: 0005 loss= 47.321779143
Epoch: 0006 loss= 32.028330424
Epoch: 0007 loss= 11.666836267
Epoch: 0008 loss= 15.337957333
Epoch: 0009 loss= 6.931441394
Epoch: 0010 loss= 0.000000000
Optimization Finished!
Accuracy: 0.749153
Model saved in path: /tmp/model.ckpt
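Beyond a single accuracy number, it can help to look at per-class results. A sketch (my addition, assuming scikit-learn is available and the checkpoint saved above exists):

from sklearn.metrics import classification_report

with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    test_x, test_y = get_batch(newsgroups_test, 0, len(newsgroups_test.target))
    predictions = sess.run(tf.argmax(prediction, 1), feed_dict={input_tensor: test_x})
    # Precision/recall/F1 per category, comparing predictions against the true labels
    print(classification_report(np.argmax(test_y, 1), predictions))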

Making a prediction

1 text

# Get a text to make a prediction
text_for_prediction = newsgroups_test.data[5]
print('text',text_for_prediction)
text From: [email protected] (John S. Neff)
Subject: Re: Space spinn offs
Nntp-Posting-Host: pluto.physics.uiowa.edu
Organization: The University of Iowa
Lines: 23

In article <[email protected]> [email protected] (William F Brown) writes:
>From: [email protected] (William F Brown)
>Subject: Re: Space spinn offs
>Date: 30 Apr 1993 19:27:24 GMT
>I just wanted to point out, that Teflon wasn't from the space program.
>It was from the WWII nuclear weapons development program.  Pipes in the 
>system for fractioning and enriching uranium had to be lined with it.
>
>Uranium Hexafloride was the chemical they turned the pitchblend into for
>enrichment.  It is massively corrosive.  Even to Stainless steels. Hence
>the need for a very inert substaance to line the pipes with.  Teflon has
>all its molecular sockets bound up already, so it is very unreactive.
>
>My 2 sense worth.
>
>Bill
>

The artifical pacemaker was invented in 1958 by Wilson Greatbatch an
American biomedical engineer. The bill authorizing NASA was signed
in October of 1958 so it is clear that NASA had nothing to do with
the invention of the pacemaker.
print("Text correct category:", newsgroups_test.target[5])
Text correct category: 2
# Convert the text to a vector so we can send it to our model
vector_txt = text_to_vector(text_for_prediction)
# Wrap the vector like we do in get_batch()
input_array = np.array([vector_txt])
input_array.shape
(1, 119930)
saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    
    classification = sess.run(tf.argmax(prediction, 1), feed_dict={input_tensor: input_array})
    print("Predicted category:", classification)
Model restored.
Predicted category: [0]

Note that the model got this one wrong: the correct category printed earlier is 2 (sci.space), but it predicted 0 (comp.graphics).

10 texts

# Get 10 texts to make a prediction

x_10_texts,y_10_correct_labels = get_batch(newsgroups_test,0,10)

saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    
    classification = sess.run(tf.argmax(prediction, 1), feed_dict={input_tensor: x_10_texts})
    print("Predicted categories:", classification)
Model restored.
Predicted categories: [1 0 2 0 1 2 1 0 1 2]
print("Correct categories:", np.argmax(y_10_correct_labels, 1))
Correct categories: [1 0 1 0 1 2 2 0 1 2]
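Comparing the two arrays, 8 of the 10 predictions match (positions 2 and 6 differ). A quick check (my addition):

print("Batch accuracy:", np.mean(classification == np.argmax(y_10_correct_labels, 1)))  # 0.8 for the arrays above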