
datacamp / Datacamp_facebook_live_nlp

Licence: MIT
DataCamp Facebook Live Code Along Session 1: Enjoy.

Frequencies of words in novels: a Data Science pipeline

with DataCamp's very own Hugo Bowne-Anderson. Follow him on Twitter @hugobowne

Description

In this live code-along session, you'll learn how to build a Data Science pipeline to plot frequency distributions of words in Moby Dick, among many other novels. We won't give you the novels: you'll learn to scrape them from Project Gutenberg (a large corpus of free books) using the Python package requests, and to extract the novels from this web data using BeautifulSoup. Then you'll dive into analyzing the novels using the Natural Language Toolkit (nltk). In the process, you'll learn about important aspects of Natural Language Processing (NLP) such as tokenization and stopwords. By the end, you'll be able to visualize the word frequency distribution of any novel you can find on Project Gutenberg. The NLP skills you develop, however, will apply to much of the data that Data Scientists encounter, since a large proportion of the world's data is unstructured and includes a great deal of text.
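The scrape, extract and tokenize steps can be sketched without any third-party packages. This is a minimal sketch under stated assumptions, not the session's code: it swaps in the standard library's urllib.request and html.parser for requests and BeautifulSoup (roughly requests.get(url).text and BeautifulSoup(html, "html.parser").get_text()), and a simple regular-expression tokenizer for nltk's tokenizers. The function names fetch_html, html_to_text and tokenize are hypothetical.

```python
from __future__ import annotations

import re
import urllib.request
from html.parser import HTMLParser


def fetch_html(url: str) -> str:
    """Download a page and decode it as UTF-8 (stand-in for requests.get(url).text)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Rough analogue of BeautifulSoup(html, "html.parser").get_text()."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())
```

For a real novel you would call tokenize(html_to_text(fetch_html(url))) with the URL of a Project Gutenberg HTML page of your choice. Note that the \w+ pattern splits contractions, so "It's" becomes "it" and "s".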

For example, which novel do you think the following word frequency distribution comes from?

[Figure: word frequency distribution plot]
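Once you have a list of tokens, a word frequency distribution is a count of each word after removing stopwords (very common words such as "the" and "and"). The sketch below uses collections.Counter as a stand-in for nltk.FreqDist. The tiny STOPWORDS set is hypothetical and only for illustration; nltk ships a full English list via nltk.corpus.stopwords.words("english"), after running nltk.download("stopwords").

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use nltk's list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "i", "me", "my"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens in `text`, ignoring any token in `stopwords`."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


freqs = word_frequencies("The whale, the WHALE! Call me Ishmael. The sea and the whale.")
print(freqs.most_common(1))  # [('whale', 3)]
```

With nltk and matplotlib installed, nltk.FreqDist(words).plot(25) plots the 25 most common words from a token list.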

Prerequisites

Not a lot. It would help if you knew:

  • programming fundamentals and the basics of the Python programming language (e.g., variables, for loops);
  • a bit about Jupyter Notebooks;
  • your way around the terminal/shell.

However, I have always found that the most important and beneficial prerequisite is a will to learn new things, so if you have this quality, you'll definitely get something out of this code-along session.

If you'd rather watch than code along, you'll still have a great time, and the notebooks will be available to download afterwards.

If you are going to code along and use the Anaconda distribution of Python 3 (see below), please install it before the session.

Getting set up computationally

1. Clone the repository

To get set up for this live coding session, clone this repository. You can do so by executing the following in your terminal:

git clone https://github.com/datacamp/datacamp_facebook_live_nlp

Alternatively, you can download the repository as a zip file from the top of the repository's main page. If you prefer not to use git or don't have experience with it, this is a good option.

2. Download Anaconda (if you haven't already)

If you do not already have the Anaconda distribution of Python 3, go get it (n.b., you can also do this without Anaconda by using pip to install the required packages; however, Anaconda is great for Data Science and I encourage you to use it).

3. Create your conda environment for this session

Navigate to the relevant directory, datacamp_facebook_live_nlp, and install the required packages in a new conda environment:

conda env create -f environment.yml

This will create a new environment called fb_live_nlp. To activate the environment on macOS/Linux, execute

source activate fb_live_nlp

On Windows, execute

activate fb_live_nlp

With conda 4.4 or later, you can instead run conda activate fb_live_nlp on any platform.
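Once the environment is active, you can check from Python that the packages this session relies on are importable. This is a minimal sketch; the package list below (requests, bs4 for BeautifulSoup, nltk) is an assumption inferred from the session description, not read from environment.yml, and missing_packages is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list, inferred from the description rather than environment.yml.
    required = ["requests", "bs4", "nltk"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything is reported missing, check that the fb_live_nlp environment is the one currently activated.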

4. Open your Jupyter notebook

In the terminal, execute jupyter notebook.

Then open the notebook NLP_FB_live_coding.ipynb and we're ready to get coding. Enjoy.

Code

The code in this repository is released under the MIT license. Read more at the Open Source Initiative. All text remains the Intellectual Property of DataCamp. If you wish to reuse, adapt or remix, get in touch with me at hugo at datacamp com to request permission.
