blueprints-for-text-analytics-python / blueprints-text

License: Apache-2.0
Jupyter notebooks for our O'Reilly book "Blueprints for Text Analytics Using Python"

Programming Languages

Jupyter Notebook, HTML, TeX, Python

Projects that are alternatives of or similar to blueprints-text

thrones2vec
Using Word2Vec to explore semantic similarities between the entities of "A Song of Ice and Fire" ("Game of Thrones").
Stars: ✭ 27 (-73.79%)
Mutual labels:  text-mining
Introduction-to-text-mining-with-Python
Lectures in Urban Data Science Lab, Seoul
Stars: ✭ 25 (-75.73%)
Mutual labels:  text-mining
SparseLSH
A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.
Stars: ✭ 127 (+23.3%)
Mutual labels:  text-mining
R.TeMiS
R.TeMiS: R Text Mining Solution
Stars: ✭ 21 (-79.61%)
Mutual labels:  text-mining
civicmine
Text mining cancer biomarkers for the CIVIC database
Stars: ✭ 19 (-81.55%)
Mutual labels:  text-mining
TextDatasetCleaner
🔬 Cleaning datasets of junk (normalization, preprocessing)
Stars: ✭ 27 (-73.79%)
Mutual labels:  text-mining
misinfo
📊 Tools to Perform ‘Misinformation’ Analysis on a Text Corpus (wrapper for methods in https://github.com/PDXBek/Misinformation)
Stars: ✭ 17 (-83.5%)
Mutual labels:  text-mining
textdigester
TextDigester: document summarization java library
Stars: ✭ 23 (-77.67%)
Mutual labels:  text-mining
text-mining-corona-articles
Text Mining for Indonesian Online News Articles About Corona
Stars: ✭ 15 (-85.44%)
Mutual labels:  text-mining
sacred
📖 Sacred texts in R
Stars: ✭ 19 (-81.55%)
Mutual labels:  text-mining
Adjutant
Runs a pubmed query, returns results and allows user to explore high-level structure of returned documents
Stars: ✭ 59 (-42.72%)
Mutual labels:  text-mining
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-73.79%)
Mutual labels:  text-mining
restaurant-finder-featureReviews
Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).
Stars: ✭ 21 (-79.61%)
Mutual labels:  text-mining
TRUNAJOD2.0
An easy-to-use library to extract indices from texts.
Stars: ✭ 18 (-82.52%)
Mutual labels:  text-mining
gofastr
Make a DocumentTermMatrix faster
Stars: ✭ 19 (-81.55%)
Mutual labels:  text-mining
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+42.72%)
Mutual labels:  text-mining
Quran-and-Arabic-Language-Repository
Projects & Libraries related to Quran & Arabic Language
Stars: ✭ 26 (-74.76%)
Mutual labels:  text-mining
advanced-text-mining
Covers natural language processing and text analysis methodology using the TEANAPS library.
Stars: ✭ 15 (-85.44%)
Mutual labels:  text-mining
ipo-miner
IPO Investment via Text Mining.
Stars: ✭ 20 (-80.58%)
Mutual labels:  text-mining
Guten-gutter
Strips boilerplate from Project Gutenberg text files
Stars: ✭ 16 (-84.47%)
Mutual labels:  text-mining

Blueprints for Text Analytics Using Python

Machine Learning Based Solutions for Common Real World (NLP) Applications

Jens Albrecht, Sidharth Ramachandran, Christian Winkler

Published by O'Reilly, 2020

Find the book at
O'Reilly
Amazon.com
Amazon.de
Amazon.co.uk
Amazon.fr
Amazon.in

If you like the book or the code examples here, please leave a friendly comment on Amazon!

Download your free chapter now!

Free download of Chapter 7 "How to Explain a Classifier".


Content of this Repository

This repository is currently in preparation. Please do not send comments yet.

This repository contains the code examples from our O'Reilly book. Each chapter has its own subdirectory containing a Jupyter notebook and additional files for the setup.

Below you will find links to view the notebooks here on GitHub or to execute them directly on Google Colab. The section after that explains how to set up the environment on your local computer.

Problems and errors

If you discover any problems or have recommendations on how to improve the code, do not hesitate to create an issue here in the repository.

For errors in the book text, please use O'Reilly's errata page.

spaCy 3.0 and Gensim 4.0

The book uses spaCy 2.3.2 and gensim 3.8.3. spaCy 3.0 has now been officially released with several new features and a few API changes (https://spacy.io/usage/v3). Gensim 4.0 is in beta (https://github.com/RaRe-Technologies/gensim/releases).

We are already updating our notebooks. However, textacy does not yet support spaCy 3.0, although work is in progress (see this pull request from us). Until a textacy release for spaCy 3.0 is available, you can install from our own fork (see blueprints.yaml in this directory).
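
Because the notebooks were written against specific library versions, it can help to check what is installed before running them. The following is a minimal sketch, not part of the repository: the helper names `major` and `compare_with_book` are hypothetical, and comparing only the major version number is an assumption about where the API changes mentioned above matter.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the book was written against, as stated in this README.
BOOK_VERSIONS = {"spacy": "2.3.2", "gensim": "3.8.3"}


def major(version_string):
    """Return the leading major-version component, e.g. '3' for '3.0.1'."""
    return version_string.split(".")[0]


def compare_with_book(installed=None):
    """Map each package to 'missing', 'same major' or 'different major'.

    `installed` is an optional {package: version} dict (hypothetical helper
    argument, mainly for testing); by default the versions are read from
    the current Python environment.
    """
    report = {}
    for package, book_version in BOOK_VERSIONS.items():
        if installed is not None:
            found = installed.get(package)
        else:
            try:
                found = version(package)
            except PackageNotFoundError:
                found = None
        if found is None:
            report[package] = "missing"
        elif major(found) == major(book_version):
            report[package] = "same major"
        else:
            report[package] = "different major"
    return report


if __name__ == "__main__":
    for package, status in compare_with_book().items():
        print(f"{package}: {status} (book uses {BOOK_VERSIONS[package]})")
```

A result of "different major" suggests that some notebook cells may need adjusting for the newer API.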

View or Run the Notebooks

For each chapter of the book we provide three links:

  • "git" opens the notebook for viewing here on GitHub (this sometimes fails to render because of a GitHub issue)
  • "nbviewer" opens the notebook for viewing on nbviewer.ipython.org
  • "colab" opens a runnable copy on Google's Colab service

If you run the notebook locally or on Colab, you can execute each cell separately by hitting "Shift-enter". Do not skip cells and don't forget to run the first code cells for the setup.

Local Setup

The following instructions should work on Linux, Windows and macOS. If you are a Windows user familiar with Linux, you should check out the Windows Subsystem for Linux, Version 2 (WSL2), which lets you run a Linux system on your Windows machine. However, using native Windows should not be a problem either.

It is helpful to install git on your machine, but you can also download the full repository from GitHub as a zip file. If you use git, run the following commands from the command line:

git clone https://github.com/blueprints-for-text-analytics-python/blueprints-text.git
cd blueprints-text

Otherwise, download the zip file, unpack it to a convenient location, and open a command line terminal in the project directory blueprints-text.

For local setup, we recommend Miniconda, a minimal version of the popular Anaconda distribution that contains only the package manager conda and Python. Follow the installation instructions on the Miniconda Homepage. If you already have Anaconda or Miniconda installed on your system, that's fine. We will create a separate virtual environment for the blueprints book so that our installation does not interfere with your existing setup.

After installation of Anaconda/Miniconda, run the following command(s) from the project directory:

conda env create --name blueprints --file blueprints.yml
conda activate blueprints
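
To confirm the activation from inside Python (for example in the first notebook cell), one option is to inspect the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. This is a small sketch under that assumption; the function names `active_conda_env` and `in_blueprints_env` are hypothetical and not part of the repository.

```python
import os


def active_conda_env(environ=None):
    """Return the environment name conda exports as CONDA_DEFAULT_ENV, or None."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def in_blueprints_env(environ=None, expected="blueprints"):
    """True if the active conda environment has the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_blueprints_env():
        print("Running inside the blueprints environment.")
    else:
        print(f"Active conda environment: {active_conda_env()!r}")
```

If the check fails inside Jupyter, the notebook server was probably started from a different environment.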

The prompt should change after activation and indicate that you are working in the blueprints environment. Our installation includes the Jupyter notebook extensions. We suggest enabling the extensions "table of contents" (toc2), "execute time", and "variable inspector" (varInspector):

jupyter nbextension enable toc2/main
jupyter nbextension enable execute_time/ExecuteTime
jupyter nbextension enable varInspector/main

Now you can start the Jupyter notebook server:

jupyter notebook

If you are working on WSL under Windows, add the option --no-browser.
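
If you start the server from a script, the choice of option can be automated. The sketch below is only illustrative: checking for "microsoft" in the kernel release string is a common heuristic for detecting WSL, not an official API, and the function names `looks_like_wsl` and `notebook_command` are hypothetical.

```python
import platform


def looks_like_wsl(system=None, release=None):
    """Heuristic: WSL reports a Linux kernel whose release mentions 'microsoft'."""
    uname = platform.uname()
    system = uname.system if system is None else system
    release = uname.release if release is None else release
    return system == "Linux" and "microsoft" in release.lower()


def notebook_command(wsl):
    """Build the Jupyter launch command, adding --no-browser under WSL."""
    command = ["jupyter", "notebook"]
    if wsl:
        command.append("--no-browser")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is started by accident.
    print(" ".join(notebook_command(looks_like_wsl())))
```

With --no-browser, copy the URL that Jupyter prints in the terminal into a browser on the Windows side.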

Browse to the respective chapter and open the notebook file (suffix .ipynb).
