plkmo / Bible_text_gcn

PyTorch implementation of "Graph Convolutional Networks for Text Classification"

Graph Convolutional Network for Bible book classification

Overview

The text-based graph convolutional network (text-GCN) is a recently proposed semi-supervised learning approach that predicts the labels of unlabelled text from related labelled text. It embeds the entire corpus into a single graph with documents and words as nodes, where each document-word and word-word edge carries a predetermined weight based on the relationship between its two endpoints (e.g. TF-IDF). A GCN is then trained on this graph using the document nodes that have known labels, and the trained model is used to infer the labels of the unlabelled documents.
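As a rough illustration of the edge weights described above, here is a minimal sketch in plain Python. The TF-IDF weighting for document-word edges is mentioned above; using positive pointwise mutual information (PMI) over sliding windows for word-word edges follows our reading of the text-GCN paper and is an assumption about what this repo does. The function names, the window size and the toy corpus are hypothetical and do not come from the scripts in this repo.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_edges(docs):
    """Return {(doc_index, word): weight} for document-word edges."""
    n_docs = len(docs)
    # Number of documents each word appears in.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for i, doc in enumerate(docs):
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            edges[(i, word)] = tf * idf
    return edges


def pmi_edges(docs, window=3):
    """Return {(word_a, word_b): pmi} for word pairs with positive PMI."""
    # Slide a fixed-size window over every document.
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(doc)
        else:
            windows.extend(doc[j:j + window] for j in range(len(doc) - window + 1))
    n_windows = len(windows)
    word_count = Counter()
    pair_count = Counter()
    for w in windows:
        unique = sorted(set(w))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with probabilities over windows.
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # assumption: only positive-PMI pairs become edges
            edges[(a, b)] = pmi
    return edges


# Toy corpus (hypothetical), already tokenised.
docs = [
    "in the beginning god made the heaven and the earth".split(),
    "all is to no purpose said the preacher".split(),
]
doc_word = tfidf_edges(docs)
word_word = pmi_edges(docs)
```

With this unsmoothed IDF, a word that occurs in every document (such as "the" in the toy corpus) gets a weight of zero, so in practice such edges could simply be dropped.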

We implement text-GCN here using the Holy Bible as the corpus. The Holy Bible consists of 66 Books (Genesis, Exodus, etc.) and 1189 Chapters. The goal is to train a model that correctly classifies the Book that an unlabelled Chapter belongs to, given the labels of other Chapters. Since we actually know the labels of all Chapters, we intentionally mask the labels of 10-20% of the Chapters and use these as the test set during inference to measure model accuracy. To classify well, the model needs to distinguish between the contexts associated with the various Books (e.g. the Book of Genesis talks more about Adam and Eve, while the Book of Ecclesiastes talks about the life of King Solomon). The good results of the text-GCN model show that the graph structure captures such context nicely: the document (Chapter)-word edges encode the context within Chapters, while the word-word edges encode the relative context between Chapters.
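The label masking described above can be sketched as follows. This is a minimal illustration, not the repo's code: the function name, the fixed seed and the 15% fraction (picked from within the stated 10-20% range) are assumptions, and the actual scripts may split differently, for example per Book.

```python
import random


def split_labelled_nodes(n_chapters, test_fraction=0.15, seed=0):
    """Randomly pick Chapter indices whose labels are hidden during training."""
    rng = random.Random(seed)
    indices = list(range(n_chapters))
    rng.shuffle(indices)
    n_test = round(n_chapters * test_fraction)
    test_idx = sorted(indices[:n_test])    # labels masked, used only for evaluation
    train_idx = sorted(indices[n_test:])   # labels visible to the training loss
    return train_idx, test_idx


train_idx, test_idx = split_labelled_nodes(1189, test_fraction=0.15)
```

All 1189 Chapter nodes stay in the graph in either case. Only the training loss is restricted to `train_idx`, which is what makes the setup semi-supervised.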

Dataset

The Bible text data used here (BBE version) is obtained courtesy of https://github.com/scrollmapper/bible_databases.

Implementation

The implementation follows the paper "Graph Convolutional Networks for Text Classification" (https://arxiv.org/abs/1809.05679).
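For orientation, the two-layer GCN in that paper can be summarised as below. This is our reading of the paper, written under the assumption that the repo follows it unchanged; the layer sizes and any other hyperparameters used here are not stated in this README. Here A is the adjacency matrix of the document-word graph with self-loops (A_ii = 1), X is the node feature matrix (the identity matrix in the paper), W_0 and W_1 are the learned weights, Y_D is the set of labelled document nodes, and F is the number of classes (66 Books in our case).

```latex
\begin{align}
\tilde{A} &= D^{-1/2} A D^{-1/2}, \qquad D_{ii} = \sum_j A_{ij} \\
Z &= \mathrm{softmax}\!\left( \tilde{A}\, \mathrm{ReLU}\!\left( \tilde{A} X W_0 \right) W_1 \right) \\
\mathcal{L} &= -\sum_{d \in \mathcal{Y}_D} \sum_{f=1}^{F} Y_{df} \ln Z_{df}
\end{align}
```

Because the loss sums only over labelled document nodes, the masked Chapters still influence training through the graph structure but never through their labels.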

For more details on the scripts & implementation, see this article: https://towardsdatascience.com/text-based-graph-convolutional-network-for-semi-supervised-bible-book-classification-c71f6f61ff0f

Requirements

Python (3.6+), networkx (2.1), torch (1.0.0), torchvision (0.2.1), and standard Python libraries

Contents

You will find the following:

  1. generate_train_test_datasets.py - script containing functions to compute the edge weights, and to build and save the graph
  2. models.py - script containing the GCN model
  3. text_GCN.py - main program that builds the dataset and graph, constructs the GCN and trains the model
  4. evaluate_results.py - script that evaluates the results and the misclassified labels
  5. Data folder containing the Bible data (t_bbe.csv)
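To give a feel for the evaluation step listed above, here is a small sketch of computing accuracy and counting misclassified (actual, predicted) Book pairs on the masked Chapters. The function name and the toy predictions are hypothetical and are not taken from evaluate_results.py.

```python
from collections import Counter


def evaluate(predicted, actual):
    """Return (accuracy, Counter of (actual, predicted) pairs that disagree)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example to evaluate")
    correct = sum(p == a for p, a in zip(predicted, actual))
    confusions = Counter((a, p) for p, a in zip(predicted, actual) if p != a)
    return correct / len(actual), confusions


# Toy predictions for four masked Chapters (made up for illustration).
predicted = ["Genesis", "Exodus", "Psalms", "Genesis"]
actual = ["Genesis", "Exodus", "Proverbs", "Exodus"]
accuracy, confusions = evaluate(predicted, actual)
```

Looking at the most frequent confusion pairs shows which Books the model tends to mix up.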

How to use

To start, clone the repo, then run text_GCN.py (pass -h to list the additional arguments).
