rcdilorenzo / ecce

Licence: GPL-3.0 license
ML Prediction of Bible Topics and Passages (Python / React)

Exploratory Core Concept Extraction (Ecce)

Screenshot

Introduction

ecce = "behold" (Latin)

Deuteronomy 5:24 (ESV)

And you said, ‘Behold, the Lord our God has shown us his glory and greatness, and we have heard his voice out of the midst of the fire. This day we have seen God speak with man, and man still live.

For thousands of years, people have studied the Bible from countless perspectives with diverse approaches towards various goals. As a Christian myself, I have read, discussed, and learned from it both in personal study and through others. With the plethora of related documents in the form of commentaries, topical indexes, dictionaries, and cross-references, the Bible has been scoured from cover to cover throughout the ages.

The application of this project is twofold. The first objective is to create a visual exploration of the topics from the Bible. If time permits, this would be accomplished using an interactive website that gives users a way to see related passages that were previously linked only in a manual fashion. Second, the trained network will be used to predict both related topics and Scripture references from arbitrary text (similar in form to Bible verses).

Overview

This project is the intersection and analysis of three data sources: the English Standard Version (ESV, a Bible translation), Nave's Topical Index, and the Treasury of Scripture Knowledge (TSK, cross-references). The actual data processing and entire flow of the project can be found in the rendered notebook. Additional interactive exploratory data analysis can be found in several React components from the web app. The primary interaction in the web app flows through two models. The topic model combines ESV verse text with a filtered list of Nave's topics (at least 30 verses per topic). The cluster model combines ESV verse text with the cross-references from TSK so that groups of passages can be predicted.
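The topic filter described above (at least 30 verses per topic) could be sketched as follows. This is a minimal illustration, not the project's own code: the `filter_topics` helper and the sample `(topic, verse)` pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical (topic, verse reference) pairs; the real pairs would come
# from the processed Nave's Topical Index data.
topic_verse_pairs = [
    ("FAITH", "Hebrews 11:1"),
    ("FAITH", "Romans 10:17"),
    ("FAITH", "Hebrews 11:1"),  # duplicate pair, counted once
    ("ARK", "Genesis 6:14"),
]


def filter_topics(pairs, min_verses=30):
    """Keep only topics linked to at least `min_verses` distinct verses."""
    verses_by_topic = defaultdict(set)
    for topic, verse in pairs:
        verses_by_topic[topic].add(verse)
    return {
        topic: sorted(verses)
        for topic, verses in verses_by_topic.items()
        if len(verses) >= min_verses
    }


# A threshold of 2 is used here only because the toy data is tiny.
kept = filter_topics(topic_verse_pairs, min_verses=2)
print(kept)
```

With the default threshold of 30, none of the toy topics would survive, which mirrors how rarely referenced topics drop out of the training labels.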

Image

In addition, I presented this project in my final semester for an M.S. in Data Science. The slides I used to present can be found in the repository:

Slides

Data Sources

English Standard Version. Text from English Standard Version (2001) is employed using JSON from honza/bibles. All copyrights remain with Crossway (see footnote 1). Passages longer than three verses are truncated in the interface and link directly to BibleGateway.

Nave's Topical Index. Topics were extracted from text files assembled by the folks behind JustVerses.com from the original, public domain PDF. Although three levels of data are available (topics, categories, and sub-topics), the primary focus was the top-level topics with a total of ~4,200 topics that intersected with verses available from the ESV.

Treasury of Scripture Knowledge. Cross-references were also extracted from text files downloaded from JustVerses.com from the original, public domain data. These verses were associated with the ESV text by validating the references from just over 63,500 cross-reference clusters.
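Validating cross-references against the ESV text might look something like the sketch below. The nested `esv` dictionary layout, the tuple form of a reference, and both helper functions are assumptions made for illustration; the actual JSON structure and processing code are not shown in this document.

```python
# Hypothetical in-memory ESV lookup: book -> chapter -> verse -> text.
# The layout of the real JSON source may differ.
esv = {
    "Genesis": {1: {1: "<verse text>", 2: "<verse text>"}},
    "John": {1: {1: "<verse text>"}},
}


def is_valid_reference(esv, reference):
    """Return True if (book, chapter, verse) exists in the ESV lookup."""
    book, chapter, verse = reference
    return verse in esv.get(book, {}).get(chapter, {})


def validate_cluster(esv, cluster):
    """Drop references from a cross-reference cluster that the ESV lacks."""
    return [ref for ref in cluster if is_valid_reference(esv, ref)]


cluster = [("Genesis", 1, 1), ("John", 1, 1), ("John", 99, 1)]
valid = validate_cluster(esv, cluster)
print(valid)
```

Only references that resolve to a verse in the lookup are kept, so a cluster can shrink when the cross-reference data points at a verse the translation does not contain.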

Topic Model

nave-diagram

Cluster (Passage) Model

tsk-diagram

Results

Both of the highest-performing models ended up being extremely large fully-connected neural networks, although multiple types of recurrent architectures (LSTMs and GRUs) were explored with word embeddings from GloVe. The topic model came in at 435MB with 36,315,622 parameters, an input size of 13,337, and an output of 853 topics. The cluster model was 2.3GB with 191,259,581 parameters, an input size of 150 (truncated SVD of encoded word vocabulary), and an output of 63,581 clusters of cross-references.
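To see how a fully-connected network reaches these parameter counts, the sketch below counts weights and biases layer by layer. The input size (13,337), output size (853), parameter totals, and file sizes come from the text above; the single hidden layer of 2,559 units is an assumption, chosen only because it reproduces the reported total, and the actual layer layout is not stated here.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a stack of fully-connected layers."""
    return sum(
        (n_in + 1) * n_out  # n_in weights plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed layout: 13,337 inputs -> 2,559 hidden units -> 853 topics.
# The hidden width is a guess that happens to match the reported count.
topic_params = dense_parameter_count([13337, 2559, 853])
print(topic_params)  # 36315622

# Approximate bytes stored per parameter, treating MB/GB as powers of ten.
topic_bytes_per_param = round(435e6 / 36_315_622, 1)
cluster_bytes_per_param = round(2.3e9 / 191_259_581, 1)
print(topic_bytes_per_param, cluster_bytes_per_param)  # 12.0 12.0
```

Both saved models work out to roughly 12 bytes per parameter, about three times what 32-bit weights alone would need. One possible explanation is that the saved files also hold optimizer state, but that is a guess rather than something this document confirms.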

Topic Model

Using data from Nave's Topical Index (~4,200 topics without filtering), all of the following model revisions were trained on 21,106 verses, validated on 3,725 verses, and evaluated on 6,208 verses.

| Name | Categorical Accuracy | Notes |
| --- | --- | --- |
| lstm-base | 2.95% | sequence of words, no word embeddings, ~4200 possible topics |
| lstm-b4cab4 | 5.72% | tuned and tweaked, reduce to ~850 topics, word embeddings from glove.42B.300d (includes 92.55% of ESV words) |
| svd-bow-cb8915 | 6.91% | switch to truncated SVD with bag-of-words |
| svd-bow-52a075 | 6.62% | additional experiments, exclude top two topics |
| svd-bow-88bf90 | 8.21% | make SVD 200 components (102% of last model size) |
| svd-bow-ced288 | 7.06% | make SVD 150 components (200 was too big for initial production machine) |
| nave-4576e8 | 13.61% | properly filter topics and remove SVD due to smaller model size, use vocabulary count vectorizer as direct input |
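The best topic revision feeds vocabulary counts directly into the network. A minimal pure-Python sketch of that kind of bag-of-words count vector is shown below. The `tokenize`, `build_vocabulary`, and `count_vector` helpers and the sample phrases are hypothetical; the project itself may rely on a library vectorizer with different tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def count_vector(text, vocabulary):
    """Return per-word counts for `text`; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(text)).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


corpus = ["In the beginning", "the light and the darkness"]
vocabulary = build_vocabulary(corpus)
vector = count_vector("The light and the darkness", vocabulary)
print(vocabulary)
print(vector)
```

Each input has one column per vocabulary word, so the vector length equals the vocabulary size. That would be consistent with the topic model's input size of 13,337, assuming one column per distinct word.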

Cluster Model

The cluster model was trained on cross-references from the Treasury of Scripture Knowledge. All of the following model revisions were trained on 20,837 verses, validated on 2,678 verses, and evaluated on 6,129 verses (70%-10%-20% split).

| Name | Categorical Accuracy | Notes |
| --- | --- | --- |
| tsk-cluster-87b509 | 0.25% | initial fully-connected model |
| tsk-cluster-f13345 | 0.33% | add dropout layers and tweak architecture |
| tsk-cluster-1d7203 | 1.05% | fix verses to have multiple uuids |
| tsk-cluster-26869f | 1.14% | add hidden layer and overfit with 10 patience epochs |
| tsk-cluster-4e1698 | 1.16% | make SVD 200 components (doubled model size) |
| tsk-cluster-47f717 | 1.24% | make SVD 150 components (200 was too big for production) |
| tsk-cluster-8a1db9 | 1.32% | change epoch patience to 2 instead of 3 |
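A 70%-10%-20% train/validation/test split like the one used for the cluster model can be sketched in a few lines. The `split_dataset` helper, the fixed seed, and the integer verse IDs are illustrative assumptions, not the project's actual splitting code.

```python
import random


def split_dataset(items, train_pct=70, validation_pct=10, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains (about 20%)
    return train, validation, test


verse_ids = range(1000)  # hypothetical stand-in for verse identifiers
train, validation, test = split_dataset(verse_ids)
print(len(train), len(validation), len(test))  # 700 100 200
```

Integer percentages avoid floating-point rounding when computing the split sizes, and the test set takes the remainder so that no item is lost.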

Usage

If you're interested in running the project or extending the existing work, you'll need to do the following setup the first time.

# Download sources
./download.sh

# Install Python version and setup dependencies
pyenv install 3.6.8
pyenv virtualenv 3.6.8 $(cat .python-version)
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en

With this setup complete, some of the primary ways you'd want to interact with the code are provided by the command line utility, which includes documentation for each command.

❯ python -m ecce -h
usage: __main__.py [-h]
                   {nave-export,topic-export,train-nave,train-tsk,predict-nave,predict-tsk}
                   ...

positional arguments:
  {nave-export,topic-export,train-nave,train-tsk,predict-nave,predict-tsk}
    nave-export         Export processed data from Nave's Topical Index
    topic-export        Preprocess topics and export with ESV text
    train-nave          Train an neural network model on Nave data
    train-tsk           Train cluster model on TSK data
    predict-nave        (REPL) Predict topics based on text
    predict-tsk         (REPL) Predict TSK clusters based on text

optional arguments:
  -h, --help            show this help message and exit

Additional Information

Ecce: ML Prediction of Bible Topics and Passages

Copyright (C) 2019 Christian Di Lorenzo

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.



1 If you believe that the use of ESV text is in violation of copyrights, please send me a direct message with your reasoning so that I can remain above board. My current understanding is that using the 2001 version is not prohibitive in the manner I am using it assuming the entire application is open, noncommercial, and not exposing entire books of the Bible.
