
chen0040 / Keras English Resume Parser And Analyzer

Licence: MIT
Keras project that parses and analyzes English resumes

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Keras English Resume Parser And Analyzer

Hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (-0.52%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Coursera Deep Learning Specialization
Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks and Deep Learning; (ii) Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization; (iii) Structuring Machine Learning Projects; (iv) Convolutional Neural Networks; (v) Sequence Models
Stars: ✭ 188 (-2.08%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Top Deep Learning
Top 200 deep learning Github repositories sorted by the number of stars.
Stars: ✭ 1,365 (+610.94%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Deep Learning For Beginners
videos, lectures, blogs for Deep Learning
Stars: ✭ 89 (-53.65%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+872.92%)
Mutual labels:  convolutional-neural-networks, nltk
Kerasr
R interface to the keras library
Stars: ✭ 90 (-53.12%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Deepecg
ECG classification programs based on ML/DL methods
Stars: ✭ 124 (-35.42%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Emnist
A project designed to explore CNN and the effectiveness of RCNN on classifying the EMNIST dataset.
Stars: ✭ 81 (-57.81%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Deep Learning With Pytorch Tutorials
Hands-on introductory video course on deep learning with PyTorch, with accompanying source code and PPT slides
Stars: ✭ 1,986 (+934.38%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-34.37%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Malware Classification
Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine for Malware Classification
Stars: ✭ 88 (-54.17%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Tfvos
Semi-Supervised Video Object Segmentation (VOS) with Tensorflow. Includes implementation of *MaskRNN: Instance Level Video Object Segmentation (NIPS 2017)* as part of the NIPS Paper Implementation Challenge.
Stars: ✭ 151 (-21.35%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Text classification
Text Classification Algorithms: A Survey
Stars: ✭ 1,276 (+564.58%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-49.48%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Tnn
Biologically-realistic recurrent convolutional neural networks
Stars: ✭ 83 (-56.77%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Brainforge
A Neural Networking library based on NumPy only
Stars: ✭ 114 (-40.62%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
3d Reconstruction With Neural Networks
3D reconstruction with neural networks using Tensorflow. See link for Video (https://www.youtube.com/watch?v=iI6ZMST8Ri0)
Stars: ✭ 71 (-63.02%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Recurrent Environment Simulators
Deepmind Recurrent Environment Simulators paper implementation in tensorflow
Stars: ✭ 73 (-61.98%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Rcnn Text Classification
Tensorflow Implementation of "Recurrent Convolutional Neural Network for Text Classification" (AAAI 2015)
Stars: ✭ 127 (-33.85%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (-26.56%)
Mutual labels:  convolutional-neural-networks, recurrent-neural-networks

keras-english-resume-parser-and-analyzer

Deep learning project that parses and analyzes English resumes.

The objective of this project is to use Keras and deep learning models such as convolutional and recurrent neural networks to automate the task of parsing an English resume.

Overview

Parser Features

  • English NLP using NLTK
  • Extract English text from PDF and DOCX files using pdfminer.six and python-docx
  • A rule-based resume parser has been implemented.

Deep Learning Features

  • Tkinter-based GUI tool to generate and annotate deep learning training data from pdf and docx files
  • Deep learning multi-class classification using recurrent and convolutional networks for:
    • line type: classify each line of text extracted from a pdf or docx file as header, meta-data, or content
    • line label: classify each line of text extracted from a pdf or docx file by what it describes (experience, education, etc.)

The deep learning models included for classifying each line in the resume files are listed below (an illustrative Keras sketch follows the list):

  • cnn.py

    • 1-D CNN with Word Embedding
    • Multi-Channel CNN with categorical cross-entropy loss function
  • cnn_lstm.py

    • 1-D CNN + LSTM with Word Embedding
  • lstm.py

    • LSTM with categorical cross-entropy loss function
    • Bi-directional LSTM/GRU with categorical cross-entropy loss function
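
These classifiers live under the keras_en_parser_and_analyzer.library.classifiers package. As a rough illustration only, and not the project's actual implementation, a 1-D CNN with a word embedding layer and a categorical cross-entropy loss can be sketched in Keras roughly as follows; the vocabulary size, sequence length, and class count below are placeholder values:

from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

# Placeholder hyper-parameters; the project's classifiers derive these from the training data.
vocab_size = 5000      # number of distinct words seen by the tokenizer
max_seq_length = 100   # maximum number of tokens per resume line
num_classes = 5        # e.g. the five line labels: experience, knowledge, education, project, others

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=64, input_length=max_seq_length))
model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])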

Usage 1: Rule-based English Resume Parser

The sample code below shows how to scan all the resumes (in PDF and DOCX formats) in the demo/data/resume_samples folder and print out a summary from the resume parser if the extracted information is available:

from keras_en_parser_and_analyzer.library.rule_based_parser import ResumeParser
from keras_en_parser_and_analyzer.library.utility.io_utils import read_pdf_and_docx


def main():
    data_dir_path = './data/resume_samples' # directory to scan for any pdf and docx files
    collected = read_pdf_and_docx(data_dir_path)
    for file_path, file_content in collected.items():

        print('parsing file: ', file_path)

        parser = ResumeParser()
        parser.parse(file_content)
        print(parser.raw) # print out the raw contents extracted from pdf or docx files

        if parser.unknown is False:
            print(parser.summary())

        print('++++++++++++++++++++++++++++++++++++++++++')

    print('count: ', len(collected))


if __name__ == '__main__':
    main()

IMPORTANT: the parser rules are implemented in parser_rules.py. Each of these rules is applied to every line of text in the resume file and returns the extracted target accordingly (or returns None if nothing is found in the line). As these rules are a very naive implementation, you may want to customize them further based on the resumes that you are working with.
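
For illustration, a rule in this style is just a function that inspects one line of text and returns the extracted value or None. The helper below is a hypothetical example (its name and regex are not taken from parser_rules.py) showing the shape such a rule takes:

import re

# Hypothetical rule, not part of parser_rules.py: extract an email address from one resume line.
def extract_email(line):
    match = re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', line)
    if match is not None:
        return match.group(0)
    return None  # the rule does not apply to this line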

Usage 2: Deep Learning Resume Parser

Step 1: training data generation and annotation

A training data generation and annotation tool is provided in the demo folder; it allows deep learning training data to be generated from any pdf and docx files stored in the demo/data/resume_samples folder. To launch this tool, run the following commands from the root directory of the project:

cd demo
python create_training_data.py

This will parse the pdf and docx files in the demo/data/resume_samples folder and, for each of these files, launch a Tkinter-based GUI form in which the user annotates the individual text lines of the pdf or docx file (clicking the "Type: ..." and "Label: ..." buttons multiple times to select the correct annotation for each line). When each form is closed, the generated and annotated data is saved to a text file in the demo/data/training_data folder. Each line in the text file has the following format:

line_type   line_label  line_content

line_type and line_label have the following mapping to the actual class labels:

line_labels = {0: 'experience', 1: 'knowledge', 2: 'education', 3: 'project', 4: 'others'}
line_types = {0: 'header', 1: 'meta', 2: 'content'}
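
For example, assuming the fields in each training file line are tab-separated (the exact separator is whatever create_training_data.py writes), a line can be decoded back into human-readable classes like this hypothetical snippet, which is not part of the project:

line_labels = {0: 'experience', 1: 'knowledge', 2: 'education', 3: 'project', 4: 'others'}
line_types = {0: 'header', 1: 'meta', 2: 'content'}

# Hypothetical example line from demo/data/training_data, assuming tab-separated fields.
raw_line = '2\t0\tSoftware Engineer, ABC Pte Ltd, 2015 - 2018'
line_type, line_label, line_content = raw_line.split('\t', 2)
print(line_types[int(line_type)])    # content
print(line_labels[int(line_label)])  # experience
print(line_content)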

Step 2: train the resume parser

After the training data is generated and annotated, one can train the resume parser by running the following command:

cd demo
python dl_based_parser_train.py

Below is the code for dl_based_parser_train.py:

import numpy as np
import os
import sys 


def main():
    random_state = 42
    np.random.seed(random_state)

    current_dir = os.path.dirname(__file__)
    current_dir = current_dir if current_dir != '' else '.'  # fall back to '.' when run from the demo folder
    output_dir_path = current_dir + '/models'
    training_data_dir_path = current_dir + '/data/training_data'
    
    # add keras_en_parser_and_analyzer module to the system path
    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
    from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser

    classifier = ResumeParser()
    batch_size = 64
    epochs = 20
    history = classifier.fit(training_data_dir_path=training_data_dir_path,
                             model_dir_path=output_dir_path,
                             batch_size=batch_size, epochs=epochs,
                             test_size=0.3,
                             random_state=random_state)


if __name__ == '__main__':
    main()
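
If you want to inspect the training curves, a few lines like the following could be appended inside main() right after the fit call. This assumes that history is a standard Keras History object, which the assignment above suggests but does not guarantee:

import matplotlib.pyplot as plt

# Assumes `history` behaves like a Keras History object exposing a `history` dict.
plt.plot(history.history['loss'], label='training loss')
# 'val_loss' is only present when the fit call holds out validation data (test_size=0.3 above).
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.savefig(output_dir_path + '/training-history.png')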

Upon completion of training, the trained models will be saved in the demo/models/line_label and demo/models/line_type folders.

The default line label and line type classifier used in the deep learning ResumeParser is WordVecBidirectionalLstmSoftmax, but other classifiers can be used by adding the following lines, for example:

from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser
from keras_en_parser_and_analyzer.library.classifiers.cnn_lstm import WordVecCnnLstm

classifier = ResumeParser()
classifier.line_label_classifier = WordVecCnnLstm()
classifier.line_type_classifier = WordVecCnnLstm()
...

(Do make sure that the dependencies listed in requirements.txt are satisfied in your Python environment.)

Step 3: parse resumes using trained parser

After the trained models are saved in the demo/models folder, one can use the resume parser to parse the resumes in the demo/data/resume_samples folder by running the following commands:

cd demo
python dl_based_parser_predict.py

Below is the code for dl_based_parser_predict.py:

import os
import sys 


def main():
    current_dir = os.path.dirname(__file__)
    current_dir = current_dir if current_dir != '' else '.'  # fall back to '.' when run from the demo folder
    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
    
    from keras_en_parser_and_analyzer.library.dl_based_parser import ResumeParser
    from keras_en_parser_and_analyzer.library.utility.io_utils import read_pdf_and_docx
    
    data_dir_path = current_dir + '/data/resume_samples' # directory to scan for any pdf and docx files

    def parse_resume(file_path, file_content):
        print('parsing file: ', file_path)

        parser = ResumeParser()
        parser.load_model('./models')
        parser.parse(file_content)
        print(parser.raw)  # print out the raw contents extracted from pdf or docx files

        if parser.unknown is False:
            print(parser.summary())

        print('++++++++++++++++++++++++++++++++++++++++++')

    collected = read_pdf_and_docx(data_dir_path, command_logging=True,
                                  callback=lambda index, file_path, file_content: parse_resume(file_path, file_content))

    print('count: ', len(collected))


if __name__ == '__main__':
    main()

Configure to run on GPU on Windows

  • Step 1: Change tensorflow to tensorflow-gpu in requirements.txt and install tensorflow-gpu
  • Step 2: Download and install the CUDA® Toolkit 9.0 (Please note that currently CUDA® Toolkit 9.1 is not yet supported by tensorflow, therefore you should download CUDA® Toolkit 9.0)
  • Step 3: Download and unzip the cuDNN 7.4 for CUDA® Toolkit 9.0 and add the bin folder of the unzipped directory to the $PATH of your Windows environment
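
After these steps, a quick sanity check that the tensorflow-gpu build can actually see the device (using the TensorFlow 1.x API that matches CUDA® Toolkit 9.0) is:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Prints True when the tensorflow-gpu build can see a CUDA device.
print(tf.test.is_gpu_available())
# Lists the devices TensorFlow has registered, including any GPUs.
print(device_lib.list_local_devices())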