
srbhr / Naive-Resume-Matching

License: Apache-2.0
Text similarity applied to resumes, to compare them with job descriptions and create a score to rank them. Similar to an ATS.

Programming Languages

python

Projects that are alternatives of or similar to Naive-Resume-Matching

bns-short-text-similarity
📖 Use Bi-normal Separation to find document vectors which is used to compute similarity for shorter sentences.
Stars: ✭ 24 (-11.11%)
Mutual labels:  text-classification, text-similarity, cosine-similarity
Fastrtext
R wrapper for fastText
Stars: ✭ 103 (+281.48%)
Mutual labels:  text-classification, word-embeddings
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (+122.22%)
Mutual labels:  text-classification, word-embeddings
Fasttext.js
FastText for Node.js
Stars: ✭ 127 (+370.37%)
Mutual labels:  text-classification, word-embeddings
Lbl2Vec
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Stars: ✭ 25 (-7.41%)
Mutual labels:  text-classification, word-embeddings
Meta
A Modern C++ Data Sciences Toolkit
Stars: ✭ 600 (+2122.22%)
Mutual labels:  text-classification, word-embeddings
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+7722.22%)
Mutual labels:  text-classification, text-similarity
Streamlit
Streamlit — The fastest way to build data apps in Python
Stars: ✭ 16,906 (+62514.81%)
Mutual labels:  data-analysis, streamlit
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+625.93%)
Mutual labels:  text-classification, word-embeddings
Hacknical
Documentation in Chinese
Stars: ✭ 1,452 (+5277.78%)
Mutual labels:  resume, data-analysis
overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
Stars: ✭ 41 (+51.85%)
Mutual labels:  text-classification, word-embeddings
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (+22.22%)
Mutual labels:  text-classification, data-analysis
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+214.81%)
Mutual labels:  text-classification, text-similarity
Concise Ipython Notebooks For Deep Learning
Ipython Notebooks for solving problems like classification, segmentation, generation using latest Deep learning algorithms on different publicly available text and image data-sets.
Stars: ✭ 23 (-14.81%)
Mutual labels:  text-classification, word-embeddings
textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Stars: ✭ 33 (+22.22%)
Mutual labels:  text-classification, text-similarity
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+300%)
Mutual labels:  text-classification, word-embeddings
Jfasttext
Java interface for fastText
Stars: ✭ 193 (+614.81%)
Mutual labels:  text-classification, word-embeddings
text analysis tools
Chinese text analysis toolkit (including text classification, text clustering, text similarity, keyword extraction, key phrase extraction, sentiment analysis, text correction, text summarisation, topic keywords, synonyms and near-synonyms, and event triple extraction)
Stars: ✭ 410 (+1418.52%)
Mutual labels:  text-classification, text-similarity
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (+11.11%)
Mutual labels:  text-classification, data-analysis
vinum
Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.
Stars: ✭ 57 (+111.11%)
Mutual labels:  data-analysis

Naive-Resume-Matcher

A machine learning based resume matcher, to compare resumes with job descriptions. It creates a score based on how similar a resume is to a particular job description. Documents are sorted by their TF-IDF (Term Frequency-Inverse Document Frequency) scores.

Check the live version here. The instance may sleep if it has not been used in a long time; in that case, drop me a mail, or fork this repo and launch your own instance on Streamlit's Cloud Instance.

Matching algorithms used are:

  • String Matching

    • Monge Elkan
  • Token Based

    • Jaccard
    • Cosine
    • Sorensen-Dice
    • Overlap Coefficient
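As a rough illustration, the token-based measures above can be sketched in plain Python. The whitespace tokenisation and the sample texts are simplifications for the sketch; the project's actual implementation may differ:

```python
from math import sqrt

def tokens(text: str) -> set:
    """Naive lowercase whitespace tokenisation."""
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    # Shared tokens over all distinct tokens.
    return len(a & b) / len(a | b) if a | b else 0.0

def sorensen_dice(a: set, b: set) -> float:
    # Twice the shared tokens over the summed set sizes.
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def overlap_coefficient(a: set, b: set) -> float:
    # Shared tokens over the smaller set.
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def cosine(a: set, b: set) -> float:
    # Cosine similarity over binary (set-membership) vectors.
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0

resume = tokens("python machine learning engineer with streamlit experience")
jd = tokens("machine learning engineer python required")
scores = {
    "jaccard": jaccard(resume, jd),             # 4 shared / 8 distinct = 0.5
    "dice": sorensen_dice(resume, jd),          # 2*4 / (7+5) ≈ 0.667
    "overlap": overlap_coefficient(resume, jd), # 4 / min(7, 5) = 0.8
}
```

All four measures return values in [0, 1], so they can be compared or blended into a single ranking score.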

Topic modelling of resumes is done to provide additional information about the resumes and which clusters/topics they belong to. For this:

  1. TF-IDF is applied to the resumes to improve sentence similarity, as it helps reduce redundant terms and bring out the important ones.
  2. Gensim's Dictionary (id2word) and doc2bow are used to turn the documents into bag-of-words representations.
  3. LDA (Latent Dirichlet Allocation) is used to extract the topics from the document set (in this case, resumes).
  4. Additional plots are made to gain more insight about the documents.
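The TF-IDF weighting in step 1 can be sketched in a few lines of plain Python. The project itself relies on the Gensim library; this minimal version, with made-up sample resumes, only illustrates the idea:

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Plain TF-IDF: term frequency scaled by inverse document frequency."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * log(n / df[t]) for t, c in tf.items()})
    return vectors

resumes = [
    "python developer machine learning".split(),
    "java developer spring backend".split(),
]
vecs = tf_idf(resumes)
# "developer" appears in every resume, so its weight drops to zero,
# while distinctive terms like "python" keep a positive weight.
```

This is exactly the effect described in step 1: terms shared by every document are pushed toward zero, leaving the discriminative vocabulary for the later similarity and topic-modelling steps.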

Images

  1. List of job descriptions to choose from.

  2. Preview of your chosen job description.

  3. Your resumes are ranked now! Check the top ones!

  4. Score distribution of different candidates, in case you want to check some more.

  5. Topic distribution of various resumes.

  6. Topic distribution sunburst chart.

  7. Word cloud of your resume for a quick glance!

Preview

Working Video

Progress Flow

  1. Input is resumes and a job description; the current code is capable of comparing resumes to multiple job descriptions.
  2. The job description and resumes are parsed with the help of the textract library in Python, and then converted into two CSV files, namely Resume_Data.csv and Job_Data.csv.
  3. While reading, the Python script fileReader.py reads and cleans the text and does the TF-IDF based filtering as well. (This might take some time to process, so please be patient while executing the script.)
  4. For any further comparisons the prepared CSV files are used.
  5. app.py contains the code for running the Streamlit server and performing the tasks. Use streamlit run app.py to execute the script.
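A minimal sketch of the comparison stage in step 4, using raw term-count vectors and hypothetical file names (the real pipeline works on the cleaned CSV data and TF-IDF filtered text):

```python
from collections import Counter
from math import sqrt

def cosine_rank(jd, resumes):
    """Rank resumes by cosine similarity of term-count vectors
    against a job description."""
    jd_vec = Counter(jd.lower().split())
    jd_norm = sqrt(sum(v * v for v in jd_vec.values()))
    scores = []
    for name, text in resumes.items():
        vec = Counter(text.lower().split())
        dot = sum(jd_vec[t] * vec[t] for t in jd_vec)
        norm = jd_norm * sqrt(sum(v * v for v in vec.values()))
        scores.append((name, dot / norm if norm else 0.0))
    # Highest similarity first, mirroring the app's ranked output.
    return sorted(scores, key=lambda s: s[1], reverse=True)

ranking = cosine_rank(
    "python machine learning",
    {"alice.pdf": "python machine learning engineer",
     "bob.pdf": "java backend developer"},
)
# "alice.pdf" shares three terms with the job description and ranks first.
```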

File Structure

Data > Resumes and Data > JobDescription

The Data folder contains two folders that are used to read and provide data from. In case you allow the option to upload documents, Data\Resumes and Data\JobDesc should be the targets for resumes and job descriptions respectively.

Due to the flexibility of textract, we do not need to specify the type of document it scans; it detects it automatically.

The job description, however, needs to be in Docx format; this can be changed as well.

Installation Instructions

A Python virtual environment is required for this. Please read this page for more information.

A pip requirements.txt file is provided. It is advised to install the packages listed below manually by doing pip install <package_name>, as the requirements.txt file may have some unnecessary additional dependencies.

Popular packages used are:

Furthermore, packages like NLTK and spaCy require additional data to be downloaded. After installing them, please run:

## For spaCy's English model (run in a shell)
python -m spacy download en_core_web_sm

## For NLTK data (run inside Python)
import nltk
nltk.download('popular')  # this downloads the popular packages from NLTK_DATA

Execution Instructions

Please check the How To file for execution instructions.
