bank-of-england / occupationcoder

Licence: other
Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.

occupationcoder

A tool that uses job text, such as a job description, to assign standard occupational classification (SOC) codes

Given a job title, job description, and job sector, the algorithm assigns a 3-digit standard occupational classification (SOC) code to the job. The algorithm uses the SOC 2010 standard; more details can be found on the ONS website.
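To make the 3-digit output concrete: SOC 2010 codes are hierarchical, so each leading prefix of a code names a broader group (1 digit for the major group, 2 for the sub-major group, 3 for the minor group; 4-digit unit groups also exist but are not assigned by this tool). The sketch below is not part of occupationcoder. split_soc_code is a hypothetical helper that only illustrates how a 3-digit code can be read.

```python
def split_soc_code(code):
    """Split a 3-digit SOC 2010 minor-group code into its hierarchy levels.

    Hypothetical helper for illustration; not part of occupationcoder.
    """
    code = str(code).strip()
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a 3-digit SOC code, got {code!r}")
    return {
        "major_group": code[:1],      # first digit
        "sub_major_group": code[:2],  # first two digits
        "minor_group": code,          # all three digits
    }


levels = split_soc_code(211)
print(levels)
```

For the code 211 used in the example further down, this gives major group 2, sub-major group 21 and minor group 211.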

This code was originally written by Jyldyz Djumalieva, Arthur Turrell, David Copple, James Thurgood, and Bradley Speigner. If you use this code please cite:

Turrell, A., Speigner, B., Djumalieva, J., Copple, D., and Thurgood, J. (2018). Using job vacancies to understand the effects of labour market mismatch on UK output and productivity, Staff Working Paper 737, Bank of England.

Pre-requisites

See requirements.txt for a full list.

occupationcoder is built on top of NLTK and uses WordNet (a corpus, number 82 on NLTK's list) and the Punkt Tokenizer Models (number 106 on the same list). When the coder is run, it expects to find these in their usual directories. If you have NLTK installed, you can get these resources using nltk.download(), which installs them in the right directories, or you can go to http://www.nltk.org/nltk_data/ to download them manually (and follow the install instructions).
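A minimal sketch of fetching the two resources from a script rather than through the interactive nltk.download() window. It assumes the identifiers 'wordnet' and 'punkt' and the lookup paths 'corpora/wordnet' and 'tokenizers/punkt', which are the usual NLTK names for these resources but are not stated in this README. ensure_nltk_resources and REQUIRED are hypothetical names introduced here.

```python
# Assumed (lookup path, download identifier) pairs; not taken from this README.
REQUIRED = [
    ("corpora/wordnet", "wordnet"),
    ("tokenizers/punkt", "punkt"),
]


def ensure_nltk_resources(resources, find=None, download=None):
    """Download each NLTK resource that is not already installed.

    Returns the identifiers that had to be downloaded. The find and
    download arguments default to nltk.data.find and nltk.download;
    they can be replaced, for example for a dry run.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for path, identifier in resources:
        try:
            find(path)  # raises LookupError if the resource is missing
        except LookupError:
            download(identifier)
            fetched.append(identifier)
    return fetched
```

Calling ensure_nltk_resources(REQUIRED) once before creating the coder should leave both resources in NLTK's default data directory. This requires internet access the first time.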

A couple of the other packages, such as fuzzywuzzy, do not come with the Anaconda distribution of Python. You can install these via pip (if you have access to the internet) or download the relevant binaries and install them manually.
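One way to see which packages still need installing is to ask Python whether each module can be found. This is a small sketch using only the standard library. missing_modules is a hypothetical helper, and the module names passed to it are just the two named in this README (requirements.txt remains the authoritative list, and a pip package name can differ from its import name).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# 'nltk' and 'fuzzywuzzy' are mentioned in this README; extend the list as needed.
print(missing_modules(["nltk", "fuzzywuzzy"]))
```

An empty list means both modules are importable. Any name that is printed still needs to be installed.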

File and folder description

  • conda.recipe contains code which helps to install the package
  • occupationcoder/coder applies SOC codes to job descriptions
  • occupationcoder/createdictionaries turns the ONS' index of SOC code into dictionaries used by occupationcoder/coder
  • occupationcoder/dictionaries contains the dictionaries used by occupationcoder/coder
  • occupationcoder/outputs is the default output directory
  • occupationcoder/testvacancies contains 'test' vacancies to run the code on
  • occupationcoder/utilities contains helper functions which mostly manipulate strings

Installation via terminal using pip

Download the package and cd to the download directory. Then use

python setup.py sdist
cd dist
pip install occupationcoder-version.tar.gz

The first line creates the .tar.gz file, the second navigates to the directory containing the packaged code, and the third installs the package. Replace 'version' with the version number shown in the name of the generated .tar.gz file.

Running the code as a python package

Importing and creating an instance of the coder

import pandas as pd
from occupationcoder.coder import coder
myCoder = coder.Coder()

To run the code on a single job, use the following syntax with the codejobrow(job_title,job_description,job_sector) method:

if __name__ == '__main__':
    myCoder.codejobrow('Physicist', 'Calculations of the universe', 'Professional scientific')

The if __name__ == '__main__': guard is required because the code is parallelised. Note that you can leave some of the fields blank and the algorithm will still return a SOC code.

To run the code on a file (e.g. a CSV file named 'job_file.csv') with the structure

| job_title | job_description | job_sector |
| --- | --- | --- |
| Physicist | Make calculations about the universe, do research, perform experiments and understand the physical environment. | Professional, scientific & technical activities |

use

df = pd.read_csv('path/to/job_file.csv')
df = myCoder.codedataframe(df)

This will return a new dataframe with SOC code entries appended in a new column:

| job_title | job_description | job_sector | SOC_code |
| --- | --- | --- | --- |
| Physicist | Make calculations about the universe, do research, perform experiments and understand the physical environment. | Professional, scientific & technical activities | 211 |
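The three-column input file described above can be produced with the standard library. This is a minimal sketch, assuming the coder expects exactly the column names job_title, job_description and job_sector shown in the example, and that a blank field is simply an empty cell. write_jobs_csv is a hypothetical helper, not part of occupationcoder, and the second job record is made up for illustration.

```python
import csv
import os
import tempfile

FIELDS = ["job_title", "job_description", "job_sector"]


def write_jobs_csv(path, jobs):
    """Write job records (dicts) to a CSV file with the three input columns.

    Fields missing from a record are written as empty strings.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(jobs)


jobs = [
    {
        "job_title": "Physicist",
        "job_description": "Make calculations about the universe, do research, "
        "perform experiments and understand the physical environment.",
        "job_sector": "Professional, scientific & technical activities",
    },
    {"job_title": "Economist"},  # made-up record with two fields left blank
]

# Write to a temporary directory and read the file back to show its structure.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job_file.csv")
    write_jobs_csv(path, jobs)
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

print(rows[1])
```

The csv module quotes the commas inside the sector and description text, so each record still occupies exactly three columns when read back.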

Running the code from the command line

If you have installed all the packages listed in requirements.txt, download the code and navigate to the occupationcoder folder (which contains the README). Then run

python -m occupationcoder.coder.coder path/to/foo.csv

This will create a 'processed_jobs.csv' file in the outputs/ folder which has the original text and an extra 'SOC_code' column with the assigned SOC codes.
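Once the output file exists, a quick check is to count how often each code was assigned. The sketch below uses only the standard library and relies on the 'SOC_code' column name given above. count_soc_codes is a hypothetical helper, and the sample rows are placeholders written to a temporary file, not real output from the coder.

```python
import csv
import os
import tempfile
from collections import Counter


def count_soc_codes(path, column="SOC_code"):
    """Count how many rows of a processed CSV file carry each SOC code."""
    with open(path, newline="", encoding="utf-8") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


# Placeholder data standing in for a processed_jobs.csv file.
sample = [
    ["job_title", "job_description", "job_sector", "SOC_code"],
    ["Job A", "", "", "211"],
    ["Job B", "", "", "211"],
    ["Job C", "", "", "354"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed_jobs.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(sample)
    counts = count_soc_codes(path)

print(counts.most_common())
```

For a real run, pass the path of the generated processed_jobs.csv in the outputs/ folder instead of the temporary file.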

Testing

The tests run the SOC matching on a file of example jobs, in this case job vacancies. To run the test, use

python -m occupationcoder.coder.coder occupationcoder/testvacancies/test_vacancies.csv

and the output is in the 'processed_jobs.csv' file in the outputs/ folder.

Acknowledgements

We are very grateful to Emmet Cassidy for testing this algorithm.

Disclaimer

This code is provided 'as is'. We would love it if you made it better or extended it to work for other countries. All views expressed are our personal views, not those of any employer.
