PaulMcInnis / Jobfunnel

License: MIT
Scrape job websites into a single spreadsheet with no duplicates.

Automated tool for scraping job postings into a .csv file.

Since this project was developed, job sites have clamped down hard with CAPTCHA. Help us rebuild the backend and make this tool useful again!

Benefits over job search sites:

  • Never see the same job twice!
  • No advertising.
  • See jobs from multiple job search websites all in one place.

Installation

JobFunnel requires Python 3.8 or later.

pip install git+https://github.com/PaulMcInnis/JobFunnel.git

Usage

By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.

Configure

You can search for jobs with YAML configuration files or by passing command-line arguments.

Download the demo settings.yaml by running the command below:

wget https://git.io/JUWeP -O my_settings.yaml

NOTE:

  • It is recommended to provide as few search keywords as possible (e.g. Python, AI).

  • JobFunnel currently supports CANADA_ENGLISH, USA_ENGLISH, UK_ENGLISH, FRANCE_FRENCH, and GERMANY_GERMAN locales.

Scrape

Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:

funnel load -s my_settings.yaml
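
If you would rather launch the scrape from a Python script, the command above can be wrapped with the standard library. This is a minimal sketch, not part of JobFunnel itself: the helper names build_load_command and run_scrape are hypothetical, and the only thing taken from this README is the funnel load -s invocation.

```python
import shutil
import subprocess


def build_load_command(settings_path):
    """Return the argument list for the `funnel load -s <settings>` command."""
    return ["funnel", "load", "-s", str(settings_path)]


def run_scrape(settings_path):
    """Run the scrape; raises CalledProcessError on a non-zero exit code."""
    # Fail early with a clear message if JobFunnel is not installed.
    if shutil.which("funnel") is None:
        raise FileNotFoundError("funnel executable not found on PATH")
    return subprocess.run(build_load_command(settings_path), check=True)


# Only print the command here; call run_scrape(...) to actually execute it.
print(build_load_command("my_settings.yaml"))
```

Passing the arguments as a list avoids shell quoting problems if the settings path contains spaces.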

Review

Open the master CSV file and update the per-job status:

  • Set the status to interested, applied, interview or offer to reflect your progress on the job.

  • Set the status to archive, rejected or delete to remove a job from this search. You can review these blocked jobs in your block_list_file.
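
The status values above split the master CSV into jobs that stay in the search and jobs that are removed from it. The sketch below illustrates that split in Python. It assumes the CSV has a header row with a status column, which is an assumption about the file layout rather than a documented guarantee; the function name partition_by_status and the sample rows are hypothetical.

```python
import csv
import io

# Status values taken from the review instructions above.
ACTIVE_STATUSES = {"interested", "applied", "interview", "offer"}
REMOVED_STATUSES = {"archive", "rejected", "delete"}


def partition_by_status(csv_file):
    """Split rows into (active, removed, unreviewed) lists by their status.

    Assumes a header row containing a 'status' column (hypothetical layout).
    """
    active, removed, unreviewed = [], [], []
    for row in csv.DictReader(csv_file):
        status = (row.get("status") or "").strip().lower()
        if status in ACTIVE_STATUSES:
            active.append(row)
        elif status in REMOVED_STATUSES:
            removed.append(row)
        else:
            # Anything else has not been reviewed yet.
            unreviewed.append(row)
    return active, removed, unreviewed


# Hypothetical sample data standing in for a real master CSV.
sample = io.StringIO(
    "status,title,company\n"
    "applied,Python Developer,Acme\n"
    "rejected,Data Analyst,Globex\n"
    "new,ML Engineer,Initech\n"
)
active, removed, unreviewed = partition_by_status(sample)
print(len(active), len(removed), len(unreviewed))
```

To run it against a real file, pass an open file object instead of the in-memory sample, for example open("master_list.csv", newline="").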

Advanced Usage

  • Automating Searches
    JobFunnel can easily be automated to run nightly with crontab.
    For more information, see the crontab document.

  • Writing your own Scrapers
    If there is a job website you'd like to write a scraper for, you are welcome to implement it. Review the Base Scraper for implementation details.

  • Remote Work
    Bypass the frustrating experience of searching for remote work by setting the search parameter remoteness to your desired level, e.g. FULLY_REMOTE.

  • Adding Support for X Language / Job Website
    JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings. Review the Base Scraper for further implementation details.

  • Blocking Companies
    Filter out undesired companies by adding them to company_block_list in your YAML, or pass them on the command line with -cbl.

  • Job Age Filter
    You can limit the maximum age of scraped listings (in days) by setting max_listing_days.

  • Reviewing Jobs in Terminal
    You can review the job list in the command line:

    column -s, -t < master_list.csv | less -#2 -N -S
    
  • Respectful Delaying
    Respectfully scrape your job posts with our built-in delaying algorithms.

    To better understand how to configure delaying, check out this Jupyter Notebook which breaks down the algorithm step by step with code and visualizations.

  • Recovering Lost Data
    JobFunnel can re-build your master CSV from your cache_folder where all the historic scrape data is located:

    funnel --recover
    
  • Running by CLI
    You can run JobFunnel from the CLI alone, without a settings file. Review the command structure via:

    funnel inline -h
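
The company blocking and job age filters from the list above can be illustrated in a few lines of Python. This is a sketch of the idea only, not JobFunnel's implementation: the parameter names mirror the company_block_list and max_listing_days settings, while the function filter_jobs, the job dictionaries, and their company and post_date keys are hypothetical.

```python
from datetime import date, timedelta


def filter_jobs(jobs, company_block_list, max_listing_days, today=None):
    """Drop jobs from blocked companies or older than max_listing_days."""
    today = today or date.today()
    # Compare company names case-insensitively (a choice made for this sketch).
    blocked = {name.strip().lower() for name in company_block_list}
    oldest_allowed = today - timedelta(days=max_listing_days)
    kept = []
    for job in jobs:
        if job["company"].strip().lower() in blocked:
            continue  # company is on the block list
        if job["post_date"] < oldest_allowed:
            continue  # listing is too old
        kept.append(job)
    return kept


# Hypothetical sample jobs.
today = date(2021, 1, 31)
jobs = [
    {"title": "Python Developer", "company": "Acme", "post_date": date(2021, 1, 30)},
    {"title": "Data Analyst", "company": "Globex", "post_date": date(2021, 1, 29)},
    {"title": "ML Engineer", "company": "Initech", "post_date": date(2020, 12, 1)},
]
kept = filter_jobs(jobs, company_block_list=["globex"], max_listing_days=30, today=today)
print([job["title"] for job in kept])  # ['Python Developer']
```

Here Globex is dropped by the block list and the Initech listing is dropped because it is more than 30 days old.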
    

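To give a feel for what respectful delaying means in practice, here is a small Python sketch that spaces requests out with a linearly increasing delay and optional random jitter. It is an illustration under stated assumptions, not the algorithm JobFunnel ships: the function linear_delays and its parameters min_duration, max_duration and jitter are hypothetical names, and the linked Jupyter Notebook remains the reference for the real behaviour.

```python
import random


def linear_delays(n_requests, min_duration, max_duration, jitter=0.0, rng=None):
    """Return n_requests delays (seconds) ramping linearly from min to max.

    jitter is the largest random amount subtracted from each delay; results
    are clamped so they never fall below min_duration.
    """
    rng = rng or random.Random()
    if n_requests <= 0:
        return []
    # With a single request there is no ramp, so the step is zero.
    step = (max_duration - min_duration) / (n_requests - 1) if n_requests > 1 else 0.0
    delays = []
    for i in range(n_requests):
        base = min_duration + i * step
        noisy = base - rng.uniform(0.0, jitter)
        delays.append(max(min_duration, noisy))
    return delays


delays = linear_delays(5, min_duration=1.0, max_duration=5.0)
print(delays)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

A scraper would call time.sleep(delay) before each request. The jitter makes the request timing less regular while the clamp keeps every delay at or above the configured minimum.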
CAPTCHA

JobFunnel does not solve CAPTCHA. If you receive an "Unable to extract jobs from initial search result page:\" error while scraping, open that URL in your browser and solve the CAPTCHA manually.