
scrapehero / yellowpages-scraper

Licence: other
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to yellowpages-scraper

Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (-25%)
Mutual labels:  scraper, web-scraper
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recently posted ads for the requested product and dumps them to MongoDB (NoSQL).
Stars: ✭ 15 (-73.21%)
Mutual labels:  scraper, web-scraper
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (-3.57%)
Mutual labels:  scraper, parsing
Getsy
A simple browser/client-side web scraper.
Stars: ✭ 238 (+325%)
Mutual labels:  scraper, web-scraper
Jikan
Unofficial MyAnimeList PHP+REST API which provides functions other than the official API
Stars: ✭ 531 (+848.21%)
Mutual labels:  scraper, parsing
Link Preview Js
Parse and/or extract web link meta information (title, description, images, videos, etc.) via OpenGraph; runs on mobile and Node.
Stars: ✭ 240 (+328.57%)
Mutual labels:  parsing, extract
Scrapysharp
reborn of https://bitbucket.org/rflechner/scrapysharp
Stars: ✭ 226 (+303.57%)
Mutual labels:  scraper, parsing
python3-mal
Python interface to MyAnimeList
Stars: ✭ 18 (-67.86%)
Mutual labels:  parsing, lxml
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+8458.93%)
Mutual labels:  scraper, web-scraper
Scrapers
A list of scrapers from around the web.
Stars: ✭ 366 (+553.57%)
Mutual labels:  scraper, web-scraper
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+326.79%)
Mutual labels:  scraper, web-scraper
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+164.29%)
Mutual labels:  scraper, web-scraper
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (-50%)
Mutual labels:  parsing, extract
extract-emails
Extract emails from a given website
Stars: ✭ 58 (+3.57%)
Mutual labels:  scraper, parsing
pysub-parser
Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).
Stars: ✭ 40 (-28.57%)
Mutual labels:  parsing, extract
ScrapeM
A monadic web scraping library
Stars: ✭ 17 (-69.64%)
Mutual labels:  scraper, extract
Cascadia
Go cascadia package command line CSS selector
Stars: ✭ 67 (+19.64%)
Mutual labels:  extract, web-scraper
AzurLaneWikiScrapers
A console application that can scrape the Azur Lane wiki and export the data to Json files
Stars: ✭ 12 (-78.57%)
Mutual labels:  scraper, web-scraper
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1071.43%)
Mutual labels:  scraper, web-scraper
Goose Parser
Universal scraping tool which allows you to extract data using multiple environments
Stars: ✭ 211 (+276.79%)
Mutual labels:  scraper, parsing

Yellow Pages Business Details Scraper

Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.

If you would like to know more about this scraper, check out the blog post 'How to Scrape Business Details from Yellow Pages using Python and LXML': https://www.scrapehero.com/how-to-scrape-business-details-from-yellowpages-com-using-python-and-lxml/

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Fields to Extract

This yellow pages scraper can extract the fields below (a minimal parsing sketch follows the list):

  1. Rank
  2. Business Name
  3. Phone Number
  4. Business Page
  5. Category
  6. Website
  7. Rating
  8. Street name
  9. Locality
  10. Region
  11. Zipcode
  12. URL
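
To give a sense of how these fields are pulled out of a listing, here is a minimal sketch of the requests + lxml pattern this project is built on. The search URL parameters and XPaths below are illustrative assumptions for this example, not the exact selectors used in yellow_pages.py:

import requests
from lxml import html

# Illustrative search URL and parameters; the real script may build its URL differently.
url = "https://www.yellowpages.com/search"
params = {"search_terms": "restaurants", "geo_location_terms": "Boston, MA"}
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, params=params, headers=headers)
tree = html.fromstring(response.text)

# Each search result is one listing block; field XPaths are evaluated relative to it.
for rank, listing in enumerate(tree.xpath('//div[contains(@class, "result")]'), start=1):
    name = "".join(listing.xpath('.//a[contains(@class, "business-name")]//text()')).strip()
    phone = "".join(listing.xpath('.//div[contains(@class, "phones")]//text()')).strip()
    print(rank, name, phone)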

Prerequisites

For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements:

  • lxml
  • requests

Installation

Use pip to install the following Python packages (https://pip.pypa.io/en/stable/installing/):

Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/)

Python LXML, for parsing the HTML tree structure using XPaths (installation instructions: http://lxml.de/installation.html)
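
Assuming Python 3 and pip are already available, both packages can typically be installed with a single command:

pip3 install requests lxml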

Running the scraper

Execute the script with its name followed by the positional arguments keyword and place. Here is an example that finds the business details for restaurants in Boston, MA:

python3 yellow_pages.py restaurants Boston,MA
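
For reference, a script that takes these two positional arguments would typically declare them with argparse along the following lines; the argument names here are assumptions for illustration, not necessarily those used in yellow_pages.py:

import argparse

# Two positional arguments: the search keyword and the place to search in.
parser = argparse.ArgumentParser(description="Scrape business details from yellowpages.com")
parser.add_argument("keyword", help='search keyword, e.g. "restaurants"')
parser.add_argument("place", help='location to search, e.g. "Boston,MA"')
args = parser.parse_args()

print("Scraping %s listings in %s" % (args.keyword, args.place))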

Sample Output

This will create a CSV file containing the extracted business details.

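The CSV itself can be produced with Python's built-in csv module. The sketch below assumes one dictionary per listing with keys matching the fields listed above; the real script's column names and output filename may differ:

import csv

# Column names mirror the "Fields to Extract" list above (assumed names for this sketch).
fieldnames = ["rank", "business_name", "telephone", "business_page", "category",
              "website", "rating", "street", "locality", "region", "zipcode", "listing_url"]

# One dictionary per scraped listing; fields missing from a row are written as empty strings.
rows = [{"rank": 1, "business_name": "Example Diner", "telephone": "(555) 555-0100"}]

with open("restaurants-Boston,MA-listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)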
