
gaspa93 / Googlemaps Scraper

License: GPL-3.0
Google Maps reviews scraping

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Googlemaps Scraper

Pitchfork
🎶 Unofficial python API for pitchfork.com reviews.
Stars: ✭ 67 (-22.99%)
Mutual labels:  scraper
Instascrape
🚀 A fast and lightweight utility and Python library for downloading posts, stories, and highlights from Instagram.
Stars: ✭ 76 (-12.64%)
Mutual labels:  scraper
Hooman
HTTP interceptor to hoomanize Cloudflare requests
Stars: ✭ 82 (-5.75%)
Mutual labels:  scraper
Goscrape
Web scraper that can create an offline readable version of a website
Stars: ✭ 69 (-20.69%)
Mutual labels:  scraper
Pymarketcap
Python3 API wrapper and web scraper for https://coinmarketcap.com
Stars: ✭ 73 (-16.09%)
Mutual labels:  scraper
Kikoeru Express
kikoeru back end; no longer maintained. See https://github.com/umonaca/kikoeru-express for updates.
Stars: ✭ 79 (-9.2%)
Mutual labels:  scraper
Making Maps With React
🌐 Example React components for React-Leaflet, Pigeon Maps, React MapGL and more
Stars: ✭ 66 (-24.14%)
Mutual labels:  google-maps
React Places Autocomplete
React component for Google Maps Places Autocomplete
Stars: ✭ 1,265 (+1354.02%)
Mutual labels:  google-maps
Pittapi
An API to easily get data from the University of Pittsburgh
Stars: ✭ 74 (-14.94%)
Mutual labels:  scraper
Email Extractor
The main functionality is to extract all the email addresses from one or several URLs.
Stars: ✭ 81 (-6.9%)
Mutual labels:  scraper
Skraper
Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, IFunny, VK, Pikabu)
Stars: ✭ 72 (-17.24%)
Mutual labels:  scraper
Goscraper
Golang pkg to quickly return a preview of a webpage (title/description/images)
Stars: ✭ 72 (-17.24%)
Mutual labels:  scraper
Spam Bot 3000
Social media research and promotion, semi-autonomous CLI bot
Stars: ✭ 79 (-9.2%)
Mutual labels:  scraper
Wagtailgmaps
Simple Google Maps address formatter for Wagtail fields
Stars: ✭ 68 (-21.84%)
Mutual labels:  google-maps
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+1332.18%)
Mutual labels:  scraper
Pastebin Scraper
Live-scraping pastebin to fight boredom.
Stars: ✭ 66 (-24.14%)
Mutual labels:  scraper
Proxy Scraper
Library for scraping free proxies lists
Stars: ✭ 78 (-10.34%)
Mutual labels:  scraper
Image search
Python Library to download images and metadata from popular search engines.
Stars: ✭ 86 (-1.15%)
Mutual labels:  scraper
Instaloctrack
An Instagram OSINT tool to collect all the geotagged locations available on an Instagram profile in order to plot them on a map, and dump them in a JSON.
Stars: ✭ 85 (-2.3%)
Mutual labels:  scraper
Wombat
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Stars: ✭ 1,220 (+1302.3%)
Mutual labels:  scraper

Google Maps Scraper

Scraper for Google Maps reviews. The code extracts the most recent reviews of a specific Point of Interest (POI), starting from its Google Maps URL. An optional extension monitors the reviews and incrementally stores them in a MongoDB instance.

Installation

Follow these steps to use the scraper:

  • Download ChromeDriver from here.

  • Install the Python packages from the requirements file, using pip, conda, or virtualenv:

      conda create --name scraping python=3.6 --file requirements.txt
    

Note: Python >= 3.6 is required.
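If you prefer pip over conda, an equivalent setup might look like the following (a sketch, assuming a standard requirements.txt in the repository root):

```shell
# Hypothetical pip/virtualenv alternative to the conda command above
python3 -m venv scraping
source scraping/bin/activate
pip install -r requirements.txt
```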

Basic Usage

The scraper.py script takes two main parameters as input:

  • --i: input file name, containing the list of URLs that point to Google Maps place reviews (default: urls.txt)
  • --N: number of reviews to retrieve, starting from the most recent (default: 100)

Example:

python scraper.py --N 50

generates a CSV file containing the 50 most recent reviews of each place listed in urls.txt.

In the current implementation, CSV writing is handled by an external function, so if you want to change the path and/or name of the output file, you need to modify that function.
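As an illustration of post-processing the output, the snippet below loads review rows with Python's csv module. The column names (id_review, caption, rating, timestamp) are assumptions for the example, not necessarily the scraper's actual schema:

```python
import csv
from io import StringIO

# Hypothetical sample mirroring the scraper's CSV output
# (column names are assumptions, check the generated file)
sample = StringIO(
    "id_review,caption,rating,timestamp\n"
    "abc123,Great coffee,5,2021-03-01\n"
    "def456,Too crowded,3,2021-02-27\n"
)

reader = csv.DictReader(sample)
reviews = list(reader)
high = [r for r in reviews if int(r["rating"]) >= 4]
print(len(reviews), len(high))  # → 2 1
```

The same pattern works on the real output file by replacing the StringIO object with open("output.csv").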

Additionally, other optional parameters can be provided:

  • --place: boolean flag to scrape POI metadata instead of reviews (default: false)
  • --debug: boolean flag to run the browser with the graphical interface (default: false)
  • --source: boolean flag to store the source URL as an additional field in the CSV (default: false)
  • --sort-by: one of most_relevant, newest, highest_rating, or lowest_rating (default: newest); contributed by @quaesito, it changes the sorting of reviews
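The flags above could be wired with argparse roughly as follows. This is a minimal sketch of the interface described in the list, not the actual scraper.py implementation:

```python
import argparse

def build_parser():
    # Flag names follow the README; defaults and help text are assumptions
    p = argparse.ArgumentParser(description="Google Maps reviews scraper")
    p.add_argument("--i", default="urls.txt", help="input file of place URLs")
    p.add_argument("--N", type=int, default=100, help="number of reviews to retrieve")
    p.add_argument("--place", action="store_true", help="scrape POI metadata instead of reviews")
    p.add_argument("--debug", action="store_true", help="run the browser with a GUI")
    p.add_argument("--source", action="store_true", help="store the source URL in the CSV")
    p.add_argument("--sort-by", dest="sort_by", default="newest",
                   choices=["most_relevant", "newest", "highest_rating", "lowest_rating"])
    return p

args = build_parser().parse_args(["--N", "50", "--sort-by", "lowest_rating"])
print(args.N, args.sort_by)  # → 50 lowest_rating
```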

For a basic description of the logic and approach behind this project, have a look at the Medium post.

Monitoring functionality

The monitor.py script turns the scraper into an incremental one, overcoming the limit on the number of reviews that can be retrieved in a single run. The only additional requirement is a MongoDB installation on your machine: you can find a detailed guide on the official site.

The script takes two inputs:

  • --i: same as scraper.py script
  • --from-date: date string in the format YYYY-MM-DD; the earliest review date the scraper tries to reach

The main idea is to run the script periodically to obtain the latest reviews: the scraper stores them in MongoDB, stopping when it reaches either the latest review of the previous run or the date given by the input parameter.
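That stop rule can be sketched as follows. The field names and the in-memory list are hypothetical; the real script works against reviews fetched from the browser and stored in MongoDB:

```python
from datetime import date

# Hypothetical incremental stop rule: walk reviews newest-first and keep them
# until we hit either the most recent review already stored (last_seen_id)
# or a review older than the --from-date cutoff.
def collect_new(reviews_newest_first, last_seen_id, from_date):
    fresh = []
    for r in reviews_newest_first:
        if r["id"] == last_seen_id or r["date"] < from_date:
            break
        fresh.append(r)
    return fresh

reviews = [
    {"id": "r3", "date": date(2021, 3, 2)},
    {"id": "r2", "date": date(2021, 3, 1)},
    {"id": "r1", "date": date(2021, 2, 20)},
]
# Stop at the review already seen in the previous run
print([r["id"] for r in collect_new(reviews, "r2", date(2021, 1, 1))])  # → ['r3']
```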

Notes

URLs must be provided in the expected format; check the example file urls.txt to see what a correct URL looks like. To generate a correct URL:

  1. Go to Google Maps and look for a specific place;
  2. Click on the number of reviews shown in parentheses;
  3. Save the URL generated by that interaction.
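As a rough sanity check before running the scraper, one might verify that each line of urls.txt at least looks like a Google Maps place URL. The pattern below is an assumption for illustration, not an official URL format:

```python
import re

# Hypothetical check: place-review URLs typically contain
# a google domain followed by "/maps/place/"
def looks_like_place_url(url):
    return bool(re.search(r"google\.[a-z.]+/maps/place/", url))

print(looks_like_place_url("https://www.google.com/maps/place/Some+Cafe/@45.0,9.0,17z"))  # → True
print(looks_like_place_url("https://example.com/reviews"))  # → False
```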

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].