
bioinf-mcb / Gisaid Scrapper

License: MIT
Scraping tool for GISAID data regarding SARS-CoV-2

Programming Languages

Python

Projects that are alternatives to or similar to Gisaid Scrapper

Tianyancha
A pip-installable Tianyancha scraper API that saves the business registration information of one or more specified companies to Excel/JSON with one click. A battery-included scraper API for Tianyancha, the best Chinese business data and investigation platform.
Stars: ✭ 206 (+724%)
Mutual labels:  scraper, selenium
TikTok
Download public videos on TikTok using Python with Selenium
Stars: ✭ 37 (+48%)
Mutual labels:  scraper, selenium
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+856%)
Mutual labels:  scraper, selenium
Seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (+368%)
Mutual labels:  scraper, selenium
kick-off-web-scraping-python-selenium-beautifulsoup
A tutorial-based introduction to web scraping with Python.
Stars: ✭ 18 (-28%)
Mutual labels:  scraper, selenium
Udemycoursegrabber
Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you back the download link of the latest one, but you also have the choice to see the other ones as well!
Stars: ✭ 137 (+448%)
Mutual labels:  scraper, selenium
TinderBotz
Automated Tinder bot and scraper using selenium in python.
Stars: ✭ 265 (+960%)
Mutual labels:  scraper, selenium
Scrapstagram
An Instagram scraper
Stars: ✭ 50 (+100%)
Mutual labels:  scraper, selenium
Instagram-Comments-Scraper
Instagram comment scraper using python and selenium. Save the comments into excel.
Stars: ✭ 73 (+192%)
Mutual labels:  scraper, selenium
yt-videos-list
Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.
Stars: ✭ 64 (+156%)
Mutual labels:  scraper, selenium
Sillynium
Automate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (+300%)
Mutual labels:  scraper, selenium
pluralsight scrapper
A course downloader/scraper for https://www.pluralsight.com
Stars: ✭ 39 (+56%)
Mutual labels:  scraper, selenium
Instaloctrack
An Instagram OSINT tool to collect all the geotagged locations available on an Instagram profile in order to plot them on a map, and dump them in a JSON.
Stars: ✭ 85 (+240%)
Mutual labels:  scraper, selenium
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (+464%)
Mutual labels:  scraper, selenium
Spam Bot 3000
Social media research and promotion, semi-autonomous CLI bot
Stars: ✭ 79 (+216%)
Mutual labels:  scraper, selenium
pinterest-web-scraper
Scraping Visually Similar Images from Pinterest
Stars: ✭ 26 (+4%)
Mutual labels:  scraper, selenium
Botvid 19
Messenger Bot that scrapes for COVID-19 data and periodically updates subscribers via Facebook Messages. Created using Python/Flask, MYSQL, HTML, Heroku
Stars: ✭ 34 (+36%)
Mutual labels:  scraper, selenium
InstagramLocationScraper
No description or website provided.
Stars: ✭ 13 (-48%)
Mutual labels:  scraper, selenium
bots-zoo
No description or website provided.
Stars: ✭ 59 (+136%)
Mutual labels:  scraper, selenium
Instagram-Scraper-2021
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (+128%)
Mutual labels:  scraper, selenium

GISAID scrapper

Scraping tool for GISAID data regarding SARS-CoV-2. You need an active GISAID account in order to use it.

Preparations

Install all requirements for the scraper:

pip install -r requirements.txt

You need to download the Firefox WebDriver (geckodriver) for your operating system and place it in the script's directory.

Your login and password can be provided in a credentials.txt file in the following format:

login
password
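
For example, a credentials.txt with hypothetical values (replace them with your own GISAID login and password):

my_gisaid_username
my_gisaid_password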

Usage

usage: scrap.py [-h] [--username USERNAME] [--password PASSWORD]
                [--filename FILENAME] [--destination DESTINATION]
                [--headless [HEADLESS]] [--whole [WHOLE]]

optional arguments:
  -h, --help            show this help message and exit
  --username USERNAME, -u USERNAME
                        Username for GISAID
  --password PASSWORD, -p PASSWORD
                        Password for GISAID
  --filename FILENAME, -f FILENAME
                        Path to file with credentials (alternative, default:
                        credentials.txt)
  --destination DESTINATION, -d DESTINATION
                        Destination directory (default: fastas/)
  --headless [HEADLESS], -q [HEADLESS]
                        Headless mode (no browser window)
  --whole [WHOLE], -w [WHOLE]
                        Scrap whole genomes only

Examples:

python3 scrap.py -u user -p pass -w

runs the scraper with the username user and the password pass, downloading only whole-genome sequence data.

python3 scrap.py -w -q -d whole_genome

runs the scraper in headless mode with the username and password read from credentials.txt, downloading only whole-genome sequence data into the whole_genome directory.

Result

The whole and partial genome sequences from GISAID will be downloaded into the fastas/ directory (or the directory given with --destination). A metadata.tsv file will also be created, containing the following information for every sample:

  • Accession
  • Collection date
  • Location
  • Host
  • Additional location information
  • Gender
  • Patient age
  • Patient status
  • Specimen source
  • Additional host information
  • Outbreak
  • Last vaccinated
  • Treatment
  • Sequencing technology
  • Assembly method
  • Coverage
  • Comment
  • Length

wherever they were provided. You can interrupt the download and resume it later; samples won't be downloaded twice. The tool has only been tested on Windows 10.
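
For downstream analysis, the metadata file can be loaded with pandas. This is a minimal sketch, not part of the tool itself; it assumes pandas is installed and that the column headers in metadata.tsv match the field names listed above:

# Load the tab-separated metadata produced by the scraper.
# Assumption: column names correspond to the fields listed above.
import pandas as pd

metadata = pd.read_csv("fastas/metadata.tsv", sep="\t")

# Preview a few commonly used fields.
print(metadata[["Accession", "Collection date", "Location", "Host"]].head())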

Docker Image

It is also possible to run the scraper in headless mode inside a Docker container, which allows it to be used on any operating system able to run Docker. The image was created by Pawel Kulig and is hosted on his Docker Hub.

In this version, all parameters (login, password, destination, and the whole-genome flag) are provided via a .env file.
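
A hypothetical .env file might look like the following; the actual variable names are defined by the image and may differ, so check the image documentation for the exact keys:

# Hypothetical variable names -- adjust to match the image's configuration
GISAID_LOGIN=my_gisaid_username
GISAID_PASSWORD=my_gisaid_password
DESTINATION=whole_genome
WHOLE=true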

Aside from the gisaid_scrapper container, a Selenium container is used, so the two operate in a client-server setup.

To run the scraper in a container, run:

docker-compose up

To run it detached, add the -d option:
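
docker-compose up -d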

To build the Docker image yourself, run the command below inside the gisaid_scrapper directory:

docker build --tag name:tag .

A geckodriver binary inside the gisaid_scrapper directory is required to perform this operation. See: https://github.com/mozilla/geckodriver/releases
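
For example, on Linux you could fetch and unpack a release into that directory like this (the version shown is only an example; pick the release and platform archive that match your system):

# Example only -- substitute the geckodriver version and platform you need
wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
tar -xzf geckodriver-v0.33.0-linux64.tar.gz -C gisaid_scrapper/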
