ghostwords / chameleon-crawler

License: MPL-2.0
Browser automation for Chameleon.

Chameleon Crawler

Browser automation for Chameleon.

Setup

  • Install Chromium, chromedriver, python3 and xvfb. On Ubuntu:
sudo apt-get install chromium-browser chromium-chromedriver python3 xvfb
  • Install the project's Python dependencies, which are listed in requirements.txt, for example with virtualenv and pip, or with Docker. This is a Python 3 project.

  • Make sure chromedriver is in your $PATH. The Ubuntu package does not put it there, so symlink it and register Chromium's library directory:

sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/local/bin/chromedriver
echo "/usr/lib/chromium-browser/libs" | sudo tee --append /etc/ld.so.conf.d/chrome_lib.conf >/dev/null
sudo ldconfig
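
To confirm the setup steps above took effect, a small check along these lines may help. It is only a sketch and not part of the project: the executable names are assumptions based on the Ubuntu packages named above (chromium-browser, chromium-chromedriver, python3, xvfb), and `missing_executables` is a hypothetical helper.

```python
import shutil

# Assumed executable names, taken from the Ubuntu packages listed in Setup.
REQUIRED = ("chromium-browser", "chromedriver", "python3", "Xvfb")


def missing_executables(names=REQUIRED, which=shutil.which):
    """Return the names that cannot be found on $PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required executables are on $PATH.")
```

If chromedriver is reported missing on Ubuntu, the symlink step above is the likely fix.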

Usage

Run ./crawl.py /path/to/chameleon.crx to perform a crawl, or ./crawl.py -h to see the optional arguments:

usage: crawl.py [-h] [--headless | --no-headless] [-n {1,2,3,4,5,6,7,8}] [-q]
                [-t SECONDS] [--urls URL_FILE_PATH]
                CHAMELEON_CRX_FILE_PATH

positional arguments:
  CHAMELEON_CRX_FILE_PATH
                        path to Chameleon CRX package

optional arguments:
  -h, --help            show this help message and exit
  --headless            use a virtual display (default)
  --no-headless
  -n {1,2,3,4,5,6,7,8}  how many browsers to use in parallel (default: 4)
  -q, --quiet           turn off standard output
  -t SECONDS, --timeout SECONDS
                        how many seconds to wait for pages to finish loading
                        before timing out (default: 20)
  --urls URL_FILE_PATH  path to URL list file (default: urls.txt)
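
For readers who want to see how these options fit together, the help text above can be approximated with argparse. This is a reconstruction from the help output only, not the actual crawl.py source; the attribute names (`crx`, `num_crawlers`) and the integer type for the timeout are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented crawl.py options."""
    parser = argparse.ArgumentParser(prog="crawl.py")
    parser.add_argument("crx", metavar="CHAMELEON_CRX_FILE_PATH",
                        help="path to Chameleon CRX package")

    # --headless and --no-headless share one destination and exclude each other.
    display = parser.add_mutually_exclusive_group()
    display.add_argument("--headless", dest="headless", action="store_true",
                         default=True, help="use a virtual display (default)")
    display.add_argument("--no-headless", dest="headless", action="store_false")

    parser.add_argument("-n", dest="num_crawlers", type=int,
                        choices=range(1, 9), default=4,
                        help="how many browsers to use in parallel (default: 4)")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="turn off standard output")
    parser.add_argument("-t", "--timeout", metavar="SECONDS", type=int, default=20,
                        help="how many seconds to wait for pages to finish "
                             "loading before timing out (default: 20)")
    parser.add_argument("--urls", metavar="URL_FILE_PATH", default="urls.txt",
                        help="path to URL list file (default: urls.txt)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, passing only the CRX path yields the documented defaults: headless mode, 4 browsers, a 20-second timeout, and urls.txt.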

Run ./view.py and visit the displayed URL to review crawl results.

Roadmap

  1. Crawl Alexa Global Top 1,000,000 Sites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
  2. Analyze results:
    • Discover fingerprinters
    • Confirm detection of known fingerprinters
  3. Tweak the heuristic to minimize false negatives/positives.
  4. Create a minisite to chart (the growth of?) fingerprinting across the Web.
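
As a starting point for the first roadmap item, the Alexa list could be turned into a URL list for `--urls`. The sketch below assumes the unzipped CSV has `rank,domain` rows and that the URL list file holds one URL per line; the latter is a guess, since the format of urls.txt is not documented here. `alexa_rows_to_urls` is a hypothetical helper.

```python
import csv
import io


def alexa_rows_to_urls(csv_text, limit=None, scheme="http"):
    """Turn 'rank,domain' CSV rows into URLs, keeping the input order."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if limit is not None and len(urls) >= limit:
            break
        if len(row) < 2:
            continue  # skip blank or malformed rows
        urls.append(f"{scheme}://{row[1].strip()}/")
    return urls


if __name__ == "__main__":
    sample = "1,google.com\n2,facebook.com\n3,youtube.com\n"
    # Assumed output format: one URL per line.
    print("\n".join(alexa_rows_to_urls(sample, limit=2)))
```

Starting with a small `limit` makes it practical to try a crawl on a subset before attempting the full list.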

Code license

Mozilla Public License Version 2.0
