niczem / trawler

Licence: other
scraper for facebook, gab, google and tiktok

Programming Languages

JavaScript, Vue, HTML

Trawler

A job scheduler and analysis tool for webscraping (and other) tasks.

Datasources

Currently the following datasources are implemented:

  • facebook posts and reactions: scrapes Facebook posts, comments and reactions (like, heart, etc.)
  • gab (nazi-twitter): crawls the posts of a user
  • google dorking: finds interesting files and downloads them
  • json to csv: converts a JSON array into CSV
  • mail: sends mails and files; mostly useful in pipelines
  • masscan: UDP-based port scanner (requires Docker)
  • motiondetection: script that runs motion analysis on a directory of video files
  • onionlist: downloads the Tor catalogue from onionlist.org
  • onions.danwin1210.de: downloads the Tor catalogue from danwin1210.de and creates a screenshot of each website in the result
  • tiktok: gets video metadata per hashtag, downloads the videos and analyses their text using EasyOCR
  • url: generic HTTP scraper
  • urlscreenshotter: takes a comma-separated list of URLs and creates a screenshot of each of them
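
To illustrate what a datasource such as json to csv does, here is a minimal sketch of converting a JSON array into CSV. It is not taken from the trawler code base: the function name jsonArrayToCsv and the choice to use the union of all object keys as the header row are assumptions made for this example.

```javascript
// Hypothetical helper, not part of trawler: converts an array of flat
// objects into a CSV string. The header is the union of all keys in
// first-seen order; missing values become empty cells.
function jsonArrayToCsv(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return '';

  // Collect every key that appears in any row.
  const headers = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }

  // Quote a cell only when it contains a comma, quote or line break.
  const escapeCell = (value) => {
    if (value === null || value === undefined) return '';
    const text = typeof value === 'object' ? JSON.stringify(value) : String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };

  const lines = [headers.map(escapeCell).join(',')];
  for (const row of rows) {
    lines.push(headers.map((header) => escapeCell(row[header])).join(','));
  }
  return lines.join('\n');
}
```

Nested objects are serialised with JSON.stringify rather than flattened, which keeps the sketch short; a real job might prefer to flatten them into separate columns.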

Create your own datasource

- copy the template directory in ./jobs
- define the fields needed to start the job in fields.js
- a job can output one or multiple files
- do not output directories; use archives instead
- use job_id.ext (e.g. job_id.json) as the filename
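
The output convention above can be sketched as follows. This is only an illustration under stated assumptions: the shape of the fields array, and the names outputPath and writeJobResult, are hypothetical and may not match the template in ./jobs.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical field definitions; the real schema expected in
// fields.js may look different.
const fields = [
  { name: 'url', type: 'text', required: true },
];

// Build the output file name from the job id, following the
// job_id.ext convention (for example job_42.json).
function outputPath(outputDir, jobId, ext) {
  return path.join(outputDir, `${jobId}.${ext}`);
}

// Write a job result as a single JSON file and return its path.
// The job writes one file, not a directory.
function writeJobResult(outputDir, jobId, result) {
  const file = outputPath(outputDir, jobId, 'json');
  fs.writeFileSync(file, JSON.stringify(result, null, 2));
  return file;
}
```

A job that produces many files would, following the list above, pack them into one archive and name that archive with the same job_id prefix.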

Features

  • simple configuration of actions/datasources, also from 3rd party modules/repos
  • job monitoring and scheduling
  • sqlite, csv and json browser
  • separation of datasets/artifacts (one archive per crawl)
  • scalable number of workers (also on other machines)

Architecture

Frontend and API

  • GUI to create and schedule jobs
  • Displays pending, running and done jobs
  • Displays CSV and SQLite datasets

Worker(s)

  • Can be distributed (workers and c&c in different locations/servers)
  • Jobs are managed through JSON files (and can be distributed with an adapter like PouchDB)
  • Multithreaded
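
As a rough illustration of jobs managed through JSON files, the sketch below shows how a worker might pick up the next pending job from a directory. The directory layout, the status values ("pending", "running") and the function name claimNextJob are assumptions made for this example, not trawler's actual job format.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical worker step: scan a directory of job files (one JSON
// file per job), mark the first pending job as running and return it.
// Returns null when no pending job is found.
function claimNextJob(jobsDir) {
  const files = fs
    .readdirSync(jobsDir)
    .filter((name) => name.endsWith('.json'))
    .sort();

  for (const name of files) {
    const fullPath = path.join(jobsDir, name);
    const job = JSON.parse(fs.readFileSync(fullPath, 'utf8'));
    if (job.status === 'pending') {
      job.status = 'running';
      job.startedAt = new Date().toISOString();
      fs.writeFileSync(fullPath, JSON.stringify(job, null, 2));
      return job;
    }
  }
  return null;
}
```

The read-then-write step here is not atomic, so two workers could claim the same job at the same time; a distributed setup would need locking or a store that handles conflicts.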

Install & run

Using NPM

npm i
npm run all