City Scrapers
Scrape, standardize and share public meetings from local government websites
Stars: ✭ 220 (+139.13%)
Scrapyd Cluster On Heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Stars: ✭ 106 (+15.22%)
scrapy-fieldstats
A Scrapy extension that logs item coverage when the spider shuts down
Stars: ✭ 17 (-81.52%)
scraping-ebay
Scraping eBay's products using the Scrapy web crawling framework
Stars: ✭ 79 (-14.13%)
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+404.35%)
IMDB-Scraper
Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-59.78%)
Juno crawler
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Stars: ✭ 150 (+63.04%)
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).
Stars: ✭ 15 (-83.7%)
Netflix Clone
Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture
Stars: ✭ 156 (+69.57%)
restaurant-finder-featureReviews
A Flask web application that helps users retrieve key restaurant information and feature-based reviews, generated by applying a market-basket model (the Apriori algorithm) and NLP to user reviews.
Stars: ✭ 21 (-77.17%)
Scrapy Craigslist
Web scraping Craigslist's engineering jobs in NY with Scrapy
Stars: ✭ 54 (-41.3%)
wayback
⏪ Tools to work with the various Internet Archive Wayback Machine APIs
Stars: ✭ 52 (-43.48%)
Quora Api
An unofficial API for Quora.
Stars: ✭ 250 (+171.74%)
scrapy helper
Dynamically configurable crawler
Stars: ✭ 84 (-8.7%)
Wayback Machine Scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 230 (+150%)
Docbao
A tool for crawling Vietnamese online newspapers and analyzing their keywords
Stars: ✭ 230 (+150%)
vietnam-ecommerce-crawler
Crawling data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs
Stars: ✭ 28 (-69.57%)
lopez
Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (-78.26%)
Short Jokes Dataset
Python scripts for building the 'Short Jokes' dataset, featured on Kaggle
Stars: ✭ 215 (+133.7%)
Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Stars: ✭ 201 (+118.48%)
PythonScrapyBasicSetup
Basic setup with random user agents and IP addresses for the Python Scrapy framework.
Stars: ✭ 57 (-38.04%)
Twitter Intelligence
An OSINT project that performs tracking and analysis on Twitter
Stars: ✭ 179 (+94.57%)
UofT-Timetable-Generator
A web application that generates timetables for university students at the University of Toronto
Stars: ✭ 34 (-63.04%)
crawlzone
Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (-23.91%)
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (+159.78%)
scrapy-LBC
LeBonCoin spider with Scrapy and ElasticSearch
Stars: ✭ 14 (-84.78%)
2017-summer-workshop
Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)
Stars: ✭ 33 (-64.13%)
Selenium Python Helium
Selenium-python but lighter: Helium is the best Python library for web automation.
Stars: ✭ 2,732 (+2869.57%)
cinedantan
🎥 🍿 Streaming public domain movies
Stars: ✭ 52 (-43.48%)
Bet On Sibyl
Machine learning model for sports predictions (football, basketball, baseball, hockey, soccer and tennis)
Stars: ✭ 190 (+106.52%)
crawler
A collection of Python crawler projects
Stars: ✭ 29 (-68.48%)
Grab
Web Scraping Framework
Stars: ✭ 2,147 (+2233.7%)
Hi
A programming language for web scraping
Stars: ✭ 14 (-84.78%)
ArticleSpider
Crawling zhihu, jobbole and lagou with Scrapy, and using Elasticsearch + Django to build a search engine website. See README_zh.md for the implementation roadmap, the distributed crawler, and strategies for coping with anti-crawling measures.
Stars: ✭ 34 (-63.04%)
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-80.43%)
codepen-puppeteer
Use Puppeteer to download pens from Codepen.io as single HTML pages
Stars: ✭ 22 (-76.09%)
vandal
Navigator for Web Archive
Stars: ✭ 146 (+58.7%)
Learnpythonforresearch
This repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (+77.17%)
Web Scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Stars: ✭ 153 (+66.3%)
asyncpy
A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp
Stars: ✭ 86 (-6.52%)
arche
Analyze scraped data
Stars: ✭ 49 (-46.74%)
Helena
A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.
Stars: ✭ 151 (+64.13%)
Phpscraper
PHP Scraper: a highly opinionated web interface for PHP
Stars: ✭ 148 (+60.87%)
lgcrawl
Crawling all job listings on Lagou with Python + Scrapy + Splash
Stars: ✭ 22 (-76.09%)
Sqrape
Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)
Stars: ✭ 144 (+56.52%)
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (+53.26%)
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+33.7%)
web-poet
Web scraping Page Objects core library
Stars: ✭ 67 (-27.17%)
pagser
Pagser is a simple, extensible, configurable library that parses HTML pages and deserializes them into structs, based on goquery and struct tags, for Go crawlers.
Stars: ✭ 82 (-10.87%)
Html Metadata
Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
Stars: ✭ 129 (+40.22%)
Actor Page Analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
Stars: ✭ 124 (+34.78%)