Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).

Stars: ✭ 21 (-77.17%)

Mutual labels: web-scraping, scrapy

scrapy plus

A toolkit of common essentials for crawling with Scrapy

Stars: ✭ 18 (-80.43%)

Mutual labels: scrapy, scrapy-extension

Scrapy Craigslist

Web Scraping Craigslist's Engineering Jobs in NY with Scrapy

Stars: ✭ 54 (-41.3%)

Mutual labels: web-scraping, scrapy

wayback

⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs

Stars: ✭ 52 (-43.48%)

Mutual labels: web-scraping, wayback-machine

Quora Api

An unofficial API for Quora.

Stars: ✭ 250 (+171.74%)

Mutual labels: web-scraping

scrapy helper

Dynamically configurable crawler

Stars: ✭ 84 (-8.7%)

Mutual labels: scrapy

Wayback Machine Scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 230 (+150%)

Mutual labels: web-scraping

Docbao

A tool for scanning and analyzing keywords from Vietnamese online news sites

Stars: ✭ 230 (+150%)

Mutual labels: web-scraping

vietnam-ecommerce-crawler

Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

Stars: ✭ 28 (-69.57%)

Mutual labels: scrapy

lopez

Crawling and scraping the Web for fun and profit

Stars: ✭ 20 (-78.26%)

Mutual labels: web-scraping

Short Jokes Dataset

Python scripts for building 'Short Jokes' dataset, featured on Kaggle

Stars: ✭ 215 (+133.7%)

Mutual labels: web-scraping

Trump Lies

Tutorial: Web scraping in Python with Beautiful Soup

Stars: ✭ 201 (+118.48%)

Mutual labels: web-scraping

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy Framework.

Stars: ✭ 57 (-38.04%)

Mutual labels: web-scraping

Twitter Intelligence

The Twitter Intelligence OSINT project performs tracking and analysis of Twitter

Stars: ✭ 179 (+94.57%)

Mutual labels: web-scraping

UofT-Timetable-Generator

A web application that generates timetables for university students at the University of Toronto

Stars: ✭ 34 (-63.04%)

Mutual labels: web-scraping

crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

Stars: ✭ 70 (-23.91%)

Mutual labels: web-scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.

Stars: ✭ 239 (+159.78%)

Mutual labels: web-scraping

scrapy-LBC

LeBonCoin spider built with Scrapy and ElasticSearch

Stars: ✭ 14 (-84.78%)

Mutual labels: scrapy

2017-summer-workshop

Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)

Stars: ✭ 33 (-64.13%)

Mutual labels: web-scraping

Web Database Analytics

Web scraping and related analytics using Python tools

Stars: ✭ 175 (+90.22%)

Mutual labels: web-scraping

Selenium Python Helium

Selenium-python but lighter: Helium is the best Python library for web automation.

Stars: ✭ 2,732 (+2869.57%)

Mutual labels: web-scraping

cinedantan

🎥 🍿 Streaming Public domain movies

Stars: ✭ 52 (-43.48%)

Mutual labels: archive-dot-org

R Web Scraping Cheat Sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.

Stars: ✭ 207 (+125%)

Mutual labels: web-scraping

scrapy-rotated-proxy

A scrapy middleware to use rotated proxy ip list.

Stars: ✭ 22 (-76.09%)

Mutual labels: scrapy

Bet On Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

Stars: ✭ 190 (+106.52%)

Mutual labels: web-scraping

crawler

A collection of Python crawler projects

Stars: ✭ 29 (-68.48%)

Mutual labels: scrapy

Grab

Web Scraping Framework

Stars: ✭ 2,147 (+2233.7%)

Mutual labels: web-scraping

A Programming language for Web Scraping

Stars: ✭ 14 (-84.78%)

Mutual labels: web-scraping

ArticleSpider

Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).

Stars: ✭ 34 (-63.04%)

Mutual labels: scrapy

Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

Stars: ✭ 18 (-80.43%)

Mutual labels: web-scraping

codepen-puppeteer

Use Puppeteer to download pens from Codepen.io as single html pages

Stars: ✭ 22 (-76.09%)

Mutual labels: web-scraping

vandal

Navigator for Web Archive

Stars: ✭ 146 (+58.7%)

Mutual labels: wayback-machine

Learnpythonforresearch

This repository provides everything you need to get started with Python for (social science) research.

Stars: ✭ 163 (+77.17%)

Mutual labels: web-scraping

Scrapy-tripadvisor-reviews

Using scrapy to scrape tripadvisor in order to get users' reviews.

Stars: ✭ 24 (-73.91%)

Mutual labels: scrapy

Web Scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Stars: ✭ 153 (+66.3%)

Mutual labels: web-scraping

asyncpy

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp

Stars: ✭ 86 (-6.52%)

Mutual labels: scrapy

arche

Analyze scraped data

Stars: ✭ 49 (-46.74%)

Mutual labels: scrapy

Helena

A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.

Stars: ✭ 151 (+64.13%)

Mutual labels: web-scraping

Phpscraper

PHP Scraper - a highly opinionated web-interface for PHP

Stars: ✭ 148 (+60.87%)

Mutual labels: web-scraping

lgcrawl

Crawls job listings across the entire Lagou site using Python, Scrapy, and Splash

Stars: ✭ 22 (-76.09%)

Mutual labels: scrapy

Sqrape

Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)

Stars: ✭ 144 (+56.52%)

Mutual labels: web-scraping

Zillow

Zillow Scraper for Python using Selenium

Stars: ✭ 141 (+53.26%)

Mutual labels: web-scraping

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+33.7%)

Mutual labels: scrapy

web-poet

Web scraping Page Objects core library

Stars: ✭ 67 (-27.17%)

Mutual labels: web-scraping

pagser

Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Go crawlers

Stars: ✭ 82 (-10.87%)

Mutual labels: scrapy

Html Metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)

Stars: ✭ 129 (+40.22%)

Mutual labels: web-scraping

Actor Page Analyzer

Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.

Stars: ✭ 124 (+34.78%)

Mutual labels: web-scraping

concurrent-web-scraping

Building a Concurrent Web Scraper with Python and Selenium

Stars: ✭ 28 (-69.57%)

Mutual labels: web-scraping

1-60 of 355 similar projects