OLX Scraper📻 An OLX scraper built with Scrapy and MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).
Stars: ✭ 15 (-95.31%)
PulsarTurn large websites into tables and charts using simple SQL queries.
Stars: ✭ 100 (-68.75%)
SpidrA versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+105%)
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-13.44%)
flink-crawlerContinuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-85%)
top-github-scraperScrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-87.5%)
proxiProxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-90%)
IMDB-ScraperScrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-88.44%)
comic-scraper[Python] Scrapes comics and manga from various websites and creates cbz files from them
Stars: ✭ 16 (-95%)
linkextractorA Docker tutorial using a link extraction application example
Stars: ✭ 41 (-87.19%)
Node-js-functionalitiesThis repository contains useful RESTful APIs and functionality in Node.js, including tutorial code for mastering Node.js. All tutorials have been published on medium.com; the tutorial links are given below
Stars: ✭ 69 (-78.44%)
restaurant-finder-featureReviewsBuild a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).
Stars: ✭ 21 (-93.44%)
ComicBookMakerScript to fetch webcomics and use them to create ebooks.
Stars: ✭ 27 (-91.56%)
scraping-ebayScraping Ebay's products using Scrapy Web Crawling Framework
Stars: ✭ 79 (-75.31%)
PaperScraperA web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journals.
Stars: ✭ 63 (-80.31%)
siteshooter📷 Automate full website screenshots and PDF generation with multiple viewport support.
Stars: ✭ 63 (-80.31%)
Php Curl ClassPHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+807.19%)
SchweizerMesser🎯Python 3 web crawler practice and data analysis collection | Dangdang | NetEase Cloud Music | unsplash | Pizza Hut | Maoyan |
Stars: ✭ 89 (-72.19%)
WebCrawlerJust a simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.
Stars: ✭ 55 (-82.81%)
rreddit𝐫⟋ Get Reddit data
Stars: ✭ 49 (-84.69%)
extractnetA Dragnet that also extracts author, headline, date, and keywords from context
Stars: ✭ 52 (-83.75%)
Text-AnalysisExplaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.
Stars: ✭ 48 (-85%)
evineInteractive CLI Web Crawler
Stars: ✭ 140 (-56.25%)
learncpp-downloadScrape bot, to get you an offline copy of tutorials
Stars: ✭ 23 (-92.81%)
actor-scraperHouse of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Stars: ✭ 83 (-74.06%)
heroshiHeroshi – open source web crawler.
Stars: ✭ 51 (-84.06%)
UnChainA tool to find redirection chains in multiple URLs
Stars: ✭ 77 (-75.94%)
tableau-scrapingTableau scraper python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (-71.56%)
Apify JsApify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+885.63%)
Mimo-CrawlerA web crawler that uses Firefox and js injection to interact with webpages and crawl their content, written in nodejs.
Stars: ✭ 22 (-93.12%)
papercutPapercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-95.31%)
htmlunit🕸🧰☕️Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
Stars: ✭ 39 (-87.81%)
automation-scriptsSimple scripts that I'm using to automate the boring things.
Stars: ✭ 14 (-95.62%)
leetcode-compensationCompensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
Stars: ✭ 83 (-74.06%)
sp-subway-scraper🚆This web scraper builds a dataset for São Paulo subway operation status
Stars: ✭ 24 (-92.5%)
WaWebSessionHandler(DISCONTINUED) Save WhatsApp Web Sessions as files and open them everywhere!
Stars: ✭ 27 (-91.56%)
browser-poolA Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (-77.81%)
halfstaff🇺🇸 Is the US flag at half-staff?
Stars: ✭ 22 (-93.12%)
iwwAI based web-wrapper for web-content-extraction
Stars: ✭ 61 (-80.94%)
SpidyThe simple, easy to use command line web crawler.
Stars: ✭ 257 (-19.69%)
Linkedin-ClientWeb scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (-86.87%)
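pyCreeperAn information-gathering (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling web pages.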
Stars: ✭ 25 (-92.19%)
raspagem-de-dados-fatec📓 Short course on web data scraping with Python, taught at the Technology Week of FATEC Jundiaí
Stars: ✭ 22 (-93.12%)
grailerWeb scraping tool for grailed.com
Stars: ✭ 30 (-90.62%)
codechef-rank-comparatorWeb application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library (XML Path Language).
Stars: ✭ 23 (-92.81%)
bolsaPython library designed to make it easier to access data on your stock exchange investments (B3/CEI) through the Portal CEI.
Stars: ✭ 46 (-85.62%)
SupercrawlerA web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Stars: ✭ 306 (-4.37%)
investigation-amazon-brandsMaterials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"
Stars: ✭ 56 (-82.5%)
Data-Wrangling-with-PythonSimplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (-71.87%)
LagoujobJob data mining repo for lagou.com
Stars: ✭ 256 (-20%)