zcrawl
An open-source web crawling platform
Stars: ★ 21 (-91.53%)
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ★ 171 (-31.05%)
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ★ 198 (-20.16%)
Apify Js
Apify SDK: the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ★ 3,154 (+1171.77%)
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ★ 123 (-50.4%)
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ★ 15,535 (+6164.11%)
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ★ 440 (+77.42%)
go-scrapy
Web crawling and scraping framework for Golang
Stars: ★ 17 (-93.15%)
bots-zoo
No description or website provided.
Stars: ★ 59 (-76.21%)
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ★ 53 (-78.63%)
Grawler
Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ★ 98 (-60.48%)
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ★ 42,343 (+16973.79%)
Spidermon
Scrapy extension for monitoring spider execution.
Stars: ★ 309 (+24.6%)
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ★ 100 (-59.68%)
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ★ 68 (-72.58%)
Sasila
A flexible, friendly crawler framework
Stars: ★ 286 (+15.32%)
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ★ 17 (-93.15%)
Awesome Puppeteer
A curated list of awesome Puppeteer resources.
Stars: ★ 1,728 (+596.77%)
proxycrawl-python
ProxyCrawl Python library for scraping and crawling
Stars: ★ 51 (-79.44%)
pomp
Screen scraping and web crawling framework
Stars: ★ 61 (-75.4%)
scrapy-distributed
A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ★ 38 (-84.68%)
Dataflowkit
Extract structured data from websites. Website scraping.
Stars: ★ 456 (+83.87%)
Ferret
Declarative web scraping
Stars: ★ 4,837 (+1850.4%)
crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Stars: ★ 22 (-91.13%)
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ★ 52 (-79.03%)
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ★ 37 (-85.08%)
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ★ 277 (+11.69%)
Lulu
[Unmaintained] A simple and clean video/music/image downloader
Stars: ★ 789 (+218.15%)
Crawler
Go process used to crawl websites
Stars: ★ 147 (-40.73%)
Idt
Image Dataset Tool (idt) is a CLI tool designed to turn the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Stars: ★ 202 (-18.55%)
Fantasy Basketball
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Stars: ★ 146 (-41.13%)
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Stars: ★ 216 (-12.9%)
Sqrape
Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)
Stars: ★ 144 (-41.94%)
Embed
Get info from any web service or page
Stars: ★ 1,808 (+629.03%)
Massivedl
Download a large list of files concurrently
Stars: ★ 141 (-43.15%)
Jsonframe Cheerio
Simple multi-level scraper with JSON input/output for Cheerio
Stars: ★ 196 (-20.97%)
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ★ 138 (-44.35%)
Educative.io Downloader
This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.
Stars: ★ 139 (-43.95%)
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Stars: ★ 239 (-3.63%)
Goose Parser
Universal scraping tool, which allows you to extract data using multiple environments
Stars: ★ 211 (-14.92%)
Juriscraper
An API to scrape American court websites for metadata.
Stars: ★ 194 (-21.77%)
Udemycoursegrabber
Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course in more than [insert big number here] websites, compares the last updated dates, and gives you back the download link of the latest one, but you also have the choice to see the other ones as well!
Stars: ★ 137 (-44.76%)
Nutch
Apache Nutch is an extensible and scalable web crawler
Stars: ★ 2,277 (+818.15%)
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ★ 11,545 (+4555.24%)
Torchbear
The Speakeasy Scripting Engine Which Combines Speed, Safety, and Simplicity
Stars: ★ 128 (-48.39%)
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ★ 190 (-23.39%)
Bhban rpa
Example code for the book "Work Automation That Finishes Six Months of Work in a Day" (Saengneung Publishing, 2020). The examples are written for people who have never learned Python, and cover a range of work-automation areas, from Excel to design, macros, and crawling.
Stars: ★ 124 (-50%)
N2h4
A tool for collecting Naver news
Stars: ★ 177 (-28.63%)
Corpuscrawler
Crawler for linguistic corpora
Stars: ★ 127 (-48.79%)
Cdp4j
cdp4j - Chrome DevTools Protocol for Java
Stars: ★ 232 (-6.45%)
Thal
Getting started with Puppeteer and Chrome Headless for Web Scraping
Stars: ★ 2,345 (+845.56%)
Squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head
Stars: ★ 125 (-49.6%)
Htmlsql
htmlSQL is an experimental PHP library which allows you to access HTML values by an SQL-like syntax.
Stars: ★ 120 (-51.61%)