crawling-framework: Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-33.33%)
Holiday Cn: 📅🇨🇳 China statutory public holiday data, scraped automatically every day from State Council announcements
Stars: ✭ 157 (+375.76%)
Awesome Puppeteer: A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+5136.36%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+500%)
mal-analysis: GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.
Stars: ✭ 31 (-6.06%)
Newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+34884.85%)
RobotArmHelix: 3D simulation, forward and inverse kinematics of a robotic arm in C# using WPF and helix-toolkit
Stars: ✭ 84 (+154.55%)
Dig Etl Engine: Download DIG to run on your laptop or server.
Stars: ✭ 81 (+145.45%)
Pdf downloader: A Scrapy spider for downloading PDF files from a webpage.
Stars: ✭ 18 (-45.45%)
Cdp4j: cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (+603.03%)
socials: 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+12.12%)
N2h4: A tool for collecting Naver news
Stars: ✭ 177 (+436.36%)
Linkedin Profile Scraper: 🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+418.18%)
Massivedl: Download a large list of files concurrently
Stars: ✭ 141 (+327.27%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+272.73%)
Corpuscrawler: Crawler for linguistic corpora
Stars: ✭ 127 (+284.85%)
Larkator: ARK dino locator that uses your saved .ark
Stars: ✭ 42 (+27.27%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on dotnet core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+203.03%)
crawlzone: Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (+112.12%)
diffbot-php-client: [Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+60.61%)
MoalemYar: A personal project for class management, using various technologies like WPF, Entity Framework, Code First, SQLite, Migration and more
Stars: ✭ 53 (+60.61%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+1830.3%)
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+651.52%)
xXx dead xXx: blogging in the dark
Stars: ✭ 19 (-42.42%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+46975.76%)
telegram-crawler: 🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
Stars: ✭ 84 (+154.55%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+6800%)
tech-seo-crawler: Build a small, three-domain internet using GitHub Pages and Wikipedia and construct a crawler to crawl, render, and index.
Stars: ✭ 57 (+72.73%)
pumba: Fetch, store and access user agent strings for different browsers
Stars: ✭ 12 (-63.64%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium
Stars: ✭ 550 (+1566.67%)
Crawler: Go process used to crawl websites
Stars: ✭ 147 (+345.45%)
Instagram Bot: An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (+318.18%)
Simple.Wpf.DataGrid: An experiment to build a data grid (blotter) in WPF without using any third-party libraries
Stars: ✭ 64 (+93.94%)
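Bhban rpa: Example code for the book "Work Automation That Finishes Six Months of Work in a Single Day" (Saengneung Publishing, 2020). The examples are written for people who have never learned Python and cover a range of work-automation areas, from Excel to design, macros, and crawling.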
Stars: ✭ 124 (+275.76%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+3263.64%)
Squidwarc: Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (+278.79%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+57.58%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+128212.12%)
Grawler: Grawler is a tool written in PHP, with a web interface, that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+196.97%)
scrapy-fieldstats: A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-48.48%)
Arachnid: Powerful web scraping framework for Crystal
Stars: ✭ 68 (+106.06%)
puppet-master: Puppeteer as a service hosted on Saasify.
Stars: ✭ 25 (-24.24%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+54.55%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+2290.91%)
pdf-crawler: SimFin's open source PDF crawler
Stars: ✭ 100 (+203.03%)
auctus: Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Stars: ✭ 34 (+3.03%)
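BaiduSpider: The project has moved to https://github.com/BaiduSpider/BaiduSpider !! A crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao (Q&A) search, video search, news search, Wenku (document library) search, Jingyan (how-to) search, and Baike (encyclopedia) search.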
Stars: ✭ 29 (-12.12%)
go-scrapy: Web crawling and scraping framework for Golang
Stars: ✭ 17 (-48.48%)
zcrawl: An open source web crawling platform
Stars: ✭ 21 (-36.36%)
the-seinfeld-chronicles: A dataset for textual analysis on arguably the best-written comedy television show ever.
Stars: ✭ 14 (-57.58%)
ioSender: A GCode Sender for Grbl and grblHAL written in C# (Windows only).
Stars: ✭ 142 (+330.3%)