diffbot-php-client: [Deprecated, in maintenance mode; please use the APIs directly!] The official Diffbot client library
Stars: ✭ 53 (+341.67%)
Server: A server written in PHP that provides a JavaScript API for use in the browser
Stars: ✭ 34 (+183.33%)
crawling-framework: Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (+83.33%)
Android Smartwebview: A WebView integrated with native features to help create the most advanced hybrid applications.
Stars: ✭ 357 (+2875%)
sawmill: Sawmill is a JSON transformation Java library
Stars: ✭ 92 (+666.67%)
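As a rough illustration of what a JSON transformation pipeline does, here is a minimal Python sketch. It assumes a document is a parsed JSON object (a `dict`) passed through an ordered chain of processors. The processor names `rename`, `remove`, and `add_field` are hypothetical and are not Sawmill's API or configuration format.

```python
import copy
import json
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Processor = Callable[[Doc], Doc]

def rename(src: str, dst: str) -> Processor:
    """Hypothetical processor: move the value of `src` to `dst` if present."""
    def apply(doc: Doc) -> Doc:
        if src in doc:
            doc[dst] = doc.pop(src)
        return doc
    return apply

def remove(*fields: str) -> Processor:
    """Hypothetical processor: drop the listed top-level fields."""
    def apply(doc: Doc) -> Doc:
        for field in fields:
            doc.pop(field, None)
        return doc
    return apply

def add_field(name: str, value: Any) -> Processor:
    """Hypothetical processor: set `name` only when it is not already present."""
    def apply(doc: Doc) -> Doc:
        doc.setdefault(name, value)
        return doc
    return apply

def run_pipeline(doc: Doc, processors: List[Processor]) -> Doc:
    result = copy.deepcopy(doc)  # keep the caller's document untouched
    for process in processors:
        result = process(result)
    return result

def transform_json(text: str, processors: List[Processor]) -> str:
    """Parse JSON text, run the processors in order, and serialize the result."""
    return json.dumps(run_pipeline(json.loads(text), processors), sort_keys=True)
```

For example, `transform_json('{"msg": "hi", "tmp": 1}', [rename("msg", "message"), remove("tmp"), add_field("env", "prod")])` returns `'{"env": "prod", "message": "hi"}'`. Keeping each processor a small closure makes the order of steps explicit and easy to reorder.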
http: Aplus Framework HTTP Library
Stars: ✭ 113 (+841.67%)
the-seinfeld-chronicles: A dataset for textual analysis of arguably the best-written comedy television show ever.
Stars: ✭ 14 (+16.67%)
userAgentLists: Get your lists of User-Agent strings here.
Stars: ✭ 57 (+375%)
Device Detector Js: A precise user agent parser and device detector written in TypeScript
Stars: ✭ 193 (+1508.33%)
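To make "user agent parsing" concrete, here is a deliberately naive Python sketch that extracts a browser name and version with regular expressions. It is not how Device Detector works internally (such libraries ship large, maintained regex databases), and the pattern list below is a hypothetical minimum. Order matters, because Edge and Opera user agents also contain `Chrome/`, and Chrome user agents also contain `Safari/`.

```python
import re
from typing import Optional, Tuple

# Hypothetical, minimal pattern list; checked top to bottom, first match wins.
_BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edg/([\d.]+)")),
    ("Opera", re.compile(r"OPR/([\d.]+)")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
    ("Chrome", re.compile(r"Chrome/([\d.]+)")),
    ("Safari", re.compile(r"Version/([\d.]+).*Safari/")),
]

def detect_browser(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (browser name, version) for the first matching pattern, else None."""
    for name, pattern in _BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return name, match.group(1)
    return None
```

A production detector would also classify the operating system, device type, brand, and model, which is exactly the part that needs a curated database rather than a handful of patterns.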
Cdp4j: cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (+1833.33%)
uach-retrofill: This snippet illustrates how to reconstruct the legacy navigator.userAgent string value from the modern navigator.userAgentData values.
Stars: ✭ 26 (+116.67%)
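The idea behind uach-retrofill can be sketched outside the browser as well. The following is a Python transliteration of the concept, not the snippet itself (which is JavaScript). It assumes the input dict mirrors the User-Agent Client Hints high-entropy values `platform`, `platformVersion`, `model`, `mobile`, and `fullVersionList`. The function name `reconstruct_user_agent` and the per-platform templates are hypothetical simplifications of Chrome's real formatting rules.

```python
from typing import Any, Dict

def reconstruct_user_agent(uach: Dict[str, Any]) -> str:
    """Hypothetical sketch: build a Chrome-style legacy UA from UA-CH-like values."""
    platform = uach.get("platform", "")
    version = uach.get("platformVersion", "")
    model = uach.get("model", "")
    # Prefer the full "Google Chrome" brand version, falling back to "Chromium".
    full_versions = {b["brand"]: b["version"] for b in uach.get("fullVersionList", [])}
    chrome_version = full_versions.get("Google Chrome") or full_versions.get("Chromium", "0.0.0.0")

    # Simplified, assumed OS tokens; real rules cover many more cases.
    if platform == "Windows":
        os_token = "Windows NT 10.0; Win64; x64"
    elif platform == "macOS":
        os_token = "Macintosh; Intel Mac OS X " + version.replace(".", "_")
    elif platform == "Android":
        os_token = f"Linux; Android {version}; {model}"
    else:
        os_token = "X11; Linux x86_64"

    mobile = " Mobile" if uach.get("mobile") else ""
    return (
        f"Mozilla/5.0 ({os_token}) AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{chrome_version}{mobile} Safari/537.36"
    )
```

The fixed `AppleWebKit/537.36` and `Safari/537.36` tokens reflect the fact that Chrome's legacy string carries these values for compatibility rather than as real version information.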
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+1550%)
Device Detector: The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
Stars: ✭ 2,106 (+17450%)
N2h4: A tool for collecting Naver News
Stars: ✭ 177 (+1375%)
useragent-generator: Easily generate correct user-agent strings for popular browsers
Stars: ✭ 62 (+416.67%)
Holiday Cn: 📅🇨🇳 Chinese statutory public holiday data, automatically scraped daily from State Council announcements
Stars: ✭ 157 (+1208.33%)
Devicedetector.net: The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
Stars: ✭ 144 (+1100%)
Massivedl: Download a large list of files concurrently
Stars: ✭ 141 (+1075%)
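Downloading a large list of files concurrently usually means bounding the number of simultaneous requests and collecting per-URL results or errors. Below is a minimal Python sketch using a thread pool from the standard library. The helper names `fetch_url` and `download_all` are hypothetical (not MassiveDL's interface), and results are kept in memory rather than written to disk.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Union

def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Fetch one URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, Union[bytes, Exception]]:
    """Download URLs with at most `max_workers` in flight; map each URL to bytes or its error."""
    results: Dict[str, Union[bytes, Exception]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure instead of aborting the batch
                results[url] = exc
    return results
```

Threads suit this workload because the time is spent waiting on the network; the injectable `fetch` parameter also makes the function easy to test without network access.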
dxram: A distributed in-memory key-value store for billions of small objects.
Stars: ✭ 25 (+108.33%)
Newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+96108.33%)
Parser Php: Browser sniffing gone too far. A user agent parser library for PHP
Stars: ✭ 1,626 (+13450%)
Corpuscrawler: Crawler for linguistic corpora
Stars: ✭ 127 (+958.33%)
tech-seo-crawler: Build a small, three-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
Stars: ✭ 57 (+375%)
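The crawl-and-index loop can be sketched without any network access by treating a small "internet" as an in-memory map from URL to page text and outgoing links. This minimal Python sketch does a breadth-first crawl and builds an inverted index (word to set of URLs). The page format and the name `crawl_and_index` are hypothetical, and the rendering step is omitted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical page format: url -> {"text": str, "links": [url, ...]}
Pages = Dict[str, dict]

def crawl_and_index(pages: Pages, start: str, max_pages: int = 100) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Breadth-first crawl from `start`; return visit order and an inverted index."""
    index: Dict[str, Set[str]] = {}
    seen = {start}
    queue = deque([start])
    order: List[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        page = pages.get(url)
        if page is None:
            continue  # dangling link: nothing to fetch
        order.append(url)
        for word in page["text"].lower().split():
            index.setdefault(word, set()).add(url)
        for link in page["links"]:
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return order, index
```

The `seen` set prevents revisiting pages when sites link to each other in cycles, which is common across domains that cross-link.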
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (+14300%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+733.33%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+325%)
Dig Etl Engine: Download DIG to run on your laptop or server.
Stars: ✭ 81 (+575%)
Useragent.js: A user agent analysis project.
Stars: ✭ 70 (+483.33%)
jsgraph: [Deprecated] Use the @encapsule/arccore package, which includes the graph library.
Stars: ✭ 42 (+250%)
Pdf downloader: A Scrapy spider for downloading PDF files from a webpage.
Stars: ✭ 18 (+50%)
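The core step of a PDF-downloading spider is finding the PDF links on a page and resolving them to absolute URLs. This minimal Python sketch does that with the standard-library `html.parser` instead of Scrapy, so it is a swapped-in technique rather than the project's code. The names `PdfLinkParser` and `find_pdf_links` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse

class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pdf_links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check only the path so query strings such as "?dl=1" don't hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in self.pdf_links:
            self.pdf_links.append(absolute)

def find_pdf_links(html: str, base_url: str) -> List[str]:
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links
```

Each returned URL could then be fetched and saved; in a Scrapy spider that step would typically be a follow-up request per link.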
Parser Javascript: Browser sniffing gone too far. A user agent parser library for JavaScript
Stars: ✭ 66 (+450%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+5208.33%)
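As we understand Scrapyrt's documentation, a spider is scheduled with a GET request to the `crawl.json` endpoint, passing `spider_name` and `url` as query parameters, and the server listens on port 9080 by default; check the docs for your version before relying on this. The Python sketch below only builds that URL and decodes the JSON response. The helper names `scrapyrt_crawl_url` and `run_spider` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urlencode

def scrapyrt_crawl_url(spider_name: str, start_url: str, host: str = "localhost", port: int = 9080) -> str:
    """Build a Scrapyrt crawl.json request URL (endpoint and parameters as assumed above)."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"http://{host}:{port}/crawl.json?{query}"

def run_spider(spider_name: str, start_url: str, timeout: float = 60.0) -> Dict[str, Any]:
    """Call a running Scrapyrt instance and return its decoded JSON response."""
    with urllib.request.urlopen(scrapyrt_crawl_url(spider_name, start_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`urlencode` percent-encodes the start URL, so its own `:` and `/` characters do not break the query string.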
Vytal: Browser extension to spoof timezone, geolocation, locale and user agent.
Stars: ✭ 1,449 (+11975%)
Bugz: 🐛 Composable User Agent Detection using Ramda
Stars: ✭ 15 (+25%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+40208.33%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+9150%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+3566.67%)
Kolpa: A fake data generator written in and for Go
Stars: ✭ 645 (+5275%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+2933.33%)
telegram-crawler: 🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
Stars: ✭ 84 (+600%)
Sasila: A flexible, user-friendly crawler framework
Stars: ✭ 286 (+2283.33%)
User agent: HTTP User Agent parser for the Go programming language.
Stars: ✭ 578 (+4716.67%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+2208.33%)
Spidy: The simple, easy-to-use command-line web crawler.
Stars: ✭ 257 (+2041.67%)
User Agents: A JavaScript library for generating random user agents with data that's updated daily.
Stars: ✭ 485 (+3941.67%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On these websites, ARGUS can perform tasks such as scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+466.67%)
crawlerdetect: Golang module to detect bots and crawlers via the user agent
Stars: ✭ 22 (+83.33%)
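Detecting bots "via the user agent" typically comes down to matching the User-Agent header against known crawler signatures. This minimal Python sketch uses one case-insensitive regex. The pattern list, the name `is_crawler`, and the choice to treat an empty header as automated are all hypothetical assumptions. Real detectors maintain hundreds of signatures, and plain substring matching such as `bot` can misfire on legitimate device names.

```python
import re

# Hypothetical, deliberately small signature list.
_BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests|headless",
    re.IGNORECASE,
)

def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent looks automated under the patterns above."""
    if not user_agent.strip():
        return True  # assumption: an empty User-Agent is treated as automated
    return bool(_BOT_PATTERN.search(user_agent))
```

Since the User-Agent header is client-controlled, this kind of check only catches crawlers that identify themselves honestly.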
Curl: A command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, GOPHER, GOPHERS, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, MQTT, POP3, POP3S, RTMP, RTMPS, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET and TFTP. libcurl offers a myriad of powerful features
Stars: ✭ 22,875 (+190525%)
haro: Haro is a modern immutable DataStore
Stars: ✭ 24 (+100%)
react-ua: 📱 React User Agent Component, Hook, and HOC. SSR-ready, fully unit-tested, using the new React Context and Hooks APIs
Stars: ✭ 18 (+50%)
user-agent: User-Agent parser for Clojure
Stars: ✭ 24 (+100%)
scrapy-fieldstats: A Scrapy extension to log item field coverage when the spider shuts down
Stars: ✭ 17 (+41.67%)
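"Item coverage" here means the share of scraped items in which each field is actually populated. The computation can be sketched in plain Python. The name `field_coverage` and the rule that `None`, empty strings, and empty lists or dicts count as "missing" are hypothetical assumptions, not the extension's own definition. A Scrapy extension would run something like this from its spider-closed handler.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

def field_coverage(items: Iterable[Mapping]) -> Dict[str, float]:
    """Return, for each field seen, the percentage of items where it is non-empty."""
    counts: Counter = Counter()
    total = 0
    for item in items:
        total += 1
        for field, value in item.items():
            # Assumed definition of "empty"; note that 0 and False still count as populated.
            if value not in (None, "", [], {}):
                counts[field] += 1
    if total == 0:
        return {}
    return {field: 100.0 * n / total for field, n in counts.items()}
```

Missing keys and empty values are treated the same way, so a field absent from most items shows up with low coverage instead of being silently ignored.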
pdf-crawler: SimFin's open source PDF crawler
Stars: ✭ 100 (+733.33%)
Browscap: 📃 The main project repository
Stars: ✭ 354 (+2850%)