nemo / Scrape
Distributed Scraper
Stars: ✭ 65
Programming Languages
javascript
Projects that are alternatives to or similar to Scrape
MangDL
The most inefficient Manga downloader for PC
Stars: ✭ 40 (-38.46%)
Mutual labels: metadata, scraper
Image search
Python Library to download images and metadata from popular search engines.
Stars: ✭ 86 (+32.31%)
Mutual labels: scraper, metadata
Unfurl
Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based ⚡️
Stars: ✭ 193 (+196.92%)
Mutual labels: scraper, metadata
tinyPornManager
Made for pornhub. Fork from tinyMediaManager v3
Stars: ✭ 57 (-12.31%)
Mutual labels: metadata, scraper
YouTube-MA
💾 YouTube video metadata archiver written in Golang
Stars: ✭ 17 (-73.85%)
Mutual labels: metadata, scraper
Emby.plugins.javscraper
A Japanese movie scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.
Stars: ✭ 864 (+1229.23%)
Mutual labels: scraper, metadata
Uc Guidelines
To improve the clarity and usefulness of finding aids and to promote consistency across campuses, a working group of digital archivists under the aegis of the UC Born-Digital Content Common Knowledge Group (CKG) has collaborated to develop a UC-wide descriptive standard for born-digital archival material.
Stars: ✭ 54 (-16.92%)
Mutual labels: metadata
Hlsinjector
ID3 metadata injector for MPEG TS (HLS) written in PHP
Stars: ✭ 56 (-13.85%)
Mutual labels: metadata
Social Scraper
A collection of scripts for crawling data from Vietnamese social networks and websites
Stars: ✭ 47 (-27.69%)
Mutual labels: scraper
Schema Microdata Examples
Some examples of HTML markup using Schema.org microdata
Stars: ✭ 58 (-10.77%)
Mutual labels: metadata
Repository.kodibae
Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]
Stars: ✭ 45 (-30.77%)
Mutual labels: scraper
Metaforge
An OSINT Metadata analyzing tool that filters through tags and creates reports
Stars: ✭ 63 (-3.08%)
Mutual labels: metadata
Framework
IONDV. Framework is a high level framework for enterprise web applications development.
Stars: ✭ 54 (-16.92%)
Mutual labels: metadata
Distributed Scraper
This is a scraper function that automatically pulls metadata from a page and also supports simple HTML querying using cheerio.
It's built on top of stdlib, which makes it distributed and scalable.
Usage
You can either use the ready-made service that's deployed on stdlib here, or fork this repository and launch your own version on stdlib.
Example
For example, a simple scrape to pick up my own email address from GitHub (along with a bunch of extra metadata):
lib nemo.scrape --url https://github.com/nemo --query "li[itemprop='email'] a"
{ metadata:
   { general:
      { description: 'nemo has 36 repositories available. Follow their code on GitHub.',
        title: 'nemo (Nima Gardideh) · GitHub',
        lang: 'en' },
     openGraph:
      { app_id: '1401488693436528',
        image: [Object],
        site_name: 'GitHub',
        type: 'profile',
        title: 'nemo (Nima Gardideh)',
        url: 'https://github.com/nemo',
        description: 'nemo has 36 repositories available. Follow their code on GitHub.',
        username: 'nemo' },
     schemaOrg: { items: [Object] },
     twitter:
      { image: [Object],
        site: '@github',
        card: 'summary',
        title: 'nemo (Nima Gardideh)',
        description: 'nemo has 36 repositories available. Follow their code on GitHub.' } },
  url: 'https://github.com/nemo',
  query: 'li[itemprop=\'email\'] a',
  query_value: '[email protected]'
}
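The response groups the same kind of field (title, description) under several metadata sections, so a caller usually has to decide which one to trust. The sketch below is one way to do that in plain JavaScript. `pickField` is a hypothetical helper, not part of the scraper, and the fallback order (openGraph, then twitter, then general) is an assumption rather than anything the service prescribes.

```javascript
// Hypothetical helper (not part of the scraper): return the first non-empty
// string found for `field` across the metadata groups, in the given order.
function pickField(result, field, groups = ["openGraph", "twitter", "general"]) {
  const metadata = (result && result.metadata) || {};
  for (const group of groups) {
    const value = metadata[group] && metadata[group][field];
    if (typeof value === "string" && value.trim() !== "") {
      return value;
    }
  }
  return null; // field is missing from every group
}

// Trimmed copy of the response shown in the example above.
const result = {
  metadata: {
    general: { title: "nemo (Nima Gardideh) · GitHub", lang: "en" },
    openGraph: { site_name: "GitHub", title: "nemo (Nima Gardideh)" },
    twitter: { site: "@github", card: "summary" },
  },
  url: "https://github.com/nemo",
};

console.log(pickField(result, "title")); // the openGraph title is found first
console.log(pickField(result, "lang"));  // only present under general
console.log(pickField(result, "image")); // null: not a string in any group
```

Passing a different `groups` array changes the priority, for example `["general"]` to read only the page's own tags.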
You can view the function specification here.
Notes
Note that this scraper does not support sites that are single-page JavaScript applications. You should also follow robots.txt rules when you're scraping websites. Use responsibly.
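As a rough illustration of what following robots.txt can look like before a URL is sent to the scraper, here is a minimal sketch. `isAllowed` is a hypothetical helper, not part of this project. It is deliberately simplified: it only reads groups addressed to all crawlers (`User-agent: *`), matches rules by plain path prefix, and ignores wildcards and `$` anchors. Fetching the robots.txt file itself is left out.

```javascript
// Hypothetical, simplified robots.txt check (an assumption-laden sketch):
// - only groups for "User-agent: *" are considered
// - rules are matched by plain prefix; "*" and "$" patterns are not supported
function isAllowed(robotsTxt, path) {
  const rules = [];          // Allow/Disallow rules that apply to all crawlers
  let groupAgents = [];      // User-agent values of the group being read
  let readingAgents = false; // true while consecutive User-agent lines are read

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === "user-agent") {
      if (!readingAgents) groupAgents = []; // a new group starts here
      groupAgents.push(value);
      readingAgents = true;
    } else if (key === "allow" || key === "disallow") {
      readingAgents = false;
      // An empty Disallow value means "nothing is disallowed", so skip it.
      if (groupAgents.includes("*") && value !== "") {
        rules.push({ allow: key === "allow", prefix: value });
      }
    }
  }

  // Longest matching prefix wins; Allow wins a tie; no match means allowed.
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      best === null ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}

// Made-up robots.txt content for illustration only.
const sampleRobots = [
  "User-agent: *",
  "Disallow: /private/",
  "Allow: /private/public-page",
].join("\n");

console.log(isAllowed(sampleRobots, "/about"));               // true
console.log(isAllowed(sampleRobots, "/private/data"));        // false
console.log(isAllowed(sampleRobots, "/private/public-page")); // true
```

For production use, a maintained robots.txt parser is a safer choice than this sketch, since real files often rely on the wildcard and per-agent rules that are skipped here.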
License
MIT