nemo / Scrape

Distributed Scraper

Programming Languages

javascript

Projects that are alternatives of or similar to Scrape

MangDL
The most inefficient Manga downloader for PC
Stars: ✭ 40 (-38.46%)
Mutual labels:  metadata, scraper
Image search
Python Library to download images and metadata from popular search engines.
Stars: ✭ 86 (+32.31%)
Mutual labels:  scraper, metadata
Unfurl
Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based ⚡️
Stars: ✭ 193 (+196.92%)
Mutual labels:  scraper, metadata
tinyPornManager
Made for pornhub. Fork from tinyMediaManager v3
Stars: ✭ 57 (-12.31%)
Mutual labels:  metadata, scraper
YouTube-MA
💾 YouTube video metadata archiver written in Golang
Stars: ✭ 17 (-73.85%)
Mutual labels:  metadata, scraper
oge
Page metadata as a service
Stars: ✭ 22 (-66.15%)
Mutual labels:  metadata, scraper
unfurl
Extract rich metadata from URLs
Stars: ✭ 41 (-36.92%)
Mutual labels:  metadata, scraper
Emby.plugins.javscraper
A Japanese movie scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.
Stars: ✭ 864 (+1229.23%)
Mutual labels:  scraper, metadata
Sqlservermetadata
SQL Server Metadata Toolkit
Stars: ✭ 47 (-27.69%)
Mutual labels:  metadata
Uc Guidelines
To improve the clarity and usefulness of finding aids and to promote consistency across campuses, a working group of digital archivists under the aegis of the UC Born-Digital Content Common Knowledge Group (CKG) have collaborated to develop a UC-wide descriptive standard for born-digital archival material.
Stars: ✭ 54 (-16.92%)
Mutual labels:  metadata
Labelled
Manipulating labelled vectors in R
Stars: ✭ 45 (-30.77%)
Mutual labels:  metadata
Scrapstagram
An Instagram Scraper
Stars: ✭ 50 (-23.08%)
Mutual labels:  scraper
Hlsinjector
ID3 metadata injector for MPEG TS (HLS) written in PHP
Stars: ✭ 56 (-13.85%)
Mutual labels:  metadata
Social Scraper
A collection of scripts for crawling data from social networks and Vietnamese-language websites
Stars: ✭ 47 (-27.69%)
Mutual labels:  scraper
Schema Microdata Examples
Some examples of HTML markup using Schema.org microdata
Stars: ✭ 58 (-10.77%)
Mutual labels:  metadata
Karate
Webscraper
Stars: ✭ 45 (-30.77%)
Mutual labels:  scraper
Repository.kodibae
Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]
Stars: ✭ 45 (-30.77%)
Mutual labels:  scraper
Metaforge
An OSINT Metadata analyzing tool that filters through tags and creates reports
Stars: ✭ 63 (-3.08%)
Mutual labels:  metadata
Ipdata
🌐 An IP lookup system utilizing open datasets
Stars: ✭ 58 (-10.77%)
Mutual labels:  metadata
Framework
IONDV. Framework is a high level framework for enterprise web applications development.
Stars: ✭ 54 (-16.92%)
Mutual labels:  metadata

Distributed Scraper

This scraper function automatically extracts metadata from a page and also supports simple HTML querying with cheerio.
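To make the idea of "pulling in metadata" concrete, here is a minimal, dependency-free sketch that collects Open Graph tags from an HTML string. It is an illustration only: the service itself uses cheerio, and `extractOpenGraph` and the regular-expression approach are assumptions made here so the example runs without installing anything.

```javascript
// Sketch only: the real service parses HTML with cheerio.
// extractOpenGraph is a hypothetical helper, not part of the Scrape API.
function extractOpenGraph(html) {
  const result = {};
  // Collect every <meta ...> tag in the document.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    // Look for property="og:..." and content="..." (single or double quotes).
    const property = /\bproperty\s*=\s*(["'])og:(.*?)\1/i.exec(tag);
    const content = /\bcontent\s*=\s*(["'])(.*?)\1/i.exec(tag);
    if (property && content) {
      result[property[2]] = content[2];
    }
  }
  return result;
}

const sampleHtml = `
  <html><head>
    <meta property="og:site_name" content="GitHub">
    <meta property="og:type" content="profile">
    <meta name="description" content="Skipped because it is not an Open Graph tag">
  </head></html>`;

console.log(extractOpenGraph(sampleHtml));
```

A regular expression is fragile on real-world markup, which is presumably why the project relies on a proper HTML parser instead.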

It's built on top of stdlib, which makes it highly distributed and scalable.

Usage

You can either use the ready-made service that's already deployed on stdlib, or fork this repository and deploy your own version on stdlib.

Example

For example, a simple scrape to pick up my own email address from GitHub (along with a bunch of extra metadata):

lib nemo.scrape --url https://github.com/nemo --query "li[itemprop='email'] a"
{ metadata:
   { general:
      { description: 'nemo has 36 repositories available. Follow their code on GitHub.',
        title: 'nemo (Nima Gardideh) · GitHub',
        lang: 'en' },
     openGraph:
      { app_id: '1401488693436528',
        image: [Object],
        site_name: 'GitHub',
        type: 'profile',
        title: 'nemo (Nima Gardideh)',
        url: 'https://github.com/nemo',
        description: 'nemo has 36 repositories available. Follow their code on GitHub.',
        username: 'nemo' },
     schemaOrg: { items: [Object] },
     twitter:
      { image: [Object],
        site: '@github',
        card: 'summary',
        title: 'nemo (Nima Gardideh)',
        description: 'nemo has 36 repositories available. Follow their code on GitHub.' } },
  url: 'https://github.com/nemo',
  query: 'li[itemprop=\'email\'] a',
  query_value: '[email protected]'
}
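
The response above repeats some fields (such as the title) across the `general`, `openGraph`, and `twitter` groups. The following sketch shows one way a consumer might choose among them. `pickTitle` is a hypothetical helper and the fallback order is an assumption; only the field names come from the sample output.

```javascript
// Hypothetical helper: choose a title from a scrape response.
// Field names follow the sample output; the fallback order
// (openGraph, then twitter, then general) is an assumption.
function pickTitle(response) {
  const metadata = (response && response.metadata) || {};
  const sources = [metadata.openGraph, metadata.twitter, metadata.general];
  for (const source of sources) {
    if (source && typeof source.title === 'string' && source.title.trim() !== '') {
      return source.title.trim();
    }
  }
  return null; // no usable title found
}

// Trimmed copy of the response shown above.
const response = {
  metadata: {
    general: { title: 'nemo (Nima Gardideh) · GitHub', lang: 'en' },
    openGraph: { title: 'nemo (Nima Gardideh)', site_name: 'GitHub' },
    twitter: { title: 'nemo (Nima Gardideh)', card: 'summary' }
  },
  url: 'https://github.com/nemo'
};

console.log(pickTitle(response));
```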

You can view the function specification here.

Notes

Note that this scraper does not support sites that are single-page JavaScript applications, because their content is rendered in the browser rather than delivered in the initial HTML. You should also follow each site's robots.txt rules when scraping. Use responsibly.
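
As a rough illustration of respecting robots.txt, the sketch below checks a path against the `Disallow` rules that apply to all user agents. It is deliberately simplified and is not part of the Scrape service: `isPathAllowed` is a hypothetical helper, and `Allow` rules, wildcards, and agent-specific groups are ignored.

```javascript
// Simplified sketch: honours only "User-agent: *" groups and plain
// Disallow prefixes. Allow rules and wildcards are NOT handled.
// isPathAllowed is a hypothetical helper, not part of the Scrape service.
function isPathAllowed(robotsTxt, path) {
  let inWildcardGroup = false;
  let lastWasUserAgent = false;
  const disallowed = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const separator = line.indexOf(':');
    if (separator === -1) continue; // blank or malformed line
    const field = line.slice(0, separator).trim().toLowerCase();
    const value = line.slice(separator + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      inWildcardGroup = (lastWasUserAgent && inWildcardGroup) || value === '*';
      lastWasUserAgent = true;
    } else {
      lastWasUserAgent = false;
      if (field === 'disallow' && inWildcardGroup && value !== '') {
        disallowed.push(value);
      }
    }
  }
  // A path is blocked when it starts with any collected Disallow prefix.
  return !disallowed.some((prefix) => path.startsWith(prefix));
}

const robots = [
  'User-agent: *',
  'Disallow: /private/',
  '',
  'User-agent: otherbot',
  'Disallow: /'
].join('\n');

console.log(isPathAllowed(robots, '/nemo'));
console.log(isPathAllowed(robots, '/private/data'));
```

For production use, a maintained robots.txt parser that implements the full specification is a safer choice than a hand-rolled check like this one.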

License

MIT
