Apify SDK: the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs, including (but not limited to) those built on headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+5850.94%)

Mutual labels: scraping, web-scraping

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+1241.51%)

Mutual labels: scraping, web-scraping

raspagem-de-dados-fatec

📓 Mini-course on web data scraping with Python, taught at the FATEC Jundiaí Technology Week

Stars: ✭ 22 (-58.49%)

Mutual labels: scraping, web-scraping

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+775.47%)

Mutual labels: scraping, web-scraping

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-71.7%)

Mutual labels: scraping, web-scraping

Detect Cms

PHP Library for detecting CMS

Stars: ✭ 78 (+47.17%)

Mutual labels: scraping, web-scraping

codechef-rank-comparator

Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library with XPath (XML Path Language).

Stars: ✭ 23 (-56.6%)

Mutual labels: web-scraping, xpath

browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

Stars: ✭ 71 (+33.96%)

Mutual labels: scraping, web-scraping

Phpscraper

PHP Scraper - a highly opinionated web interface for PHP

Stars: ✭ 148 (+179.25%)

Mutual labels: scraping, web-scraping

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy Framework.

Stars: ✭ 57 (+7.55%)

Mutual labels: scraping, web-scraping


selectorlib

A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages

Free software: MIT license
Documentation: https://selectorlib.readthedocs.io.

Example

>>> from selectorlib import Extractor
>>> yaml_string = """
    title:
        css: "h1"
        type: Text
    link:
        css: "h2 a"
        type: Link
    """
>>> extractor = Extractor.from_yaml_string(yaml_string)
>>> html = """
    <h1>Title</h1>
    <h2>Usage
        <a class="headerlink" href="http://test">¶</a>
    </h2>
    """
>>> extractor.extract(html)
{'title': 'Title', 'link': 'http://test'}
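
To make the idea of declarative, selector-driven extraction concrete, here is a minimal sketch that uses only the Python standard library. It is not how selectorlib is implemented: the `FIELDS` dictionary, the `extract` function, and the wrapping `<div>` are hypothetical names and choices made for illustration, and `xml.etree.ElementTree` supports only a limited XPath subset and requires well-formed markup, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical field spec mirroring the YAML in the example above:
# each field has a selector (ElementTree's limited XPath) and a type.
FIELDS = {
    "title": {"xpath": ".//h1", "type": "Text"},
    "link": {"xpath": ".//h2/a", "type": "Link"},
}


def extract(markup, fields):
    """Apply each field's selector to the markup and return a dict of results."""
    # Assumption: the markup is well-formed XML with a single root element.
    root = ET.fromstring(markup.strip())
    result = {}
    for name, spec in fields.items():
        node = root.find(spec["xpath"])  # first match only
        if node is None:
            result[name] = None
        elif spec["type"] == "Link":
            # Link fields return the href attribute of the matched element.
            result[name] = node.get("href")
        else:
            # Text fields return all text inside the element, trimmed.
            result[name] = "".join(node.itertext()).strip()
    return result


MARKUP = """
<div>
    <h1>Title</h1>
    <h2>Usage
        <a class="headerlink" href="http://test">¶</a>
    </h2>
</div>
"""

print(extract(MARKUP, FIELDS))
```

The design point is the same as in the example above: the selectors live in data rather than in code, so changing what gets extracted means editing the field spec, not the extraction logic.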


scrapehero / selectorlib
