luxcem / Apifier

Licence: lgpl-3.0

Apifier is a very simple HTML parser written in Python based on CSS selectors

Programming Languages

python

Labels

html parse html-parser css-selector

Projects that are alternatives of or similar to Apifier

Html Agility Pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.

Stars: ✭ 2,014 (+40180%)

Mutual labels: parse, html-parser

Modest

Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.

Stars: ✭ 572 (+11340%)

Mutual labels: html-parser, css-selector

Skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

Stars: ✭ 231 (+4520%)

Mutual labels: parse, html-parser

Floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.

Stars: ✭ 1,642 (+32740%)

Mutual labels: html-parser, css-selector

modest ex

Elixir library to do pipeable transformations on html strings (with CSS selectors)

Stars: ✭ 31 (+520%)

Mutual labels: css-selector, html-parser

Nginxparser

Parses nginx configuration with Pyparsing — Used in Letsencrypt

Stars: ✭ 489 (+9680%)

Mutual labels: parse

Remarkable

Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdance for HTML-to-markdown conversion. Use https://github.com/jonschlinkert/markdown-toc to generate a table of contents.

Stars: ✭ 5,252 (+104940%)

Mutual labels: parse

Pidgin

C#'s fastest parser combinator library

Stars: ✭ 469 (+9280%)

Mutual labels: parse

Fullstack Javascript

Source code for the Fullstack JavaScript book

Stars: ✭ 456 (+9020%)

Mutual labels: parse

Micromark

the smallest commonmark compliant markdown parser that exists; new basis for @unifiedjs (hundreds of projects w/ billions of downloads for dealing w/ content)

Stars: ✭ 793 (+15760%)

Mutual labels: parse

Surgeon

Declarative DOM extraction expression evaluator. 👨‍⚕️

Stars: ✭ 653 (+12960%)

Mutual labels: css-selector

Nom

Rust parser combinator framework

Stars: ✭ 5,987 (+119640%)

Mutual labels: parse

Schm

Composable schemas for JavaScript and Node.js

Stars: ✭ 498 (+9860%)

Mutual labels: parse

Yauaa

Yet Another UserAgent Analyzer

Stars: ✭ 472 (+9340%)

Mutual labels: parse

Leasot

Parse and output TODOs and FIXMEs from comments in your files

Stars: ✭ 729 (+14480%)

Mutual labels: parse

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+9180%)

Mutual labels: css-selector

Php

Parser for PHP written in Go

Stars: ✭ 516 (+10220%)

Mutual labels: parse

Unitsnet

Makes life working with units of measurement just a little bit better.

Stars: ✭ 641 (+12720%)

Mutual labels: parse

Html Parser

A PHP HTML parser, similar to PHP Simple HTML DOM Parser but several times faster

Stars: ✭ 510 (+10100%)

Mutual labels: html-parser

Parsepy

A relatively up-to-date fork of ParsePy, the Python wrapper for the Parse.com API. Originally maintained by @dgrtwo

Stars: ✭ 509 (+10080%)

Mutual labels: parse

Apifier

Apifier is a very simple HTML parser written in Python.

It aims to parse HTML documents declaratively, using CSS or XPath selectors. Its main purpose is to extract tabular and/or paginated data.

Install

Apifier is available for Python 3:

pip install apifier

Example

Getting all comments from an article on LeFigaro.fr:

from apifier import Apifier

config = {
    "name": "FigaroBot article comments",
    "encoding": "latin-1",
    "url": "http://www.lefigaro.fr/politique/le-scan/2016/07/21/25001-20160721ARTFIG00062-attentat-de-nice-la-droite-demande-une-enquete-independante.php",
    "foreach": "#fig-pagination-nav > li > a",
    "context": "page",
    "xpath": False,
    "prefix": "#reagir > div > div > div.fig-col.fig-col--comments > div:nth-child(3) > ul > li > article >",
    "description": {
        "author": "div.fig-comment-header a",
        "comment": "div.fig-comment-msg p"
    }
}

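# Build the parser from the configuration, then fetch and parse the page(s).
api = Apifier(config=config)
# data holds the parsed items (see the result further down).
data = api.load()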

Config

name : name of the current configuration
encoding : the encoding used by the page; data is converted from this encoding to UTF-8
url : page URL; the first page in the case of paginated data
xpath : boolean; set to True if the selectors are XPath instead of CSS
next : selector for a "next" link; Apifier follows the next link from page to page until none is found

foreach : selector for the pagination links. In this example the pagination looks like this:

<ul id="fig-pagination-nav">
  <li class="fig-pagination-current"><a href="…"> 1 </a></li>
  <li><a href="…"> 2 </a></li>
  <li><a href="…"> 3 </a></li>
</ul>

context : each item gets an extra field, named by this option ("page" in this example), that holds the content of the pagination link it came from. Here that content is just the page number, but the pagination mechanism can serve other purposes, such as categories.
prefix : every descriptor is prefixed with this value
description : descriptors for the content to parse; in this example, the comment text and the author name.
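
As a rough illustration of the prefix option, the sketch below builds the full selector for each descriptor by joining the prefix and the descriptor with a space. This is not Apifier's own code: the helper name expand_selectors is hypothetical, and how the library joins the two strings internally is an assumption. The shortened prefix is also just for readability.

```python
def expand_selectors(config):
    """Return {field: full selector}, joining prefix and descriptor with a space.

    Assumption: the prefix is simply prepended to each descriptor.
    """
    prefix = config.get("prefix", "").strip()
    return {
        field: f"{prefix} {selector}".strip()
        for field, selector in config["description"].items()
    }


# Shortened, hypothetical prefix; the descriptors are the ones from the example.
config = {
    "prefix": "ul > li > article >",
    "description": {
        "author": "div.fig-comment-header a",
        "comment": "div.fig-comment-msg p",
    },
}

selectors = expand_selectors(config)
print(selectors["author"])
```

Keeping the shared part of the selectors in prefix means each descriptor only has to describe the element relative to one comment block.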

To use an XPath selector instead of a CSS one, prefix it with a $.

The result is:

    data =
    [
        {'comment': "…", 'author': '…', 'page': '1'}, etc
    ]
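
Since the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below groups items by their context field; the sample rows and the helper name group_by_context are hypothetical and only mimic the shape shown above.

```python
from collections import defaultdict

# Hypothetical sample shaped like the result above; real rows come from api.load().
data = [
    {"comment": "First comment", "author": "alice", "page": "1"},
    {"comment": "Second comment", "author": "bob", "page": "1"},
    {"comment": "Third comment", "author": "carol", "page": "2"},
]


def group_by_context(rows, context="page"):
    """Group rows by the value stored under the context key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[context]].append(row)
    return dict(grouped)


comments_by_page = group_by_context(data)
for page, rows in sorted(comments_by_page.items()):
    print(page, [row["author"] for row in rows])
```

Note that the page values are strings, so sorting them is lexicographic; convert them with int() first if numeric order matters.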

Stars: ✭ 5