lexborisov / Myhtml

License: LGPL-2.1
Fast C/C++ HTML5 parser that uses threads.

MyHTML — a pure C HTML parser

MyHTML is a fast, multithreaded HTML parser implemented as a pure C99 library with no outside dependencies.

Now

Important announcement!

Please use the HTML parser from the Lexbor project. It is stable, has more features, and — yes — it's very fast.

Features

  • Asynchronous parsing, tree building, and indexing
  • Fully conformant with the HTML5 specification
  • Two APIs: high-level and low-level
  • Element manipulation: add, change, delete, and more
  • Element attribute manipulation: add, change, delete, and more
  • Supports the 39 character encodings defined by the specification at encoding.spec.whatwg.org
  • Supports character encoding detection
  • Supports single-mode (single-threaded) parsing
  • Can be built without POSIX Threads
  • Supports fragment parsing
  • Supports parsing by chunks
  • No outside dependencies
  • C99 support
  • Passes all tree construction tests from html5lib-tests
  • Tested on 1 billion HTML pages (from commoncrawl.org)
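
To illustrate why parsing by chunks needs a parser that keeps its state between calls, here is a minimal, self-contained sketch. It is a toy start-tag counter, not MyHTML's tokenizer and not its API; the names tag_counter_t, tag_counter_init and tag_counter_feed are hypothetical and exist only for this illustration. It ignores comments, scripts and attribute values. For the library's real chunk interface, see the examples directory.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner state for this sketch; not a MyHTML type. */
typedef struct {
    int    after_lt;    /* 1 if the previous byte was '<' */
    size_t start_tags;  /* number of start tags seen so far */
} tag_counter_t;

void tag_counter_init(tag_counter_t *tc)
{
    tc->after_lt = 0;
    tc->start_tags = 0;
}

/*
 * Feed one chunk of input. The state survives between calls,
 * so a '<' at the end of one chunk and the tag name at the start
 * of the next chunk are still recognized as one start tag.
 */
void tag_counter_feed(tag_counter_t *tc, const char *chunk, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char c = (unsigned char)chunk[i];

        /* '<' followed by a letter opens a start tag; "</" does not. */
        if (tc->after_lt && isalpha(c))
            tc->start_tags++;

        tc->after_lt = (c == '<');
    }
}
```

Call tag_counter_init once, then tag_counter_feed for each chunk as it arrives. Because the after_lt flag is stored in the struct rather than in a local variable, the count does not depend on where the input happens to be split.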

Changes

Please see the CHANGELOG.md file.

Further developments

  • Modest: a fast HTML renderer implemented as a pure C99 library with no outside dependencies
  • MyCSS: a fast C/C++ CSS (Cascading Style Sheets) parser

Supported encodings for InputStream

X_USER_DEFINED, UTF_8, UTF_16LE, UTF_16BE, BIG5, EUC_KR, GB18030,
IBM866, ISO_8859_10, ISO_8859_13, ISO_8859_14, ISO_8859_15, ISO_8859_16, ISO_8859_2, ISO_8859_3,
ISO_8859_4, ISO_8859_5, ISO_8859_6, ISO_8859_7, ISO_8859_8, KOI8_R, KOI8_U, MACINTOSH,
WINDOWS_1250, WINDOWS_1251, WINDOWS_1252, WINDOWS_1253, WINDOWS_1254, WINDOWS_1255, WINDOWS_1256,
WINDOWS_1257, WINDOWS_1258, WINDOWS_874, X_MAC_CYRILLIC, ISO_2022_JP, GBK, SHIFT_JIS, EUC_JP, ISO_8859_8_I

Supported encodings for output

The library works in UTF-8 internally and returns all output in UTF-8.

Detecting character encodings

Detection currently covers UTF-8, UTF-16LE, and UTF-16BE, plus the Russian encodings windows-1251, koi8-r, iso-8859-5, x-mac-cyrillic, and ibm866.
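
For the three Unicode encodings, one common first step is checking for a byte order mark (BOM) at the start of the input. The sketch below shows that check only. It is a standalone illustration, not MyHTML's detector, and the names bom_encoding_t and sniff_bom are hypothetical. Detecting the Cyrillic single-byte encodings needs statistical analysis of the text, which is not shown here.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not a MyHTML type. */
typedef enum {
    BOM_NONE = 0,
    BOM_UTF_8,
    BOM_UTF_16LE,
    BOM_UTF_16BE
} bom_encoding_t;

/*
 * Inspect the first bytes of a buffer for a byte order mark.
 * Returns the encoding the BOM indicates, or BOM_NONE.
 * If bom_len is not NULL, it receives the BOM length in bytes
 * (0 when no BOM is present) so the caller can skip past it.
 */
bom_encoding_t sniff_bom(const unsigned char *data, size_t size, size_t *bom_len)
{
    bom_encoding_t enc = BOM_NONE;
    size_t len = 0;

    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        enc = BOM_UTF_8;      /* EF BB BF */
        len = 3;
    } else if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        enc = BOM_UTF_16LE;   /* FF FE */
        len = 2;
    } else if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        enc = BOM_UTF_16BE;   /* FE FF */
        len = 2;
    }

    if (bom_len != NULL)
        *bom_len = len;

    return enc;
}
```

A caller would run this on the raw bytes before parsing, skip bom_len bytes when a BOM is found, and fall back to other detection or a default encoding when the result is BOM_NONE.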

Installation

See INSTALL.md

Introduction

Benchmark

Dependencies

None

External Bindings and Wrappers

Examples

See the examples directory.

Simple example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <myhtml/api.h>

int main(int argc, const char * argv[])
{
    char html[] = "<div><span>HTML</span></div>";
    
    // basic init
    myhtml_t* myhtml = myhtml_create();
    myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);
    
    // first tree init
    myhtml_tree_t* tree = myhtml_tree_create();
    myhtml_tree_init(tree, myhtml);
    
    // parse html
    myhtml_parse(tree, MyENCODING_UTF_8, html, strlen(html));
    
    // print result
    // or see serialization function with callback: myhtml_serialization_tree_callback
    mycore_string_raw_t str = {0};
    myhtml_serialization_tree_buffer(myhtml_tree_get_document(tree), &str);
    printf("%s\n", str.data);
    
    // release resources
    mycore_string_raw_destroy(&str, false);
    myhtml_tree_destroy(tree);
    myhtml_destroy(myhtml);
    
    return 0;
}

AUTHOR

Alexander Borisov

COPYRIGHT AND LICENSE

Copyright (C) 2015-2018 Alexander Borisov

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

See the LICENSE file.
