olamedia / Nokogiri

Licence: MIT
HTML parser for PHP

Projects that are alternatives to or similar to Nokogiri

Htmlquery
htmlquery is golang XPath package for HTML query.
Stars: ✭ 338 (+57.94%)
Mutual labels:  xpath, html-parser
Html Agility Pack
Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
Stars: ✭ 2,014 (+841.12%)
Mutual labels:  xpath, html-parser
Harser
Easy way for HTML parsing and building XPath
Stars: ✭ 135 (-36.92%)
Mutual labels:  xpath, html-parser
Fuzi
A fast & lightweight XML & HTML parser in Swift with XPath & CSS support
Stars: ✭ 894 (+317.76%)
Mutual labels:  xpath, html-parser
Jsoupxpath
An HTML parser implemented in pure Java that supports the W3C XPath 1.0 standard syntax, based on Jsoup and Antlr4. Maybe it is the best in Java, ha ha. Just try it.
Stars: ✭ 331 (+54.67%)
Mutual labels:  xpath, html-parser
Didom
Simple and fast HTML and XML parser
Stars: ✭ 1,939 (+806.07%)
Mutual labels:  html-parser, xpath
Autocser
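AutoCSer is a high-performance RPC framework and an integrated development framework built with efficiency as its guiding goal. It mainly includes a TCP interface service framework, a TCP function service framework, a remote expression chain component, a unified front-end/back-end web view framework, an ORM in-memory index cache framework, a log-stream in-memory database cache component, a message queue component, binary / JSON / XML data serialization, and other seamlessly integrated high-performance components.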
Stars: ✭ 140 (-34.58%)
Mutual labels:  html-parser
Xquery
Extract data or evaluate value from HTML/XML documents using XPath
Stars: ✭ 155 (-27.57%)
Mutual labels:  xpath
Wxparse
Rich text parsing for WeChat Mini Programs
Stars: ✭ 135 (-36.92%)
Mutual labels:  html-parser
Unhtml.rs
A magic html parser
Stars: ✭ 180 (-15.89%)
Mutual labels:  html-parser
Jsonquery
jsonq package for Go. Golang XPath query for JSON query.
Stars: ✭ 134 (-37.38%)
Mutual labels:  xpath
Docs
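Source code for "Data Collection: From Getting Started to Giving Up". Contents: introduction to web crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK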
Stars: ✭ 118 (-44.86%)
Mutual labels:  xpath
Cssplus
CSSplus is a collection of CSS Reprocessor plugins that dynamically update CSS variables
Stars: ✭ 141 (-34.11%)
Mutual labels:  xpath
Pywebcopy
Python library to mirror webpage and websites.
Stars: ✭ 156 (-27.1%)
Mutual labels:  html-parser
Zson
A JSON parser built for testers
Stars: ✭ 181 (-15.42%)
Mutual labels:  xpath
Xmlquery
xmlquery is Golang XPath package for XML query.
Stars: ✭ 209 (-2.34%)
Mutual labels:  xpath
Minimize
Minimize HTML
Stars: ✭ 150 (-29.91%)
Mutual labels:  html-parser
Jquery Xpath
jQuery XPath plugin (with full XPath 2.0 language support)
Stars: ✭ 173 (-19.16%)
Mutual labels:  xpath
Goxpath
An XPath 1.0 implementation written in the Go programming language.
Stars: ✭ 148 (-30.84%)
Mutual labels:  xpath
Xsltdev.ru
Web developer's reference with examples
Stars: ✭ 148 (-30.84%)
Mutual labels:  xpath

Attention: the new version can break compatibility. If it does, use the previous version from the v1.0 branch or tag, which supports PHP 5.4+.

The \nokogiri class is kept for backward compatibility.

HTML parser

This library is a fast HTML parser that tolerates invalid markup (parse errors are ignored).
It uses LibXML under the hood.
The input can be an HTML string in UTF-8 encoding or a DOMDocument.
Elements are queried with CSS selectors, which are converted to XPath expressions internally.
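
To make the CSS-to-XPath idea concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. The helper `simpleCssToXpath()` is hypothetical and handles a single selector shape (`tag.class`). It is not the library's translator, only an illustration of the kind of expression such a translation might produce.

```php
<?php
// Hypothetical helper (not part of the library): hand-translates the
// CSS selector "tag.class" into an equivalent XPath expression.
function simpleCssToXpath(string $tag, string $class): string
{
    // Pad the class attribute with spaces so that the token "topic"
    // does not also match "topics".
    return sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );
}

$html = '<div id="sidebar"><a class="topic" href="/a">A</a>'
      . '<a class="topics" href="/b">B</a></div>';

libxml_use_internal_errors(true);   // buffer parser messages instead of emitting warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$expression = simpleCssToXpath('a', 'topic');   // the CSS equivalent would be "a.topic"
$links = $xpath->query($expression);

foreach ($links as $link) {
    echo $link->getAttribute('href'), PHP_EOL;  // prints "/a"
}
```

Only the first link is matched, because the padded `contains()` test compares whole class tokens rather than substrings.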

Usage

Loading HTML

Errors in the HTML are ignored during loading.

  • From an HTML string: $saw = new \nokogiri($html); or $saw = \nokogiri::fromHtml($html);
  • From DOM elements: $saw = new \nokogiri($dom); or $saw = \nokogiri::fromDom($dom);
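
As a rough illustration of what lenient loading looks like at the LibXML level, the sketch below parses deliberately malformed markup with PHP's built-in DOMDocument and then hands the result to `fromDom()` only if the class is available. The sample markup and variable names are assumptions made for this example. How the library itself configures LibXML is not stated here.

```php
<?php
// Deliberately malformed markup: two unclosed <p> tags and an unclosed <b>.
$html = '<div><p>First<p>Second <b>bold</div>';

libxml_use_internal_errors(true);   // buffer libxml messages instead of raising PHP warnings
$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);    // the parser repairs the tree rather than failing
$errors = libxml_get_errors();      // array of LibXMLError objects, available for inspection
libxml_clear_errors();

$paragraphs = $dom->getElementsByTagName('p');
echo 'loaded: ', var_export($loaded, true), PHP_EOL;
echo 'paragraphs found: ', $paragraphs->length, PHP_EOL;

// The repaired DOMDocument is the kind of object the list above passes to fromDom().
// Guarded so that the sketch also runs without the library installed.
if (class_exists('nokogiri')) {
    $saw = \nokogiri::fromDom($dom);
}
```

Building the DOMDocument yourself can be useful when you want to inspect the collected parser messages before querying.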

get($cssSelector)

Each element of $cssSelector has the following format: tagName[attribute=value]#elementId.className:pseudoSelector(expression)

$saw->get('div > a[rel=bookmark]')->toArray();

toArray()

Returns the underlying DOM structure as an array.
Attributes are stored as values under their names, text content under the #text key, and child elements under numeric keys.
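
The layout described above can be pictured with a small stand-alone sketch. The helper `elementToArray()` is hypothetical and written against PHP's built-in DOM classes. It mimics the description (attributes by name, text under `#text`, children under numeric keys) and is not the library's implementation. Details such as whether `#text` holds one concatenated string are assumptions.

```php
<?php
// Hypothetical helper that mimics the described array layout; not the library's code.
function elementToArray(DOMElement $element): array
{
    $result = [];
    foreach ($element->attributes as $attribute) {
        $result[$attribute->name] = $attribute->value;   // attributes under their names
    }
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Assumption: direct text nodes are concatenated under "#text".
            $result['#text'] = ($result['#text'] ?? '') . $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $result[] = elementToArray($child);          // child elements under numeric keys
        }
    }
    return $result;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<a href="/a" rel="bookmark">Read<b>more</b></a>');
libxml_clear_errors();

$anchor = $dom->getElementsByTagName('a')->item(0);
$asArray = elementToArray($anchor);
print_r($asArray);
```

For this input the sketch yields the keys `href`, `rel`, `#text` and `0`, where key `0` holds the nested array for the `<b>` element.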

toXml()

Returns an HTML string.

getDom() toDom()

Returns a DOMDocument. When true is passed as the first argument, it can also return a DOMNodeList or a DOMElement.

Iteration over found elements

foreach ($saw->get('#sidebar a.topic') as $link){
    var_dump($link['#text']);
}

Implemented selectors

  • tag
  • .class
  • #id
  • [attr]
  • [attr=value]
  • :root
  • :empty
  • :first-child
  • :last-child
  • :first-of-type
  • :last-of-type
  • :only-of-type
  • :nth-child(a)
  • :nth-child(an+b)
  • :nth-child(even/odd)
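
For readers curious how a selector such as :nth-child(an+b) can be expressed in XPath, here is a minimal sketch built on PHP's DOMXPath. The helpers `nthChildXpath()` and `matchedText()` are hypothetical and only cover a >= 0 and b >= 0. The library's actual translation may differ.

```php
<?php
// Hypothetical helper: builds an XPath expression equivalent to "tag:nth-child(an+b)".
// This sketch only supports a >= 0 and b >= 0.
function nthChildXpath(string $tag, int $a, int $b): string
{
    if ($a < 0 || $b < 0) {
        throw new InvalidArgumentException('This sketch only handles a >= 0 and b >= 0.');
    }
    // 1-based position of the element among its element siblings.
    $position = '(count(preceding-sibling::*) + 1)';
    if ($a === 0) {
        return sprintf('//%s[%s = %d]', $tag, $position, $b);
    }
    // position = a*n + b for some n >= 0  <=>  position >= b and (position - b) mod a = 0
    return sprintf(
        '//%s[%s >= %d and (%s - %d) mod %d = 0]',
        $tag, $position, $b, $position, $b, $a
    );
}

// Hypothetical helper: joins the text content of all matched nodes with commas.
function matchedText(DOMXPath $xpath, string $expression): string
{
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = $node->textContent;
    }
    return implode(',', $texts);
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul>');
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$odd   = matchedText($xpath, nthChildXpath('li', 2, 1));  // li:nth-child(2n+1)
$even  = matchedText($xpath, nthChildXpath('li', 2, 0));  // li:nth-child(2n)
$third = matchedText($xpath, nthChildXpath('li', 0, 3));  // li:nth-child(3)

echo $odd, PHP_EOL;    // 1,3,5
echo $even, PHP_EOL;   // 2,4
echo $third, PHP_EOL;  // 3
```

Counting preceding element siblings, rather than using position() directly, keeps the result independent of the tag filter in the step, which matches how :nth-child counts all sibling elements.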

Requirements

  • DOM
  • libxml >=2.9.0
  • PHP >= 7.3

License

MIT

What's new

2.0.0

  • Minimum PHP version: 7.3
  • Minimum LibXML version: 2.9.0
  • Complete refactoring
  • Behaviour partially changed, which can break compatibility
  • HTML loading behaviour changed
  • Added test coverage
  • Fixed the behaviour of nth-child and other selectors
  • Incorrect selectors now throw exceptions
  • New selectors added

1.0.0

  • First version, 2011
  • Minimum PHP version: 5.4