
antchfx / Htmlquery

License: MIT

htmlquery is a Go XPath package for querying HTML.


Labels

html xpath html-parser

Projects that are alternatives of or similar to Htmlquery

Simple and fast HTML and XML parser

Stars: ✭ 1,939 (+473.67%)

Mutual labels: html-parser, xpath

Html Agility Pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.

Stars: ✭ 2,014 (+495.86%)

Mutual labels: xpath, html-parser

A fast & lightweight XML & HTML parser in Swift with XPath & CSS support

Stars: ✭ 894 (+164.5%)

Mutual labels: xpath, html-parser

HTML parser for PHP

Stars: ✭ 214 (-36.69%)

Mutual labels: xpath, html-parser

Easy way for HTML parsing and building XPath

Stars: ✭ 135 (-60.06%)

Mutual labels: xpath, html-parser

An HTML parser implemented in pure Java that supports the W3C XPath 1.0 standard syntax, based on Jsoup and Antlr4. Maybe it is the best in Java, ha ha. Just try it.

Stars: ✭ 331 (-2.07%)

Mutual labels: xpath, html-parser

Simple browser engine.

Stars: ✭ 31 (-90.83%)

Mutual labels: html-parser

🌀 parse and convert html string to anything

Stars: ✭ 43 (-87.28%)

Mutual labels: html-parser

Some tips and case studies for web crawler development

Stars: ✭ 33 (-90.24%)

Mutual labels: xpath

A multi-threaded crawler for Douban rental groups. After crawling, it automatically generates a markdown file sorted by time.

Stars: ✭ 31 (-90.83%)

Mutual labels: xpath

A fluent api for working with XML in PHP

Stars: ✭ 327 (-3.25%)

Mutual labels: xpath

eXist Native XML Database and Application Platform

Stars: ✭ 294 (-13.02%)

Mutual labels: xpath

Fetch data from HTML and XML via xpath/css and prepare it with regexp

Stars: ✭ 29 (-91.42%)

Mutual labels: xpath

web-data-extractor

Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web format like HTML, XML and JSON.

Stars: ✭ 52 (-84.62%)

Mutual labels: xpath

an async ETL tool written in Python.

Stars: ✭ 34 (-89.94%)

Mutual labels: xpath

codechef-rank-comparator

Web application hosted on Heroku cloud platform based on web scraping in python using lxml library (XML Path Language).

Stars: ✭ 23 (-93.2%)

Mutual labels: xpath

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

Stars: ✭ 295 (-12.72%)

Mutual labels: html-parser

Syllabus and exercises for Multiplatform Application Development (DAM)

Stars: ✭ 96 (-71.6%)

Mutual labels: xpath

A Visual Studio Extension which can run any XPath and XPath function; navigates through results at the click of a button. Can show and copy any XPath incl. XML namespaces, avoiding XML namespace induced headaches. Keeps track of the current XPath via the statusbar.

Stars: ✭ 40 (-88.17%)

Mutual labels: xpath

The fast & forgiving HTML and XML parser

Stars: ✭ 3,299 (+876.04%)

Mutual labels: html-parser


htmlquery

Overview

htmlquery is an XPath query package for HTML. It lets you extract data from HTML documents, or evaluate values against them, using XPath expressions.

htmlquery has a built-in query object cache based on LRU, which caches the compiled form of recently used XPath query strings. With query caching enabled, an XPath expression does not have to be recompiled for each query.
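To make the caching idea concrete, here is a minimal sketch of an LRU cache keyed by query string, using only the Go standard library. It is not htmlquery's actual implementation: the `compiled` type, `lruCache`, and the `compiles` counter are hypothetical names invented for this illustration, and the "compile" step is a placeholder.

```go
package main

import (
	"container/list"
	"fmt"
)

// compiled stands in for a compiled XPath expression (hypothetical type).
type compiled struct{ src string }

// entry is what each list element stores: the query string and its compiled form.
type entry struct {
	key  string
	expr *compiled
}

// lruCache keeps the most recently used compiled queries, up to capacity.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // query string -> list element
	compiles int                      // how many times the compile step ran
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the cached expression for query, compiling only on a miss.
func (c *lruCache) get(query string) *compiled {
	if el, ok := c.items[query]; ok {
		c.order.MoveToFront(el) // mark as most recently used
		return el.Value.(*entry).expr
	}
	c.compiles++
	expr := &compiled{src: query} // placeholder for the expensive compile step
	c.items[query] = c.order.PushFront(&entry{key: query, expr: expr})
	if c.order.Len() > c.capacity {
		// Evict the least recently used entry.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return expr
}

func main() {
	cache := newLRUCache(2)
	for _, q := range []string{"//a", "//a", "//img", "//a", "//p", "//img"} {
		cache.get(q)
	}
	fmt.Println("compile calls:", cache.compiles)
}
```

With a capacity of 2, the six lookups above trigger four compile steps: the repeated "//a" lookups are served from the cache, while "//img" is evicted by "//p" and has to be compiled again.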

Installation

go get github.com/antchfx/htmlquery

Getting Started

Query: returns the matched elements, or an error if the XPath expression is invalid.

nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
	panic(`not a valid XPath expression.`)
}

Load HTML document from URL.

doc, err := htmlquery.LoadURL("http://example.com/")

Load HTML document from a file.

filePath := "/home/user/sample.html"
doc, err := htmlquery.LoadDoc(filePath)

Load HTML document from string.

s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))

Find all A elements.

list := htmlquery.Find(doc, "//a")

Find all A elements that have `href` attribute.

list := htmlquery.Find(doc, "//a[@href]")

Find all A elements with `href` attribute and only return `href` value.

list := htmlquery.Find(doc, "//a/@href")
for _, n := range list {
	fmt.Println(htmlquery.SelectAttr(n, "href")) // output @href value
}

Find the third A element.

a := htmlquery.FindOne(doc, "//a[3]")

Find the child element (img) under an A element and print its `src`.

a := htmlquery.FindOne(doc, "//a")
img := htmlquery.FindOne(a, "//img")
fmt.Println(htmlquery.SelectAttr(img, "src")) // output @src value

Count all IMG elements.

expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)

FAQ

`Find()` vs `QueryAll()`, which is better?

Find and QueryAll do the same thing: both search for all matching HTML nodes. Find panics if you give it an invalid XPath query, whereas QueryAll returns an error.
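The difference between the two styles can be sketched with the standard library alone. The functions below (`queryAll`, `find`, `safeFind`) are hypothetical stand-ins written for this illustration, not htmlquery's code, and the validity check is a toy bracket count rather than a real XPath parser.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// queryAll is a hypothetical stand-in for an error-returning query function.
// It only checks that square brackets are balanced; it is not an XPath parser.
func queryAll(expr string) ([]string, error) {
	if strings.Count(expr, "[") != strings.Count(expr, "]") {
		return nil, errors.New("invalid expression: " + expr)
	}
	return []string{"match for " + expr}, nil
}

// find mirrors the panicking style: same result, but an error becomes a panic.
func find(expr string) []string {
	nodes, err := queryAll(expr)
	if err != nil {
		panic(err)
	}
	return nodes
}

// safeFind shows how a caller could turn the panic back into an error.
func safeFind(expr string) (nodes []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return find(expr), nil
}

func main() {
	_, err := queryAll("//a[@href")
	fmt.Println("queryAll error:", err)

	_, err = safeFind("//a[@href")
	fmt.Println("safeFind error:", err)
}
```

As a rule of thumb, the panicking style suits query strings that are fixed in the source code, while the error-returning style is the safer choice when the expression comes from user input or configuration.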

Can I save my query expression object for the next query?

Yes, you can. The QuerySelector and QuerySelectorAll methods accept your query expression object.

Caching (or reusing) a query expression object avoids recompiling the XPath expression and improves query performance.

XPath query object cache performance

goos: windows
goarch: amd64
pkg: github.com/antchfx/htmlquery
BenchmarkSelectorCache-4                20000000                55.2 ns/op
BenchmarkDisableSelectorCache-4           500000              3162 ns/op

How to disable caching?

htmlquery.DisableSelectorCache = true

Changelogs

2019-11-19

Added a built-in query object cache to avoid recompiling the same query string. #16
Added LoadDoc. #18

2019-10-05

Added new methods, QueryAll and Query, that return an error for an invalid XPath expression.
Added QuerySelector and QuerySelectorAll methods, which support reusing your query object.

2019-02-04

#7 Removed deprecated FindEach() and FindEachWithBreak() methods.

2018-12-28

Avoid adding duplicate elements to list for Find() method. #6

Tutorial

package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
	if err != nil {
		panic(err)
	}
	// Find all news items.
	list, err := htmlquery.QueryAll(doc, "//ol/li")
	if err != nil {
		panic(err)
	}
	for i, n := range list {
		// Take the first link inside each item; skip items without one.
		a := htmlquery.FindOne(n, "//a")
		if a == nil {
			continue
		}
		fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
	}
}

List of supported XPath query packages

Name	Description
htmlquery	XPath query package for the HTML document
xmlquery	XPath query package for the XML document
jsonquery	XPath query package for the JSON document

Questions

Please let me know if you have any questions.


Stars: ✭ 338
