
osener / Wring

License: MIT
Extract content from webpages using CSS Selectors, XPath, and JS expressions

Programming Languages

purescript

Projects that are alternatives to, or similar to, Wring

Responsive mockups
Takes screenshots of a webpage in different resolutions and automatically applies it to mockup templates.
Stars: ✭ 274 (-40.69%)
Mutual labels:  phantomjs
Jsoupxpath
An HTML parser written in pure Java that supports the W3C XPath 1.0 standard syntax. An HTML parser with XPath based on Jsoup and Antlr4. Maybe it is the best in Java, ha ha. Just try it.
Stars: ✭ 331 (-28.35%)
Mutual labels:  xpath
Grunt Mocha
[MOVED] Grunt task for running mocha specs in a headless browser (PhantomJS)
Stars: ✭ 371 (-19.7%)
Mutual labels:  phantomjs
Slimerjs
A scriptable browser like PhantomJS, based on Firefox
Stars: ✭ 2,984 (+545.89%)
Mutual labels:  phantomjs
Crawlerforreader
A local web-novel crawler for Android, based on jsoup and XPath
Stars: ✭ 312 (-32.47%)
Mutual labels:  xpath
Xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Stars: ✭ 335 (-27.49%)
Mutual labels:  xpath
spparser
an async ETL tool written in Python.
Stars: ✭ 34 (-92.64%)
Mutual labels:  xpath
Nightmare
A high-level browser automation library.
Stars: ✭ 19,067 (+4027.06%)
Mutual labels:  phantomjs
Fluentdom
A fluent api for working with XML in PHP
Stars: ✭ 327 (-29.22%)
Mutual labels:  xpath
Spider Flow
A new-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.
Stars: ✭ 365 (-21%)
Mutual labels:  xpath
Exist
eXist Native XML Database and Application Platform
Stars: ✭ 294 (-36.36%)
Mutual labels:  xpath
Node Html Pdf
📄 Html to pdf converter in nodejs. It spawns a phantomjs process and passes the pdf as buffer or as filename.
Stars: ✭ 3,364 (+628.14%)
Mutual labels:  phantomjs
Phantomjs Node
PhantomJS integration module for NodeJS
Stars: ✭ 3,544 (+667.1%)
Mutual labels:  phantomjs
Phantomjs
Go client for PhantomJS.
Stars: ✭ 278 (-39.83%)
Mutual labels:  phantomjs
Xpath
XPath package for Golang, supports HTML, XML, JSON document query.
Stars: ✭ 376 (-18.61%)
Mutual labels:  xpath
node-qunit-phantomjs
Run QUnit unit tests in a headless PhantomJS instance without using Grunt
Stars: ✭ 36 (-92.21%)
Mutual labels:  phantomjs
Htmlquery
htmlquery is golang XPath package for HTML query.
Stars: ✭ 338 (-26.84%)
Mutual labels:  xpath
Camaro
camaro is a utility to transform XML to JSON, using a Node.js binding to the native XML parser pugixml, one of the fastest XML parsers around.
Stars: ✭ 438 (-5.19%)
Mutual labels:  xpath
Browser Run
The easiest way of running code in a browser environment
Stars: ✭ 378 (-18.18%)
Mutual labels:  phantomjs
Comic Dl
Comic-dl is a command line tool to download manga and comics from various comic and manga sites. Supported sites : readcomiconline.to, mangafox.me, comic naver and many more.
Stars: ✭ 365 (-21%)
Mutual labels:  phantomjs

wring

Installation

You can install wring using npm:

$ npm install --global wring

Wring uses PhantomJS for some of its commands. To use them, install PhantomJS with your system package manager, for example by running brew install phantomjs on OS X or apt-get install phantomjs on Ubuntu. You can make sure it is on your PATH by running phantomjs -v.

Alternatively, you can install a version which automatically downloads PhantomJS binaries for your system:

$ npm install --global wring-with-phantomjs
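If you want to do the phantomjs -v style check from a Node.js script instead, you can scan PATH yourself. The sketch below is not part of Wring: findOnPath is a hypothetical helper that behaves roughly like which, looking for an executable file in each PATH directory. It ignores Windows PATHEXT handling (phantomjs.exe and friends).

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of Wring): return the full path of the first
// executable called `name` found in the PATH-style string `pathEnv`, or null.
// Windows PATHEXT handling is deliberately left out.
function findOnPath(name, pathEnv = process.env.PATH || "") {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue; // skip empty entries such as a trailing ":"
    const candidate = path.join(dir, name);
    try {
      const stat = fs.statSync(candidate);
      // Throws if the current user may not execute the file.
      fs.accessSync(candidate, fs.constants.X_OK);
      if (stat.isFile()) return candidate;
    } catch (err) {
      // Missing or not executable: keep looking in the next directory.
    }
  }
  return null;
}
```

For example, findOnPath("phantomjs") returns the full path of the binary when PhantomJS is installed and on your PATH, and null otherwise.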

Usage

wring text

Here is a simple example that prints the text content of the matching element (Cheerio is used under the hood):

$ wring text 'https://www.google.com/finance/converter?a=1&from=EUR&to=USD' '#currency_converter_result'
1 EUR = 1.0940 USD

# You can use the first letter of a command as a shortcut
$ wring t http://randomfunfacts.com i
No president of the United States was an only child.
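The single-letter shortcuts, together with the p prefix covered under "Using with PhantomJS", amount to a small lookup on the first arguments. The function below is a hypothetical illustration of that mapping, not Wring's actual argument parser (which is written in PureScript); the command names are the ones this README documents.

```javascript
// Hypothetical sketch of shortcut expansion; not Wring's real parser.
// Command names are the ones documented in this README.
const COMMANDS = ["text", "html", "eval", "shot"];

function expandCommand(argv) {
  let args = argv.slice();
  let phantom = false;
  // "phantomjs" or "p" as the first word selects the browser-based variant.
  if (args[0] === "phantomjs" || args[0] === "p") {
    phantom = true;
    args = args.slice(1);
  }
  const word = args[0];
  // Accept either the full command name or its first letter (t, h, e, s).
  const command = COMMANDS.find((c) => c === word || c[0] === word);
  if (command === undefined) {
    throw new Error("unknown command: " + word);
  }
  return { phantom, command, rest: args.slice(1) };
}
```

Since no documented command starts with p, treating a leading p as the PhantomJS prefix is unambiguous in this sketch.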

You can also use jQuery-specific selectors such as :contains():

$ wring t 'https://en.wikipedia.org/wiki/List_of_songs_recorded_by_Taylor_Swift' 'tr:contains("The Hunger Games") th:first-child'
"Eyes Open"
"Safe & Sound"

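:contains(text) is a jQuery extension rather than standard CSS: it matches elements whose text, including the text of their descendants, contains the given string, and the match is case-sensitive. The sketch below mimics what the selector above picks out, using a hypothetical pre-parsed table where each row is a list of cells with a tag and text. It only illustrates the matching rule; it is not how Cheerio evaluates selectors.

```javascript
// Illustration only: `rows` are hypothetical pre-parsed <tr> elements, each a
// list of cells { tag: "th" | "td", text: string }.
function containsThenFirstChildTh(rows, needle) {
  return rows
    // tr:contains("..."): case-sensitive substring match on the row's text,
    // approximated here by concatenating the cell texts.
    .filter((cells) => cells.map((c) => c.text).join("").includes(needle))
    // th:first-child: keep the first cell only if it is a <th>.
    .filter((cells) => cells.length > 0 && cells[0].tag === "th")
    .map((cells) => cells[0].text);
}
```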
wring html

Prints the outerHTML of matching elements. Here is an example, this time using an XPath expression:

$ wring html "http://news.ycombinator.com" "//td[@class='title']/a[starts-with(@href,'http')]"
<a href="http://eftimov.net/postgresql-indexes-first-principles">PostgreSQL Indexes: First principles</a>
<a href="http://inference-review.com/article/doing-mathematics-differently">Doing Mathematics Differently</a>
<a href="https://blog.chartmogul.com/api-based-saas/">The rise of the API-based SaaS</a>
<a href="https://github.com/tallesl/Rich-Hickey-fanclub">Rich Hickey Fanclub</a>
...
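One difference from CSS is worth knowing: @class='title' compares the whole class attribute as a string, so it would not match class="title extra", while the CSS selector td.title would. The sketch below spells out what the XPath expression above selects, using hypothetical plain objects for td elements and their direct a children. It illustrates the predicates only and is not how Wring evaluates XPath.

```javascript
// Illustration only: each cell is a hypothetical { className, links } object,
// where `links` are the cell's direct <a> children as { href, text }.
function selectTitleLinks(cells) {
  const out = [];
  for (const td of cells) {
    // //td[@class='title']: exact equality on the whole class attribute.
    if (td.className !== "title") continue;
    for (const a of td.links) {
      // a[starts-with(@href,'http')]: absolute links only. A missing href
      // counts as "" and fails, as in XPath. "https" also starts with "http".
      if ((a.href || "").startsWith("http")) out.push(a);
    }
  }
  return out;
}
```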

Accepted inputs

The first argument of a command specifies its input, which can be a URL, a path to a file, an HTML string, or - to read the page source from stdin:

# read from file
$ curl 'http://www.purescript.org/' > page.html
$ wring t page.html '.intro h2'
PureScript is a small strongly typed programming language that compiles to JavaScript.

# read from string
$ wring text '<div class="foo">Hello</div>' '.foo'
Hello

# read from stdin
$ curl -s 'http://www.merriam-webster.com/word-of-the-day' | wring text - '.word-and-pronunciation h1'
keelhaul
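This README does not say how Wring tells these four kinds of input apart. One plausible set of rules is sketched below as a hypothetical classifyInput helper: - means stdin, an http:// or https:// prefix means a URL, an existing file path means a file, and anything else is treated as an HTML string. Wring's real rules may differ, for example in the order of the checks.

```javascript
const fs = require("fs");

// Hypothetical classifier for a command's first argument; not Wring's code.
function classifyInput(arg) {
  if (arg === "-") return "stdin";                // read page source from stdin
  if (/^https?:\/\//i.test(arg)) return "url";    // fetch over HTTP(S)
  try {
    if (fs.statSync(arg).isFile()) return "file"; // e.g. page.html
  } catch (err) {
    // Not an existing path (or not a valid one at all); fall through.
  }
  return "html";                                  // e.g. '<div class="foo">Hello</div>'
}
```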

Using with PhantomJS

Prefixing a command with phantomjs or p runs it using jQuery inside a real web browser context. You can use this if you run into compatibility problems with the commands above, but its real utility is scraping dynamically generated content:

$ wring p t '<title>Foo</title> <script>document.title = "Bar";</script>' 'title'
Bar

# compare it to the non-phantomjs invocation below
$ wring t '<title>Foo</title> <script>document.title = "Bar";</script>' 'title'
Foo
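The two results differ because of whether the inline script runs. Cheerio only parses markup, so the title element still says Foo, while PhantomJS executes the script before the selector is evaluated. The sketch below imitates both cases with Node's built-in vm module and a hypothetical one-property stand-in for document; it illustrates the mechanism and is not how Cheerio or PhantomJS is implemented.

```javascript
const vm = require("vm");

// Hypothetical, heavily simplified page: the static <title> text plus the
// source of one inline script.
const page = {
  titleText: "Foo",
  inlineScript: 'document.title = "Bar";',
};

// Static parsing (Cheerio-style): scripts are never executed.
function staticTitle(p) {
  return p.titleText;
}

// Browser-style: run the inline script against a stand-in `document` whose
// `title` setter updates the <title> text, as setting document.title does.
function executedTitle(p) {
  let title = p.titleText;
  const document = {
    get title() { return title; },
    set title(value) { title = String(value); },
  };
  vm.runInNewContext(p.inlineScript, { document });
  return title;
}
```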

wring eval

Lets you evaluate JavaScript inside any page. Calling wring('str') writes 'str' to the terminal. You can pass any number of .js file paths, URLs, and JS expressions as script arguments, and they are executed in the given order:

$ wring eval 'http://ipfs.io' 'wring(document.title)'
IPFS is a new peer-to-peer hypermedia protocol.

# you can load and use third-party libraries:
$ wring e 'http://ipfs.io' 'http://cdn.jsdelivr.net/lodash/4.5.1/lodash.js' 'wring(_.kebabCase(document.title))'
ipfs-is-a-new-peer-to-peer-hypermedia-protocol
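Because the script arguments run one after another in the same page, anything an earlier script defines (such as lodash's _) is visible to later ones. A rough Node.js model of that behaviour is sketched below with the built-in vm module: each source string runs in order in one shared context, and a wring function collects output. runInOrder is hypothetical; the real wring eval runs inside PhantomJS and fetches URLs and files itself, whereas this sketch only takes source strings.

```javascript
const vm = require("vm");

// Hypothetical model of `wring eval`: run script sources in order in one
// shared global context and collect whatever they pass to wring().
function runInOrder(sources, globals = {}) {
  const output = [];
  const context = vm.createContext({
    ...globals,
    wring: (str) => { output.push(String(str)); },
  });
  for (const source of sources) {
    vm.runInContext(source, context); // later scripts see earlier globals
  }
  return output.join("");
}
```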

Self contained scripts

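You can also use a trick to make self-contained scripts. The second line below is valid in both sh and JavaScript: the shell sees the no-op command ":" followed by exec wring eval ... "$0", which replaces the shell with wring and passes it the script's own path, while JavaScript sees the string expression ":" followed by a // comment, so the rest of the file runs as ordinary code.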

Here is a contrived example that loads the Hacker News homepage, loads lodash, sorts posts by their score, and prints the top 5:

#!/bin/sh
":" //; exec wring eval "https://news.ycombinator.com" "https://cdn.jsdelivr.net/lodash/4.5.1/lodash.js" "$0"

var posts = _.map(
  document.querySelectorAll(".votelinks + .title > a"),
  function(el) {
    return el.textContent + "\n" + el.href;
  })

var scores = _.map(
  document.querySelectorAll(".score"),
  function (el) {
    return parseInt(el.textContent, 10);
  })

_(posts)
  .zipWith(scores, function (text, score) {
    return { text: text, score: score };
  })
  .orderBy("score", "desc")
  .take(5)
  .forEach(function (item) {
    wring(item.text + "\n");
  })

# After saving the source above to `wring_hn.js`, you can run it like this:
$ chmod +x wring_hn.js
$ ./wring_hn.js
Raspberry Pi 3 Model B confirmed, with onboard BT LE and WiFi
https://apps.fcc.gov/oetcf/eas/reports/...

After fifteen years of downtime, the MetaFilter gopher server is back
http://metatalk.metafilter.com/24019/...
...
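The lodash chain in the script boils down to zipping posts with scores, sorting by score in descending order, and taking the first five. For comparison, here is the same logic without a library, as a hypothetical topPosts helper over the already-extracted posts and scores arrays. It sticks to ES5 (var and function only) because PhantomJS's JavaScript engine predates ES2015, and it assumes both arrays line up one-to-one, as the script above does.

```javascript
// Hypothetical helper: pair each post with its score, sort by score
// (highest first), and keep the first n. ES5 only, for PhantomJS.
function topPosts(posts, scores, n) {
  var count = Math.min(posts.length, scores.length); // stop at the shorter array
  var items = [];
  for (var i = 0; i < count; i++) {
    items.push({ text: posts[i], score: scores[i] });
  }
  items.sort(function (a, b) { return b.score - a.score; });
  return items.slice(0, n);
}
```

Inside the script, the final lodash chain could then become topPosts(posts, scores, 5).forEach(function (item) { wring(item.text + "\n"); });. Unlike lodash's zipWith, which pads the shorter array with undefined, this version simply stops at the shorter one.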

wring shot

The last command to cover is wring shot, which renders a screenshot of the first matching element and saves it to a file:

$ wring shot 'https://www.google.com/finance?q=GOOG' '#price-panel' goog.png
wring: Saved to goog.png

The resulting goog.png will contain something like this:

[Screenshot: the #price-panel element for GOOG]
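This README does not describe how wring shot crops the page. With PhantomJS's webpage module, one way is to measure the element inside the page, set page.clipRect to its box, and call page.render; the sketch below assumes that approach. renderElement and clipRectFor are hypothetical helpers, page is assumed to be an already-opened page from require("webpage").create(), and the code is ES5 so it can run under PhantomJS.

```javascript
// Round a fractional bounding box outwards to whole pixels so the
// screenshot never cuts off part of the element.
function clipRectFor(rect) {
  var left = Math.floor(rect.left);
  var top = Math.floor(rect.top);
  return {
    left: left,
    top: top,
    width: Math.ceil(rect.left + rect.width) - left,
    height: Math.ceil(rect.top + rect.height) - top
  };
}

// Hypothetical: screenshot the first element matching `selector` on an
// already-opened PhantomJS page. Returns false if nothing matches.
function renderElement(page, selector, outFile) {
  // page.evaluate runs inside the page; only plain JSON-like data comes back.
  var rect = page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    if (!el) return null;
    var r = el.getBoundingClientRect();
    // Convert viewport coordinates to page coordinates.
    return {
      left: r.left + window.pageXOffset,
      top: r.top + window.pageYOffset,
      width: r.width,
      height: r.height
    };
  }, selector);
  if (!rect) return false;
  page.clipRect = clipRectFor(rect);
  page.render(outFile);
  return true;
}
```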

Development

# Install Node.js dependencies:
$ npm install

# Install PureScript dependencies:
$ bower install

# Build `wring.js` and `phantom-main.js`:
$ npm run build

# Run tests:
$ npm test

# Compile & run using Pulp (https://github.com/bodil/pulp):
$ pulp run text '<b>foo</b>' 'b'

License

MIT
