
luin / Readability

📚 Turn any web page into a clean view

Programming Languages

HTML
75241 projects
javascript
184084 projects - #8 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to Readability

Code Review Tips
🔬 Common problems to look for in a code review
Stars: ✭ 861 (-62.25%)
Mutual labels:  readability
Readability2
Readability2 converts HTML to plain text.
Stars: ✭ 100 (-95.62%)
Mutual labels:  readability
Reading List Mover
A Python utility for moving bookmarks/reading lists between services
Stars: ✭ 166 (-92.72%)
Mutual labels:  readability
Opendyslexic Chrome
Official OpenDyslexic Chrome extension
Stars: ✭ 36 (-98.42%)
Mutual labels:  readability
Reader
Extract clean(er), readable text from web pages via Mercury Web Parser.
Stars: ✭ 75 (-96.71%)
Mutual labels:  readability
Mercury fulltext
📖 Enjoy full text for tt-rss.
Stars: ✭ 123 (-94.61%)
Mutual labels:  readability
Textstat
📝 Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
Stars: ✭ 590 (-74.13%)
Mutual labels:  readability
Article Parser
To extract main article from given URL with Node.js
Stars: ✭ 179 (-92.15%)
Mutual labels:  readability
Sspipe
Simple Smart Pipe: python productivity-tool for rapid data manipulation
Stars: ✭ 96 (-95.79%)
Mutual labels:  readability
Readability
visualise readability
Stars: ✭ 160 (-92.99%)
Mutual labels:  readability
Pdfsave
Convert websites into readable PDFs
Stars: ✭ 46 (-97.98%)
Mutual labels:  readability
General News Extractor Js
🤔 A general-purpose extractor for the main text of news web pages, including title, author, and date.
Stars: ✭ 55 (-97.59%)
Mutual labels:  readability
Php Readability
A fork of https://bitbucket.org/fivefilters/php-readability
Stars: ✭ 127 (-94.43%)
Mutual labels:  readability
Just Read
A customizable read mode web extension.
Stars: ✭ 874 (-61.68%)
Mutual labels:  readability
Newspaper
Read webpages in readability mode, inside your terminal.
Stars: ✭ 168 (-92.63%)
Mutual labels:  readability
Stylebot
Change the appearance of the web instantly
Stars: ✭ 746 (-67.3%)
Mutual labels:  readability
Orchestra
One language to be RegExp's Successor. Visually readable and rich, technically safe and extended, naturally scalable, advanced, and optimized
Stars: ✭ 103 (-95.48%)
Mutual labels:  readability
Readability
Readability is Elixir library for extracting and curating articles.
Stars: ✭ 188 (-91.76%)
Mutual labels:  readability
Cadmium
Natural Language Processing (NLP) library for Crystal
Stars: ✭ 172 (-92.46%)
Mutual labels:  readability
Py Readability Metrics
📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more
Stars: ✭ 132 (-94.21%)
Mutual labels:  readability

Readability

Turn any web page into a clean view. This module is based on arc90's readability project.

Features

  1. Optimized for more websites.
  2. Supporting HTML5 tags (article, section) and Microdata API.
  3. Focusing on both accuracy and performance. 4x faster than arc90's version.
  4. Supporting encodings such as GBK and GB2312.
  5. Converting relative URLs to absolute ones for images and links automatically (thanks to Guillermo Baigorria & Tom Sutton).
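
As an illustration of the last feature, relative links can be resolved against the page URL with Node's built-in URL class. This is only a minimal sketch of the idea, not the module's internal code; the helper name toAbsoluteUrl and the example.com addresses are made up for the example.

```javascript
// Sketch only: resolve a (possibly relative) link against the URL of the page
// it was found on. toAbsoluteUrl is a hypothetical helper, not part of
// node-readability.
function toAbsoluteUrl(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (err) {
    // Leave values that cannot be parsed as URLs untouched.
    return href;
  }
}

console.log(toAbsoluteUrl('/images/logo.png', 'http://example.com/blog/post.html'));
// http://example.com/images/logo.png
console.log(toAbsoluteUrl('../about', 'http://example.com/blog/post.html'));
// http://example.com/about
```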

Example

Before -> After

Install

$ npm install node-readability

Note that as of v2.0.0, this module only works with Node.js >= 2.0. If you use an older Node.js version, you can still install a release in the 1.x series (npm install node-readability@1).

Usage

read(html [, options], callback)

Where

  • html is a URL or HTML source code.
  • options is an optional options object
  • callback is the callback to run - callback(error, article, meta)

Example

var read = require('node-readability');

read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
  // Main Article
  console.log(article.content);
  // Title
  console.log(article.title);

  // HTML Source Code
  console.log(article.html);
  // DOM
  console.log(article.document);

  // Response Object from Request Lib
  console.log(meta);

  // Close article to clean up jsdom and prevent leaks
  article.close();
});
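
If you prefer Promises to callbacks, the callback signature shown above can be wrapped by hand. The sketch below is an assumption-laden convenience, not something the module provides: readAsync is a hypothetical helper, and the read function is passed in as an argument so the helper does not depend on how the module is loaded.

```javascript
// Sketch: wrap a callback-style read(html, options, callback) in a Promise.
// readAsync is a hypothetical helper, not part of node-readability.
function readAsync(readFn, html, options) {
  return new Promise(function (resolve, reject) {
    readFn(html, options || {}, function (err, article, meta) {
      if (err) {
        return reject(err);
      }
      resolve({ article: article, meta: meta });
    });
  });
}

// Possible usage, assuming node-readability is installed:
// var read = require('node-readability');
// readAsync(read, 'http://howtonode.org/really-simple-file-uploads')
//   .then(function (result) {
//     console.log(result.article.title);
//     result.article.close(); // still needed to clean up jsdom
//   })
//   .catch(console.error);
```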

NB: If the page declares a charset other than utf-8, it is converted automatically. Charsets such as GBK and GB2312 are also supported.
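
To make the charset note more concrete, here is a rough sketch of how a declared charset could be detected before decoding. It is not the module's actual implementation: detectCharset is a hypothetical function, it only inspects the Content-Type header and the first 1024 characters of the markup, and the decoding step itself is left out.

```javascript
// Sketch: find the declared charset of a page, first in the Content-Type
// header, then in a <meta> tag near the top of the document. Falls back to
// utf-8 when nothing is declared. detectCharset is a hypothetical helper.
function detectCharset(contentType, html) {
  const pattern = /charset\s*=\s*["']?([\w-]+)/i;
  const fromHeader = pattern.exec(contentType || '');
  if (fromHeader) {
    return fromHeader[1].toLowerCase();
  }
  // Only look at the start of the markup, where <meta> declarations live.
  const fromMeta = pattern.exec((html || '').slice(0, 1024));
  return fromMeta ? fromMeta[1].toLowerCase() : 'utf-8';
}

console.log(detectCharset('text/html; charset=GBK', ''));
// gbk
console.log(detectCharset('text/html', '<meta charset="gb2312">'));
// gb2312
```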

Options

node-readability will pass the options to request directly. See request lib to view all available options.

node-readability has two additional options:

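  • cleanRulers, which lets you set your own validation rules for tags.

Each rule is a function that receives an element and its tag name. If it returns true, the rule is treated as valid for that element; otherwise it is not. options.cleanRulers = [callback(obj, tagName)]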

read(url, {
  cleanRulers: [
    function(obj, tag) {
      if (tag === 'object') {
        if (obj.getAttribute('class') === 'BrightcoveExperience') {
          return true;
        }
      }
    }
  ]
}, function(err, article, response) {
  //...
});

  • preprocess, which should be a function that checks or modifies the downloaded source before it is passed to readability.

options.preprocess = callback(source, response, contentType, callback);

read(url, {
  preprocess: function(source, response, contentType, callback) {
    // maxBodySize is assumed to be defined by the caller
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  }
}, function(err, article, response) {
  //...
});
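
Both options can be written as named functions and tested on their own before being handed to read. The following is a sketch under assumptions: keepBrightcove, makeSizeLimiter and the 1,000,000 limit are illustrative names and values, not part of node-readability.

```javascript
// cleanRulers entry: returns true only for Brightcove <object> embeds,
// mirroring the inline rule shown above. keepBrightcove is a made-up name.
function keepBrightcove(obj, tag) {
  return tag === 'object' &&
    obj.getAttribute('class') === 'BrightcoveExperience';
}

// preprocess factory: builds a function that rejects sources longer than
// maxBodySize and passes everything else through unchanged.
// makeSizeLimiter is a made-up name.
function makeSizeLimiter(maxBodySize) {
  return function (source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  };
}

// Options object that could be passed as the second argument to read().
const options = {
  cleanRulers: [keepBrightcove],
  preprocess: makeSizeLimiter(1000000) // assumed limit, adjust as needed
};
```

Keeping the rule and the preprocess function outside the read call makes them easy to reuse across several requests.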

article object

content

The article content of the web page. Returns false if extraction fails.

title

The article title of the web page. It may differ from the text in the <title> tag.

textBody

A string containing all the text found on the page.

html

The original HTML of the web page.

document

The document of the web page generated by jsdom. You can use it to access the DOM directly (for example, article.document.getElementById('main')).

meta object

Response object from the request lib. It is useful if you need the final URL after all redirects or want to read response headers.
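
As a rough sketch of reading those values, the helper below pulls the final URL, status code and content type out of the response object. It assumes the object follows the request library's usual shape, where meta.request.uri.href holds the URL after redirects; check this against the request version you use. summarizeResponse is a hypothetical name.

```javascript
// Sketch: collect a few commonly needed values from the response object.
// Assumption: meta.request.uri.href is the URL after redirects, as exposed
// by the request library. summarizeResponse is a hypothetical helper.
function summarizeResponse(meta) {
  const uri = meta && meta.request && meta.request.uri;
  const headers = (meta && meta.headers) || {};
  return {
    finalUrl: uri ? uri.href : null,
    statusCode: meta ? meta.statusCode : null,
    contentType: headers['content-type'] || null
  };
}

// Possible usage inside the read() callback:
// read(url, function (err, article, meta) {
//   console.log(summarizeResponse(meta));
//   article.close();
// });
```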

Why not Cheerio

This lib uses jsdom instead of cheerio to parse HTML because some data, such as image size and element visibility, cannot be obtained with cheerio, and its absence significantly affects the result.

Contributors

https://github.com/luin/node-readability/graphs/contributors

License

This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0
