
j0k3r / Php Readability

Licence: apache-2.0
A fork of https://bitbucket.org/fivefilters/php-readability

Projects that are alternatives of or similar to Php Readability

Graby
Graby helps you extract article content from web pages
Stars: ✭ 281 (+121.26%)
Mutual labels:  content, readability
Cash
HTTP response caching for Koa. Supports Redis, in-memory store, and more!
Stars: ✭ 122 (-3.94%)
Mutual labels:  content
Manifoldcf
Mirror of Apache ManifoldCF
Stars: ✭ 65 (-48.82%)
Mutual labels:  content
Readability2
Readability2 converts HTML to plain text.
Stars: ✭ 100 (-21.26%)
Mutual labels:  readability
Reader
Extract clean(er), readable text from web pages via Mercury Web Parser.
Stars: ✭ 75 (-40.94%)
Mutual labels:  readability
Orchestra
One language to be RegExp's Successor. Visually readable and rich, technically safe and extended, naturally scalable, advanced, and optimized
Stars: ✭ 103 (-18.9%)
Mutual labels:  readability
React No Content
A SVG react component to display when there's no content.
Stars: ✭ 59 (-53.54%)
Mutual labels:  content
Directus
Open-Source Data Platform 🐰 — Directus wraps any SQL database with a real-time GraphQL+REST API and an intuitive app for non-technical users.
Stars: ✭ 13,190 (+10285.83%)
Mutual labels:  content
Moodle Downloader 2
A Moodle downloader that downloads course content fast from Moodle (eg. lecture pdfs)
Stars: ✭ 118 (-7.09%)
Mutual labels:  content
Sspipe
Simple Smart Pipe: python productivity-tool for rapid data manipulation
Stars: ✭ 96 (-24.41%)
Mutual labels:  readability
Allura
Mirror of Apache Allura
Stars: ✭ 91 (-28.35%)
Mutual labels:  content
Levelgenerator
Unity plug-in for procedurally generating indoor levels using abstract chunks.
Stars: ✭ 82 (-35.43%)
Mutual labels:  content
Next
Directus is a real-time API and App dashboard for managing SQL database content. 🐰
Stars: ✭ 111 (-12.6%)
Mutual labels:  content
Mform
Easily create extensive module input forms.
Stars: ✭ 65 (-48.82%)
Mutual labels:  content
Cms
MaxSite CMS
Stars: ✭ 123 (-3.15%)
Mutual labels:  content
Poi
Mirror of Apache POI
Stars: ✭ 1,136 (+794.49%)
Mutual labels:  content
Patreondownloader
Powerful tool for downloading content posted by creators on patreon.com. Supports content hosted on patreon itself as well as external sites (additional plugins might be required).
Stars: ✭ 89 (-29.92%)
Mutual labels:  content
Pdfbox
Mirror of Apache PDFBox
Stars: ✭ 1,384 (+989.76%)
Mutual labels:  content
Moviecontentfilter
Watch movies with the freedom (not) to filter
Stars: ✭ 126 (-0.79%)
Mutual labels:  content
Mercury fulltext
📖 Enjoy full text for tt-rss.
Stars: ✭ 123 (-3.15%)
Mutual labels:  readability

Readability


This library is an extract of the Readability class from a fork of full-text-rss. It can be seen as an improved version of the original php-readability.

Differences

The default php-readability lib is really old and needs to be improved. I found a great fork of full-text-rss from @Dither which improves the Readability class.

  • I've extracted the class from that fork so it can be used out of the box
  • I've added some simple tests
  • I've changed the coding style, run php-cs-fixer, and added a namespace

But the code is still really hard to read and understand.

Requirements

By default, this lib will use the Tidy extension if it's available. Tidy is only used to clean up the given HTML and avoid problems caused by bad HTML structure. Composer will suggest installing it.

If you run into problems parsing content without Tidy installed, please install it and try again.
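
To check whether Tidy is actually loaded, or to turn it off explicitly, something like the following may help. This is only a sketch: readabilityArguments() is a hypothetical helper, not part of the library. The 'libxml' and false arguments are the ones used in the "without Tidy" constructor call in the Usage section.

```php
<?php

// Hypothetical helper (not part of php-readability): builds the constructor
// arguments depending on whether Tidy should be used.
function readabilityArguments(string $html, string $url, ?bool $useTidy = null): array
{
    // When the caller does not decide, detect the Tidy extension at runtime.
    $useTidy = $useTidy ?? extension_loaded('tidy');

    if ($useTidy) {
        // Rely on the library defaults, which use Tidy when it is available.
        return [$html, $url];
    }

    // Same arguments as the "without Tidy" form shown in Usage.
    return [$html, $url, 'libxml', false];
}

// Usage, assuming the library is installed through Composer:
// $readability = new Readability\Readability(...readabilityArguments($html, $url));
```

Running php -m on the command line also lists the loaded extensions, which is a quick way to confirm whether tidy is present.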

Usage

use Readability\Readability;

$url = 'http://www.medialens.org/index.php/alerts/alert-archive/alerts-2013/729-thatcher.html';

// you can use whatever you want to retrieve the html content (Guzzle, Buzz, cURL ...)
$html = file_get_contents($url);

$readability = new Readability($html, $url);
// or without Tidy
// $readability = new Readability($html, $url, 'libxml', false);
$result = $readability->init();

if ($result) {
    // display the title of the page
    echo $readability->getTitle()->textContent;
    // display the *readability* content
    echo $readability->getContent()->textContent;
} else {
    echo 'Looks like we couldn\'t find the content. :(';
}
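
The example above hands the result of file_get_contents() straight to Readability, but that function returns false when the download fails. A minimal sketch of a guard is shown here. It assumes a hypothetical helper named fetchHtml() and a 10-second default timeout; neither comes from the library.

```php
<?php

// Hypothetical helper (not part of php-readability): returns the page HTML,
// or null when the download fails or the body is empty.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    // The timeout applies to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // Suppress the warning; failure is reported through the return value.
    $html = @file_get_contents($url, false, $context);

    if ($html === false || trim($html) === '') {
        return null;
    }

    return $html;
}

// Usage, assuming $url is defined as in the example above:
// $html = fetchHtml($url);
// if ($html === null) {
//     exit('Could not download the page.');
// }
```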

To debug it, or check what's going on, you can inject a logger that implements Psr\Log\LoggerInterface (Monolog, for example):

use Readability\Readability;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$url = 'http://www.medialens.org/index.php/alerts/alert-archive/alerts-2013/729-thatcher.html';
$html = file_get_contents($url);

$logger = new Logger('readability');
$logger->pushHandler(new StreamHandler('path/to/your.log', Logger::DEBUG));

$readability = new Readability($html, $url);
$readability->setLogger($logger);