keepcosmos / Readability

Licence: apache-2.0
Readability is an Elixir library for extracting and curating articles.

Readability

Readability is a tool for extracting and curating the primary readable content of a webpage.
Check out the documentation for full, detailed guides.

Installation

If available in Hex, the package can be installed as:

  1. Add readability to your list of dependencies in mix.exs:
def deps do
  [{:readability, "~> 0.9"}]
end
  2. Ensure readability is started before your application:
def application do
  [applications: [:readability]]
end

Note: Readability requires Elixir 1.3 or higher.
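
On Elixir 1.4 and later, Mix infers which applications to start from the dependency list, so the second step above is usually unnecessary. A minimal sketch of the relevant mix.exs fragment under that assumption (the :logger entry is only the project generator's default, not something Readability is known to require):

```elixir
# mix.exs (fragment), assuming Elixir 1.4+ with application inference
def application do
  # No explicit :readability entry; Mix starts dependencies automatically.
  [extra_applications: [:logger]]
end

defp deps do
  # Version requirement taken from the installation step above.
  [{:readability, "~> 0.9"}]
end
```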

Usage

Examples

Just pass a URL

url = "https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"
summary = Readability.summarize(url)

summary.title
#=> "Why I’m betting on Elixir"

summary.authors
#=> ["Ken Mazaika"]

summary.article_html
#=>
# <div><div><p id=\"3476\"><strong><em>Background: </em></strong><em>I’ve spent...
# ...
# ...button!</em></h3></div></div>

summary.article_text
#=>
# Background: I’ve spent the past 6 years building web applications in Ruby and.....
# ...
# ... value in this article, it would mean a lot to me if you hit the recommend button!

From raw HTML

### Extract the title.
Readability.title(html)

### Extract authors.
Readability.authors(html)

### Extract the primary content with transformed html.
html
|> Readability.article
|> Readability.readable_html

### Extract only text from the primary content.
html
|> Readability.article
|> Readability.readable_text

### Extract the primary images with Floki.
html
|> Readability.article
|> Floki.find("img")
|> Floki.attribute("src")

Options

If the result is different from your expectations, you can add options to customize it.

Example

url = "https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"
summary = Readability.summarize(url, [clean_conditionally: false])

Available options and their defaults:

  • min_text_length \\ 25
  • remove_unlikely_candidates \\ true
  • weight_classes \\ true
  • clean_conditionally \\ true
  • retry_length \\ 250

You can find other algorithm and regex options in readability.ex.
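
Options are passed as a keyword list, so overrides can be layered on top of the documented defaults before calling the library. The sketch below assumes nothing beyond the option names and defaults listed above; ReadabilityOptions and build/1 are hypothetical names used for illustration and are not part of Readability.

```elixir
defmodule ReadabilityOptions do
  # Hypothetical helper, not part of the Readability library.
  # Defaults copied from the option list in this README.
  @defaults [
    min_text_length: 25,
    remove_unlikely_candidates: true,
    weight_classes: true,
    clean_conditionally: true,
    retry_length: 250
  ]

  # Returns the defaults with any caller-supplied overrides applied.
  # Keys that are not in @defaults are kept, so other options defined
  # in readability.ex can still be passed through.
  def build(overrides \\ []) do
    Keyword.merge(@defaults, overrides)
  end
end

opts = ReadabilityOptions.build(clean_conditionally: false, min_text_length: 40)

# The resulting keyword list goes in as the second argument, as in the
# example above:
#   summary = Readability.summarize(url, opts)
```

Spelling out the full option set this way makes it easier to see which values differ from the defaults when tuning extraction for a particular site.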

Test

To run the test suite:

$ mix test

Todo

  • [x] Extract authors
  • [x] More configurable
  • [x] Summarize function
  • [ ] Convert relative paths into absolute paths of img#src and a#href
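
For the open item above, one possible starting point is the standard library's URI.merge/2, which resolves a reference against a base URL. This is only a sketch of the URL-resolution step, not the project's planned implementation; AbsoluteUrls and absolutize/2 are hypothetical names, and applying the helper to each img#src and a#href in the extracted article is left out.

```elixir
defmodule AbsoluteUrls do
  # Hypothetical helper, not part of the Readability library.
  # Resolves a possibly relative reference (e.g. an img src or a href)
  # against the URL of the page it was found on.
  def absolutize(reference, page_url) do
    page_url
    |> URI.merge(reference)
    |> URI.to_string()
  end
end

page = "https://example.com/blog/post"

# A root-relative path is resolved against the page's scheme and host.
AbsoluteUrls.absolutize("/images/cover.png", page)
#=> "https://example.com/images/cover.png"

# An already absolute URL is returned unchanged.
AbsoluteUrls.absolutize("https://cdn.example.com/a.png", page)
#=> "https://cdn.example.com/a.png"
```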

Contributions are welcome!

Check out the main features milestone and the features of the related projects below.

Contributing

  1. Fork the repo on GitHub
  2. Clone the project to your own machine
  3. Commit changes to your own branch
  4. Push your work back up to your fork
  5. Submit a pull request so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!

Related and Inspired Projects

  • readability.js is a standalone version of the readability library used for Firefox Reader View.
  • newspaper is an advanced news extraction, article extraction, and content curation library for Python.
  • ruby-readability is a tool for extracting the primary readable content of a webpage.

LICENSE

This code is under the Apache License 2.0. See http://www.apache.org/licenses/LICENSE-2.0.
