keepcosmos / Readability
Licence: apache-2.0
Readability is an Elixir library for extracting and curating articles.
Readability
Readability is a tool for extracting and curating the primary readable content of a webpage.
Check out the documentation for full and detailed guides.
Installation
If available in Hex, the package can be installed as:
- Add readability to your list of dependencies in mix.exs:
def deps do
[{:readability, "~> 0.9"}]
end
- Ensure readability is started before your application:
def application do
[applications: [:readability]]
end
Note: Readability requires Elixir 1.3 or higher.
Usage
Examples
Just pass a URL
url = "https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"
summary = Readability.summarize(url)
summary.title
#=> "Why I’m betting on Elixir"
summary.authors
#=> ["Ken Mazaika"]
summary.article_html
#=>
# <div><div><p id=\"3476\"><strong><em>Background: </em></strong><em>I’ve spent...
# ...
# ...button!</em></h3></div></div>
summary.article_text
#=>
# Background: I’ve spent the past 6 years building web applications in Ruby and.....
# ...
# ... value in this article, it would mean a lot to me if you hit the recommend button!
From raw HTML
### Extract the title.
Readability.title(html)
### Extract authors.
Readability.authors(html)
### Extract the primary content with transformed html.
html
|> Readability.article
|> Readability.readable_html
### Extract only text from the primary content.
html
|> Readability.article
|> Readability.readable_text
### Extract the primary images with Floki.
html
|> Readability.article
|> Floki.find("img")
|> Floki.attribute("src")
Options
If the result is different from your expectations, you can add options to customize it.
Example
url = "https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58"
summary = Readability.summarize(url, [clean_conditionally: false])
- min_text_length \\ 25
- remove_unlikely_candidates \\ true
- weight_classes \\ true
- clean_conditionally \\ true
- retry_length \\ 250
You can find other algorithm and regex options in readability.ex.
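As a rough illustration of how an override such as clean_conditionally: false combines with the defaults listed above, the following sketch merges a keyword list of overrides onto those documented defaults. The module name OptionSketch and the function resolve/1 are hypothetical and are not part of Readability; the library's own handling in readability.ex may differ.

```elixir
defmodule OptionSketch do
  # Default values copied from the option list in this README.
  @defaults [
    min_text_length: 25,
    remove_unlikely_candidates: true,
    weight_classes: true,
    clean_conditionally: true,
    retry_length: 250
  ]

  # Hypothetical helper: values in `overrides` take precedence over the defaults.
  def resolve(overrides \\ []) do
    Keyword.merge(@defaults, overrides)
  end
end

opts = OptionSketch.resolve(clean_conditionally: false)

# Only the overridden key changes; every other option keeps its default.
IO.inspect(Keyword.fetch!(opts, :clean_conditionally))
IO.inspect(Keyword.fetch!(opts, :retry_length))
```

Passing only the keys you want to change keeps call sites short and leaves the remaining defaults untouched.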
Test
To run the test suite:
$ mix test
Todo
- [x] Extract authors
- [x] More configurable
- [x] Summarize function
- [ ] Convert relative paths into absolute paths of img#src and a#href
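The open item above is not implemented in the library. One possible approach, sketched here with only the Elixir standard library, is to resolve each src or href value against the URL of the page it came from using URI.merge/2. The module name RelativePaths and the function absolutize/2 are hypothetical, and the example URLs are placeholders.

```elixir
defmodule RelativePaths do
  # Hypothetical helper: resolve `path` (as found in img#src or a#href)
  # against `base_url`, the absolute URL of the page being processed.
  # URI.merge/2 raises ArgumentError if `base_url` is not absolute.
  def absolutize(base_url, path) do
    base_url
    |> URI.merge(path)
    |> URI.to_string()
  end
end

base = "https://example.com/blog/post"

# Relative to the current directory, to a parent directory, and to the site root.
IO.puts(RelativePaths.absolutize(base, "pic.png"))
IO.puts(RelativePaths.absolutize(base, "../img/a.png"))
IO.puts(RelativePaths.absolutize(base, "/img/a.png"))

# A value that is already absolute is returned unchanged.
IO.puts(RelativePaths.absolutize(base, "https://cdn.example.com/x.png"))
```

Applying such a helper to every img and a element of the extracted article would still need a pass over the parsed HTML tree, which this sketch leaves out.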
Contributions are welcome!
Check out the main features milestone and the features of related projects below.
Contributing
- Fork the repo on GitHub
- Clone the project to your own machine
- Commit changes to your own branch
- Push your work back up to your fork
- Submit a pull request so that we can review your changes
NOTE: Be sure to merge the latest from "upstream" before making a pull request!
Related and Inspired Projects
- readability.js is a standalone version of the readability library used for Firefox Reader View.
- newspaper is an advanced news extraction, article extraction, and content curation library for Python.
- ruby-readability is a tool for extracting the primary readable content of a webpage.
LICENSE
This code is under the Apache License 2.0. See http://www.apache.org/licenses/LICENSE-2.0.