ndaidong / Article Parser

License: MIT

Extract the main article from a given URL with Node.js

Programming Languages

javascript

Labels

nodejs article readability

Projects that are alternatives of or similar to Article Parser

Clean Mark

Convert an article into a clean text

Stars: ✭ 414 (+131.28%)

Mutual labels: article, readability

Php Goose

Readability / Html Content / Article Extractor & Web Scraping library written in PHP

Stars: ✭ 392 (+118.99%)

Mutual labels: article, readability

Py Readability Metrics

📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more

Stars: ✭ 132 (-26.26%)

Mutual labels: readability

100 Days Of Ml Code

A day-to-day plan for this challenge. Covers both theoretical and practical aspects

Stars: ✭ 172 (-3.91%)

Mutual labels: article

Engineering Management

A collection of inspiring resources related to engineering management and tech leadership

Stars: ✭ 2,520 (+1307.82%)

Mutual labels: article

Google Rules Of Machine Learning

Github mirror of M. Zinkevich's "Rules of Machine Learning" style guide, with extra goodness.

Stars: ✭ 137 (-23.46%)

Mutual labels: article

Code2sec.com

xmind\code\articles for my personal blog: a backup store for resources from my personal blog, and a collection of things I share

Stars: ✭ 164 (-8.38%)

Mutual labels: article

Awesome Apollo Graphql

A curated list of amazingly awesome things regarding Apollo GraphQL ecosystem 🌟

Stars: ✭ 126 (-29.61%)

Mutual labels: article

Ttrss plugin Feediron

Evolution of ttrss_plugin-af_feedmod

Stars: ✭ 172 (-3.91%)

Mutual labels: article

Structured Data Json Ld

Collection of structured data snippets in Google preferred JSON-LD format.

Stars: ✭ 157 (-12.29%)

Mutual labels: article

Post Misread Tsne

How to Use t-SNE Effectively

Stars: ✭ 169 (-5.59%)

Mutual labels: article

Post Augmented Rnns

Attention and Augmented Recurrent Neural Networks

Stars: ✭ 154 (-13.97%)

Mutual labels: article

D2 Daily

D2 Daily

Stars: ✭ 138 (-22.91%)

Mutual labels: article

Reading List Mover

A Python utility for moving bookmarks/reading lists between services

Stars: ✭ 166 (-7.26%)

Mutual labels: readability

Jbt blog

A simple blog based on Python 3.6 and Django 2.0.

Stars: ✭ 137 (-23.46%)

Mutual labels: article

Cadmium

Natural Language Processing (NLP) library for Crystal

Stars: ✭ 172 (-3.91%)

Mutual labels: readability

Php Readability

A fork of https://bitbucket.org/fivefilters/php-readability

Stars: ✭ 127 (-29.05%)

Mutual labels: readability

Post Handwriting

Four Experiments in Handwriting with a Neural Network

Stars: ✭ 144 (-19.55%)

Mutual labels: article

Readability

visualise readability

Stars: ✭ 160 (-10.61%)

Mutual labels: readability

Awesome Deep Learning Music

List of articles related to deep learning applied to music

Stars: ✭ 2,195 (+1126.26%)

Mutual labels: article

article-parser

Extract the main article, main image, and metadata from a URL.

Demo

View screenshots for more info.

Usage

npm install article-parser

Then:

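const {
  extract
} = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

// extract() returns a Promise that resolves to the article object
extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  // any rejection from extract() is handled here
  console.error(err);
});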

APIs

Since v4, article-parser focuses only on its main mission: extracting the main readable content from web pages such as blog posts or news articles. It can still retrieve other kinds of content, like YouTube videos or SoundCloud media, but these are just extras.

extract(String url | String html)

Extract data from the specified URL or from the full HTML content of a page. Returns a Promise.
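`extract()` accepts either a URL or a full HTML string, but the README does not describe how it tells them apart. If you want to route or log the input yourself before calling it, one minimal approach is to try parsing the string with the standard `URL` constructor and accept only `http:`/`https:` addresses. The `looksLikeUrl` helper below is hypothetical and is not part of article-parser.

```javascript
// Hypothetical helper, not part of article-parser: decide whether a string
// passed to extract() is a web address or a raw HTML document.
function looksLikeUrl(input) {
  if (typeof input !== 'string') return false;
  const trimmed = input.trim();
  // HTML content normally starts with a tag such as <!DOCTYPE html> or <html>.
  if (trimmed.startsWith('<')) return false;
  try {
    const { protocol } = new URL(trimmed);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws a TypeError for strings that are not absolute URLs.
    return false;
  }
}

console.log(looksLikeUrl('https://goo.gl/MV8Tkh'));               // true
console.log(looksLikeUrl('<html><body><p>Hi</p></body></html>')); // false
```

Checking the protocol rejects inputs such as `mailto:` or `ftp:` links that parse as URLs but are not web pages.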

Here is how we can use article-parser:

import {
  extract
} from 'article-parser';

const getArticle = async (url) => {
  try {
    const article = await extract(url);
    return article;
  } catch (err) {
    // log the error with a stack trace; getArticle then resolves to undefined
    console.trace(err);
  }
};
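To process several pages at once, one option is `Promise.allSettled`, so that a single failing URL does not reject the whole batch. The sketch below takes the extractor as a parameter (pass article-parser's `extract` in real code) so it can be exercised with a stub; `extractMany` and `fakeExtract` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: run an extractor over many inputs and keep
// successes and failures apart. Pass article-parser's `extract` as `extractFn`.
async function extractMany(inputs, extractFn) {
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const results = await Promise.allSettled(
    inputs.map((input) => Promise.resolve().then(() => extractFn(input))),
  );
  const articles = [];
  const failures = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      articles.push(result.value);
    } else {
      failures.push({ input: inputs[i], error: result.reason });
    }
  });
  return { articles, failures };
}

// Usage with a stub in place of the real extract():
const fakeExtract = async (url) => {
  if (url.includes('broken')) throw new Error(`Cannot fetch ${url}`);
  return { url, title: `Title of ${url}` };
};

extractMany(['https://a.example/1', 'https://broken.example/2'], fakeExtract)
  .then(({ articles, failures }) => {
    console.log(articles.length, failures.length); // 1 1
  });
```

All requests start at the same time here; for long URL lists you may want to limit concurrency to avoid hammering the target sites.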

Compared with v3, the structure of the article object has also changed. It now looks like this:

{
  "url": URI String,
  "title": String,
  "description": String,
  "image": URI String,
  "author": String,
  "content": HTML String,
  "published": Date String,
  "source": String, // original publisher
  "links": Array, // list of alternative links
  "ttr": Number, // time to read, in seconds; 0 = unknown
}
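`ttr` is reported in seconds, with `0` meaning unknown. The README does not give the formula, but with the `wordsPerMinute: 300` default in `parserOptions`, a plausible estimate is word count divided by words per minute, converted to seconds. The sketch below assumes exactly that and also turns the value into a display label; `estimateTtr` and `formatTtr` are hypothetical helpers, and the library's own calculation may differ.

```javascript
// Hypothetical helpers; article-parser's own ttr calculation may differ.

// Estimate reading time in seconds from plain text, assuming
// ttr = words / wordsPerMinute * 60 (300 wpm mirrors the parserOptions default).
function estimateTtr(text, wordsPerMinute = 300) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round((words / wordsPerMinute) * 60);
}

// Turn a ttr value (seconds, 0 = unknown) into a short label.
function formatTtr(ttr) {
  if (!ttr) return 'reading time unknown';
  const minutes = Math.max(1, Math.round(ttr / 60));
  return `${minutes} min read`;
}

const sample = 'word '.repeat(900); // 900 words
console.log(estimateTtr(sample));   // 180
console.log(formatTtr(180));        // 3 min read
console.log(formatTtr(0));          // reading time unknown
```

Since `content` is an HTML string, you would strip tags before counting words with a helper like this.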

Configuration methods

In addition, this library provides methods to customize its default settings. Leave them alone unless you have a reason to change them.

setParserOptions(Object parserOptions)
getParserOptions()
setNodeFetchOptions(Object nodeFetchOptions)
getNodeFetchOptions()
setSanitizeHtmlOptions(Object sanitizeHtmlOptions)
getSanitizeHtmlOptions()

Here are the default properties and values:

Object `parserOptions`:

{
  wordsPerMinute: 300,
  urlsCompareAlgorithm: 'levenshtein',
}

Read string-comparison docs for more info about urlsCompareAlgorithm.
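`urlsCompareAlgorithm` names the string-similarity measure used when comparing URLs; the README does not say exactly where the comparison is applied. For intuition about the default, `'levenshtein'`, here is a minimal, self-contained implementation of Levenshtein edit distance plus a normalized similarity score in [0, 1]. This illustrates the measure only; it is not the string-comparison package's code, and dividing by the longer string's length is one common normalization among several.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical strings.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(levenshtein('kitten', 'sitting')); // 3
console.log(similarity('https://example.com/a', 'https://example.com/b').toFixed(2)); // 0.95
```

Keeping only two rows of the dynamic-programming table reduces memory from O(m·n) to O(n) while producing the same distance.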

Object `nodeFetchOptions`:

{
  headers: {
    'user-agent': 'article-parser/4.0.0',
  },
  timeout: 30000,
  redirect: 'follow',
  compress: true,
  agent: false,
}

Read node-fetch docs for more info.
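These are node-fetch options: `timeout`, `compress` and `agent` are node-fetch extensions with no equivalent in the standard `fetch` built into Node.js 18+. If you want to download pages yourself with similar settings (for example, to pass the resulting HTML string to `extract()`), one way is to turn the timeout into an `AbortSignal`. The `toFetchInit` helper below is a hypothetical sketch that keeps `headers` and `redirect` and drops the node-fetch-only fields.

```javascript
// Hypothetical helper: convert node-fetch style options into a RequestInit
// object for the standard fetch() built into Node.js 18+.
function toFetchInit(nodeFetchOptions) {
  const { headers = {}, redirect = 'follow', timeout = 0 } = nodeFetchOptions;
  const init = { headers: { ...headers }, redirect };
  if (timeout > 0) {
    // Standard fetch has no `timeout` option; abort the request instead.
    init.signal = AbortSignal.timeout(timeout);
  }
  // `compress` and `agent` are intentionally dropped: built-in fetch
  // decompresses responses itself and manages its own connections.
  return init;
}

const init = toFetchInit({
  headers: { 'user-agent': 'article-parser/4.0.0' },
  timeout: 30000,
  redirect: 'follow',
  compress: true,
  agent: false,
});
// Then, inside an async function: const html = await (await fetch(url, init)).text();
```

`AbortSignal.timeout()` starts counting when the helper is called, so create a fresh `init` for each request rather than reusing one.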

Object `sanitizeHtmlOptions`:

{
  allowedTags: [
    'h1', 'h2', 'h3', 'h4', 'h5',
    'u', 'b', 'i', 'em', 'strong',
    'div', 'span', 'p', 'article', 'blockquote', 'section',
    'pre', 'code',
    'ul', 'ol', 'li', 'dd', 'dl',
    'table', 'th', 'tr', 'td', 'thead', 'tbody', 'tfoot',
    'label',
    'fieldset', 'legend',
    'img', 'picture',
    'br', 'p', 'hr',
    'a',
  ],
  allowedAttributes: {
    a: ['href'],
    img: ['src', 'alt'],
  },
}

Read sanitize-html docs for more info.
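Because these defaults are plain objects, you can build a customized copy instead of retyping the whole list, for instance to keep `<figure>` and `<figcaption>` or an image `title` attribute. The sketch below assumes you already have the current defaults (e.g. from `getSanitizeHtmlOptions()`) and merges additions into a new object; `extendSanitizeOptions` is a hypothetical name. The README does not say whether `setSanitizeHtmlOptions()` merges or replaces, so passing it a complete object like this is the safer choice. The `Set` also removes duplicates such as `'p'`, which appears twice in the default list.

```javascript
// Hypothetical helper: return a new sanitize-html options object with extra
// tags and attributes added, without mutating `defaults`.
function extendSanitizeOptions(defaults, { tags = [], attributes = {} } = {}) {
  // A Set drops duplicate tag names while preserving first-seen order.
  const allowedTags = [...new Set([...defaults.allowedTags, ...tags])];
  const allowedAttributes = { ...defaults.allowedAttributes };
  for (const [tag, attrs] of Object.entries(attributes)) {
    const existing = allowedAttributes[tag] || [];
    allowedAttributes[tag] = [...new Set([...existing, ...attrs])];
  }
  return { ...defaults, allowedTags, allowedAttributes };
}

// Example with a trimmed-down copy of the defaults shown above.
const defaults = {
  allowedTags: ['p', 'a', 'img', 'br', 'p'],
  allowedAttributes: { a: ['href'], img: ['src', 'alt'] },
};
const custom = extendSanitizeOptions(defaults, {
  tags: ['figure', 'figcaption'],
  attributes: { img: ['title'] },
});
console.log(custom.allowedTags.join(', '));           // p, a, img, br, figure, figcaption
console.log(custom.allowedAttributes.img.join(', ')); // src, alt, title
```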

Screenshots

Article Parser demo:

Example FaaS with Google Cloud Functions

Test

git clone https://github.com/ndaidong/article-parser.git
cd article-parser
npm install  # or `yarn install` or `pnpm install`
npm test

License

The MIT License (MIT)
