teamseodo / muninn

Licence: MIT License
With a simple, flexible, and maintainable configuration file, you can parse HTML and output JSON according to the schema you specify.

Programming Languages

typescript

Projects that are alternatives of or similar to muninn

Cheerio
Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Stars: ✭ 24,616 (+64678.95%)
Mutual labels:  cheerio, htmlparser
gpp-decrypt
Tool to parse the Group Policy Preferences XML file which extracts the username and decrypts the cpassword attribute.
Stars: ✭ 13 (-65.79%)
Mutual labels:  parse
erudite
A JavaScript equivalent to Literate CoffeeScript
Stars: ✭ 18 (-52.63%)
Mutual labels:  parse
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-55.26%)
Mutual labels:  cheerio
html-dom-parser
📝 HTML to DOM parser.
Stars: ✭ 56 (+47.37%)
Mutual labels:  parse
python-yamlable
A thin wrapper of PyYaml to convert Python objects to YAML and back
Stars: ✭ 28 (-26.32%)
Mutual labels:  parse
cmd-ts
💻 A type-driven command line argument parser
Stars: ✭ 92 (+142.11%)
Mutual labels:  parse
LeagueReplayParser
C# library which can read some data from a .rofl file, and start a replay in the client. (no longer actively maintained)
Stars: ✭ 20 (-47.37%)
Mutual labels:  parse
Script.apex
Evaluate Javascript expressions in Apex
Stars: ✭ 18 (-52.63%)
Mutual labels:  parse
sjson-cpp
A Simplified JSON (SJSON) C++ reader and writer
Stars: ✭ 16 (-57.89%)
Mutual labels:  parse
OpenGraph-Net
.Net Open Graph Parser written in C#
Stars: ✭ 111 (+192.11%)
Mutual labels:  parse
parse-torrent-file
DEPRECATED: Parse a .torrent file and return an object of keys/values
Stars: ✭ 62 (+63.16%)
Mutual labels:  parse
ytnef
Yeraze's TNEF Stream Reader - for winmail.dat files
Stars: ✭ 28 (-26.32%)
Mutual labels:  parse
exoffice
Library to parse common excel formats (xls, xlsx, csv)
Stars: ✭ 31 (-18.42%)
Mutual labels:  parse
gonids
gonids is a library to parse IDS rules, with a focus primarily on Suricata rule compatibility. There is a discussion forum available that you can join on Google Groups: https://groups.google.com/forum/#!topic/gonids/
Stars: ✭ 140 (+268.42%)
Mutual labels:  parse
crawler CIA CREST
R-crawler for CIA website (CREST)
Stars: ✭ 15 (-60.53%)
Mutual labels:  parse
desktop
Extendable calculator for the 21st Century ⚡
Stars: ✭ 85 (+123.68%)
Mutual labels:  parse
json-source-map
Parse/stringify JSON and provide source-map for JSON-pointers to all nodes - supports BigInt, Maps, Sets and Typed arrays
Stars: ✭ 55 (+44.74%)
Mutual labels:  parse
VueStudy
Vue.js learning series: sample code and tutorials
Stars: ✭ 80 (+110.53%)
Mutual labels:  cheerio
pinus-parse-interface
parse interface to pinus-protobuf JSON
Stars: ✭ 25 (-34.21%)
Mutual labels:  parse

muninn

Muninn is a fast HTML parsing tool built on the cheerio library. You describe what to extract in a configuration file, which makes it easy to keep parser settings up to date when selectors change. The syntax is simple enough to learn quickly and flexible enough for a variety of needs.

It also has an extension that visualizes your configuration files on the pages you want to parse. See Muninn Extension.

Documentation

Sample

import { parse } from 'muninn';

const config = {
  schema: {
    title: '#productTitle',
    price: '#priceblock_ourprice',
    rating: {
      selector: '#acrPopover span | float',
      regex: /\d+\.?\d?/
    },
    features: {
      selector: '#productOverview_feature_div tr.a-spacing-small | array',
      schema: {
        name: 'td:nth-child(1)',
        value: 'td:nth-child(2)'
      }
    }
  }
};

// `data` is the page's HTML content as a string, for example the source of:
// https://www.amazon.com/AMD-Ryzen-3700X-16-Thread-Processor/dp/B07SXMZLPK/
const data = '<html>...</html>';

const result = parse(data, config);

Output

{
  "title": "AMD Ryzen 7 3700X 8-Core, 16-Thread Unlocked Desktop Processor with Wraith Prism LED Cooler",
  "price": "$308.99",
  "rating": 4.9,
  "features": [
    {
      "name": "Brand",
      "value": "AMD"
    },
    {
      "name": "CPU Model",
      "value": "AMD Ryzen 7"
    },
    {
      "name": "CPU Speed",
      "value": "4.4 GHz"
    },
    {
      "name": "CPU Socket",
      "value": "Socket AM4"
    },
    {
      "name": "Processor Count",
      "value": "8"
    }
  ]
}
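
The rating field in the sample combines three things: a CSS selector, a type after the pipe (`| float`), and a regex. The sketch below shows one way such a field could be interpreted, to make the `"rating": 4.9` result easier to follow. It is not Muninn's internal code. The helper names `splitSelector` and `extractValue`, the `null` result on a failed match, and the element text `'4.9 out of 5 stars'` are all assumptions made for illustration.

```typescript
// Illustrative sketch only: these helpers are hypothetical and not part of Muninn's API.

interface SelectorParts {
  selector: string;    // the CSS selector, e.g. '#acrPopover span'
  type: string | null; // the text after the pipe, e.g. 'float', or null if absent
}

// Split a 'selector | type' string at the first pipe character.
function splitSelector(raw: string): SelectorParts {
  const bar = raw.indexOf('|');
  if (bar === -1) {
    return { selector: raw.trim(), type: null };
  }
  return { selector: raw.slice(0, bar).trim(), type: raw.slice(bar + 1).trim() };
}

// Narrow the element text with an optional regex, then convert it when the type is 'float'.
function extractValue(
  text: string,
  type: string | null,
  regex?: RegExp
): string | number | null {
  let value = text.trim();
  if (regex) {
    const match = value.match(regex);
    if (match === null) {
      return null; // assumption: a failed match yields null
    }
    value = match[0];
  }
  return type === 'float' ? parseFloat(value) : value;
}

const parts = splitSelector('#acrPopover span | float');
// Assumed element text; the real page content may differ.
const rating = extractValue('4.9 out of 5 stars', parts.type, /\d+\.?\d?/);
console.log(parts.selector, parts.type, rating);
```

Applying the regex before the conversion means the surrounding words in the element text never reach the number parser, so the stored value is a number rather than a string.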

License

Distributed under the MIT License. See LICENSE for more information.
