DroidsOnRoids / Jspoon

Licence: MIT
Annotation based HTML to Java parser + Retrofit converter

Projects that are alternatives of or similar to Jspoon

Sparql.js
A parser for the SPARQL query language in JavaScript
Stars: ✭ 271 (-11.44%)
Mutual labels:  parser
Php Apk Parser
Read basic info about an application from .apk file.
Stars: ✭ 290 (-5.23%)
Mutual labels:  parser
Bblfshd
A self-hosted server for source code parsing
Stars: ✭ 297 (-2.94%)
Mutual labels:  parser
Jsqlparser
JSqlParser parses an SQL statement and translate it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
Stars: ✭ 3,405 (+1012.75%)
Mutual labels:  parser
Link Grammar
The CMU Link Grammar natural language parser
Stars: ✭ 286 (-6.54%)
Mutual labels:  parser
Demoinfocs Golang
High performance CS:GO demo parser for Go (demoinfo)
Stars: ✭ 288 (-5.88%)
Mutual labels:  parser
Pyverilog
Python-based Hardware Design Processing Toolkit for Verilog HDL
Stars: ✭ 267 (-12.75%)
Mutual labels:  parser
Pyresparser
A simple resume parser used for extracting information from resumes
Stars: ✭ 297 (-2.94%)
Mutual labels:  parser
Exifer
A lightweight Exif meta-data decipher.
Stars: ✭ 290 (-5.23%)
Mutual labels:  parser
Hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Stars: ✭ 295 (-3.59%)
Mutual labels:  parser
Ojg
Optimized JSON for Go
Stars: ✭ 281 (-8.17%)
Mutual labels:  parser
Sql Parser
A validating SQL lexer and parser with a focus on MySQL dialect.
Stars: ✭ 284 (-7.19%)
Mutual labels:  parser
Raw Body
Get and validate the raw body of a readable stream
Stars: ✭ 292 (-4.58%)
Mutual labels:  parser
Parser Lib
Collection of parsers written in JavaScript
Stars: ✭ 274 (-10.46%)
Mutual labels:  parser
Psd.rb
Parse Photoshop files in Ruby with ease
Stars: ✭ 3,092 (+910.46%)
Mutual labels:  parser
Nearley
📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
Stars: ✭ 3,089 (+909.48%)
Mutual labels:  parser
Length.js
📏 JavaScript library for length units conversion.
Stars: ✭ 292 (-4.58%)
Mutual labels:  parser
Exprtk
C++ Mathematical Expression Parsing And Evaluation Library
Stars: ✭ 301 (-1.63%)
Mutual labels:  parser
App Info Parser
A javascript parser for parsing .ipa or .apk files. IPA/APK文件 js 解析器
Stars: ✭ 298 (-2.61%)
Mutual labels:  parser
Termimad
A library to display rich (Markdown) snippets and texts in a rust terminal application
Stars: ✭ 293 (-4.25%)
Mutual labels:  parser

jspoon

jspoon is a Java library that parses HTML into Java objects based on CSS selectors. It uses jsoup underneath as the HTML parser.

Installation

Insert the following dependency into your project's build.gradle file:

dependencies {
    implementation 'pl.droidsonroids:jspoon:1.3.2'
}

Usage

jspoon works on any class with a default constructor. To use it, annotate fields with the @Selector annotation and set a CSS selector as the annotation's value:

class Page {
    @Selector("#title") String title;
    @Selector("li.a") List<Integer> intList;
    @Selector(value = "#image1", attr = "src") String imageSource;
}

Then you can create an HtmlAdapter and use it to build objects:

String htmlContent = "<div>" 
    + "<p id='title'>Title</p>" 
    + "<ul>"
    + "<li class='a'>1</li>"
    + "<li>2</li>"
    + "<li class='a'>3</li>"
    + "</ul>"
    + "<img id='image1' src='image.bmp' />"
    + "</div>";

Jspoon jspoon = Jspoon.create();
HtmlAdapter<Page> htmlAdapter = jspoon.adapter(Page.class);

Page page = htmlAdapter.fromHtml(htmlContent);
//title = "Title"; intList = [1, 3]; imageSource = "image.bmp"

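For a single-valued field, jspoon takes the first element that matches the selector and assigns its value to the field. A List field, such as intList above, collects every matching element.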
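To make the role of the default constructor and the field annotations more concrete, here is a minimal sketch of annotation-driven binding using plain reflection. It is not jspoon's implementation: the Sel annotation, SketchPage, and BinderSketch are hypothetical names, only String fields are handled, and a Map of selector to text stands in for the result of running CSS selectors against a document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for a selector annotation; not jspoon's @Selector.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Sel {
    String value();
}

class SketchPage {
    @Sel("#title") String title;
    String untouched; // not annotated, so the binder leaves it alone
}

class BinderSketch {
    // 'matches' stands in for "text of the first element matching each selector".
    static <T> T bind(Class<T> type, Map<String, String> matches) {
        try {
            // This call is why a default (no-argument) constructor is required.
            T instance = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Sel sel = field.getAnnotation(Sel.class);
                if (sel == null) {
                    continue; // only annotated fields are bound
                }
                String found = matches.get(sel.value());
                if (found != null) {
                    field.setAccessible(true);
                    field.set(instance, found);
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot bind " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        SketchPage page = bind(SketchPage.class, Map.of("#title", "Title"));
        System.out.println(page.title + " / " + page.untouched); // Title / null
    }
}
```

The real library additionally converts text to the supported types listed below and caches adapters, which this sketch leaves out.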

Supported types

@Selector can be applied to any field of the following types (or their primitive equivalents):

  • String
  • Boolean
  • Integer
  • Long
  • Float
  • Double
  • Date
  • BigDecimal
  • Jsoup's Element
  • Any class with a default constructor
  • List (or its superclass/superinterface) of a supported type

@Selector can also be applied to a class, in which case you don't need to annotate every field inside it.

Attributes

By default, the element's textContent value is used for Strings, Dates and numbers. To read an attribute instead, set the attr parameter of the @Selector annotation. You can also use "html" (or "innerHtml") and "outerHtml" as the attr value.

Formatting and regex

A regex can be applied by passing the regex parameter to the @Selector annotation. Example:

class Page {
    @Selector(value = "#numbers", regex = "([a-z]+),") String matchedNumber;
}
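The sketch below shows, with java.util.regex only, why this pattern yields "three" for the sample text "ONE, TwO, three," used in this section. It assumes that the value kept is capture group 1 of the first match, which is consistent with the documented result but is an assumption about the library's internals. RegexSketch and firstGroup are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Returns capture group 1 of the first match, or null when nothing matches.
    // Assumption: this mirrors how the regex parameter is applied to the element text.
    static String firstGroup(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // "ONE" and "TwO" contain uppercase letters directly before the comma,
        // so the first run of lowercase letters followed by a comma is "three".
        System.out.println(firstGroup("([a-z]+),", "ONE, TwO, three,")); // three
    }
}
```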

A date format can be set by passing the value parameter to the @Format annotation. Example:

class Page {
    @Format(value = "HH:mm:ss dd.MM.yyyy")
    @Selector(value = "#date") Date date;
}

// Assuming Page declares both the matchedNumber and date fields shown above
String htmlContent = "<span id='date'>13:30:12 14.07.2017</span>"
    + "<span id='numbers'>ONE, TwO, three,</span>";
Jspoon jspoon = Jspoon.create();
HtmlAdapter<Page> htmlAdapter = jspoon.adapter(Page.class);
Page page = htmlAdapter.fromHtml(htmlContent);
// date = Jul 14, 2017 13:30:12; matchedNumber = "three"
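The pattern string follows the familiar java.text.SimpleDateFormat letters. As a rough illustration of how "HH:mm:ss dd.MM.yyyy" reads the sample text, here is a standalone sketch. Whether jspoon uses SimpleDateFormat internally is an assumption, and DateFormatSketch is a hypothetical name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class DateFormatSketch {
    // Parses text with the given pattern in the JVM's default time zone.
    static Date parse(String pattern, String text) {
        try {
            return new SimpleDateFormat(pattern, Locale.ROOT).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date date = parse("HH:mm:ss dd.MM.yyyy", "13:30:12 14.07.2017");
        // Print in an unambiguous layout to confirm which part was read as what.
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        System.out.println(iso.format(date)); // 2017-07-14T13:30:12
    }
}
```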

The default Java Locale is used for parsing Floats, Doubles and Dates. You can override it by setting the languageTag parameter of @Format:

@Format(languageTag = "pl")
@Selector(value = "div > p > span") Double pi; //3,14 will be parsed 
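The effect of a language tag on number parsing can be reproduced with the JDK alone. The sketch below parses "3,14" with the Polish locale, where the comma is the decimal separator. It assumes locale-aware parsing comparable to java.text.NumberFormat and is not taken from jspoon's source. LocaleNumberSketch is a hypothetical name.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleNumberSketch {
    // Parses a decimal number using the conventions of the given BCP 47 language tag.
    static double parse(String languageTag, String text) {
        NumberFormat format = NumberFormat.getInstance(Locale.forLanguageTag(languageTag));
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("pl", "3,14")); // 3.14
    }
}
```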

If jspoon doesn't find an HTML element, it won't set the field's value unless you set the defValue parameter:

@Selector(value = "div > p > span", defValue = "NO_TEXT") String text;

Custom converters

When format or regex is not enough, a custom converter can be used to implement parsing from jsoup's Element. This can be done by implementing the ElementConverter interface:

public class JoinChildrenClassConverter implements ElementConverter<String> {
    @Override
    public String convert(Element node, Selector selector) {
        return node.children().stream().map(Element::text).collect(Collectors.joining(", "));
    }
}

And it can be used the following way:

public class Model {
    @Selector(value = "#id", converter = JoinChildrenClassConverter.class)
    String childrenText;
}

Retrofit

Retrofit converter is available here.

Changelog

See GitHub releases

Other libraries/inspirations

  • jsoup - all HTML parsing in jspoon is done by this library
  • webGrude - I found this library when I first had the idea. It was the biggest inspiration and I used some ideas from it
  • Moshi - I wanted to make jspoon work with HTML the same way Moshi works with JSON. I adapted the caching mechanism (fields and adapters) from it.
  • jsoup-annotations - similar to jspoon