gajus / crack-json

Licence: other
Extracts all JSON objects from an arbitrary text document.

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to crack-json

Me Tools
Tools for working with Intel ME
Stars: ✭ 165 (+489.29%)
Mutual labels:  extract
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+775%)
Mutual labels:  extract
keyword-extract
A simple and efficient URL keyword extraction tool
Stars: ✭ 15 (-46.43%)
Mutual labels:  extract
Earth Reverse Engineering
Reversing Google's 3D satellite mode
Stars: ✭ 2,083 (+7339.29%)
Mutual labels:  extract
Getjs
A tool to quickly get all JavaScript sources/files
Stars: ✭ 190 (+578.57%)
Mutual labels:  extract
Swiftsoup
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)
Stars: ✭ 3,079 (+10896.43%)
Mutual labels:  extract
Sypht Python Client
A python client for the Sypht API
Stars: ✭ 160 (+471.43%)
Mutual labels:  extract
tar
A simple tar implementation in C
Stars: ✭ 89 (+217.86%)
Mutual labels:  extract
Link Preview Js
Parse and/or extract web links meta information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.
Stars: ✭ 240 (+757.14%)
Mutual labels:  extract
qresExtract
Qt binary resource (qres) extractor
Stars: ✭ 26 (-7.14%)
Mutual labels:  extract
Extract React Intl Messages
extract react intl messages
Stars: ✭ 174 (+521.43%)
Mutual labels:  extract
Sass Extract
Extract structured variables from sass files
Stars: ✭ 183 (+553.57%)
Mutual labels:  extract
Blacksmith
Blacksmith is a tool for viewing, extracting, and converting textures, 3D models, and sounds from Assassin's Creed: Odyssey/Origins/Valhalla and Steep.
Stars: ✭ 104 (+271.43%)
Mutual labels:  extract
Pythonvscode
This extension is now maintained in the Microsoft fork.
Stars: ✭ 2,013 (+7089.29%)
Mutual labels:  extract
DyAnnotationExtractor
DyAnnotationExtractor is software for extracting annotations (highlighted text and comments) from e-documents like PDF.
Stars: ✭ 34 (+21.43%)
Mutual labels:  extract
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+489.29%)
Mutual labels:  extract
Datashare
Better analyze information, in all its forms
Stars: ✭ 254 (+807.14%)
Mutual labels:  extract
WindowTextExtractor
WindowTextExtractor allows you to get a text from any window of an operating system including asterisk passwords
Stars: ✭ 128 (+357.14%)
Mutual labels:  extract
extract-xiso
Xbox ISO Creation/Extraction utility. Imported from SourceForge.
Stars: ✭ 358 (+1178.57%)
Mutual labels:  extract
yellowpages-scraper
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.
Stars: ✭ 56 (+100%)
Mutual labels:  extract

crack-json 🥊

Extracts all JSON objects from an arbitrary text document.

Use case

The primary use case is extracting structured data from unstructured documents. For example, when scraping websites, it is common for HTML to embed JSON or JSON-like data structures:

<script>
$(document).on('BookingApp:SeatingPlan:Ready', () => {
  $(document).trigger('BookingApp:StartSeatingPlanOnly', {
    "sessionId": "438a8373-5fab-4d36-ac92-053ae2d04e9c"
  });
});
</script>

crack-json is intended to be used as follows: the scraper first narrows the document down to the HTML that contains the JSON data of interest, and crack-json then extracts all JSON-like objects from that fragment. In the example above, to extract the sessionId it is sufficient to get the innerHTML of the script tag, use crack-json to extract all JSON-like objects, and search for the matching object, e.g.

const session = extractJson(document.querySelector('script').innerHTML)
  .find((maybeTargetSubject) => {
    return maybeTargetSubject.sessionId;
  });

session;
// {
//   "sessionId": "438a8373-5fab-4d36-ac92-053ae2d04e9c"
// }
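
The `find` callback above relies on a truthy `sessionId`, and extracted values can also be arrays or strings. The sketch below shows one way to make the lookup stricter. `findObjectWithKey` is a hypothetical helper, not part of crack-json, and the sample array only imitates the shape of an `extractJson` result.

```javascript
// Hypothetical helper, not part of crack-json: returns the first plain object
// among the extracted values that has the given own property.
const findObjectWithKey = (values, key) => {
  return values.find((value) => {
    return value !== null &&
      typeof value === 'object' &&
      !Array.isArray(value) &&
      Object.prototype.hasOwnProperty.call(value, key);
  });
};

// Sample input shaped like an extractJson result: a string, an array and an object.
const extracted = [
  'quux',
  [{baz: 'qux'}],
  {sessionId: '438a8373-5fab-4d36-ac92-053ae2d04e9c'},
];

console.log(findObjectWithKey(extracted, 'sessionId'));
```

Checking for an own property rather than a truthy value means an object whose `sessionId` is an empty string or `0` is still found.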

Implementation

crack-json iterates through the input text, searching for characters that indicate the start of a JSON object, array or string. For each such character, it attempts to match the closing character and parse the resulting substring. It continues through the document this way until it has found every text fragment that can be parsed as JSON.
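
The scanning approach described above can be sketched roughly as follows. This is a simplified illustration written for this document, not crack-json's actual source: `extractJsonSketch` and `CLOSERS` are hypothetical names, and the real library may differ in how it picks candidates and handles edge cases.

```javascript
// Simplified sketch of the scanning idea; not crack-json's actual source.
// Maps each opening character to the character that must close it.
const CLOSERS = {'{': '}', '[': ']', '"': '"'};

const extractJsonSketch = (text, parser = JSON.parse) => {
  const found = [];
  let start = 0;

  while (start < text.length) {
    const closer = CLOSERS[text[start]];
    let matchedEnd = -1;

    if (closer) {
      // Try each later occurrence of the closing character, nearest first.
      let end = text.indexOf(closer, start + 1);

      while (end !== -1) {
        try {
          found.push(parser(text.slice(start, end + 1)));
          matchedEnd = end;
          break;
        } catch (error) {
          // Not parseable yet; extend the candidate to the next closer.
          end = text.indexOf(closer, end + 1);
        }
      }
    }

    // Resume after a successful match so nested values are not reported twice.
    start = matchedEnd === -1 ? start + 1 : matchedEnd + 1;
  }

  return found;
};

console.log(extractJsonSketch('Lorem {"foo": "bar"} ipsum [{"baz": "qux"}] dolor "quux"'));
```

Because every opening character may be tried against every later closing character, this sketch is quadratic in the worst case, which is one reason to narrow the document down before extracting.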

API

crack-json exports a single function: extractJson.

import {
  extractJson
} from 'crack-json';

extractJson API

/**
 * @property filter Used to filter out strings before attempting to decode them.
 * @property parser A parser used to extract JSON from the suspected strings. Default: `JSON.parse`.
 */
type ExtractJsonConfigurationType = {|
  +filter?: (input: string) => boolean,
  +parser?: (input: string) => any,
|};

type ExtractJsonType = (subject: string, configuration?: ExtractJsonConfigurationType) => any;

extractJson: ExtractJsonType;
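
As an illustration of the `parser` option, the sketch below defines a parser that accepts only plain JSON objects. `parseObjectOnly` is a hypothetical helper, not part of crack-json. It also assumes that a parser which throws is treated as "no match", in the same way that a failing `JSON.parse` would be; the type definition above does not confirm this.

```javascript
// Hypothetical helper, not part of crack-json: a parser that accepts only plain JSON objects.
const parseObjectOnly = (input) => {
  const value = JSON.parse(input); // throws SyntaxError on invalid JSON

  // Reject null, arrays and primitives such as strings.
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    throw new TypeError('Expected a JSON object');
  }

  return value;
};

// Assumed usage (requires crack-json; assumes a throwing parser means "no match"):
// extractJson(payload, {parser: parseObjectOnly});
```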

Usage

import {
  extractJson
} from 'crack-json';

const payload = `
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus ultricies laoreet malesuada. In feugiat augue non tristique pharetra. Duis nisl odio, vulputate maximus suscipit sit amet, ultrices vel lacus.

{"foo": "bar"}

Suspendisse volutpat risus id nibh lacinia, in placerat urna luctus. Phasellus condimentum nec ipsum ut tincidunt. Nullam aliquam euismod ante, vitae accumsan leo egestas a. Aliquam sed lacus nisl. Pellentesque nec hendrerit sem.

[{"baz": "qux"}]

Phasellus iaculis dui nec purus imperdiet placerat non sit amet odio. Donec pretium, arcu ac suscipit imperdiet, tellus orci convallis leo, non laoreet tortor lectus at dolor. Aenean tellus diam, imperdiet nec eleifend at, fermentum sit amet tellus. Vestibulum id purus ac mauris eleifend iaculis.

"quux"

Vestibulum sit amet quam tellus. Nulla facilisi.

`;

console.log(extractJson(payload));

Output:

[
  {
    foo: 'bar'
  },
  [
    {
      baz: 'qux'
    }
  ],
  'quux'
]

Filtering out matches

You can use filter to exclude candidate strings, based on an arbitrary condition, before they are parsed. This improves performance and limits the output to the objects you want, e.g.

import {
  extractJson
} from 'crack-json';

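// Keys and values are double-quoted so that the default parser (JSON.parse) accepts them.
const payload = `
  <script>
  const foo = {
    "cinemaId": "1"
  };
  const bar = {
    "venueId": "1"
  };
  const baz = {
    "userId": "1"
  };
  </script>
`;

// The filter runs on each candidate string before it is parsed.
console.log(extractJson(payload, {
  filter: (input) => {
    return input.includes('userId');
  },
}));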

Output:

[
  {
    userId: '1',
  },
]
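
When several keys are of interest, the filter can be built from a list. The sketch below is one way to do that. `mentionsAnyKey` is a hypothetical helper, not part of crack-json, and it relies only on the `(input: string) => boolean` signature documented above.

```javascript
// Hypothetical helper, not part of crack-json: builds a filter that accepts
// a candidate string only if it mentions at least one of the given keys.
const mentionsAnyKey = (keys) => {
  return (input) => {
    return keys.some((key) => input.includes(key));
  };
};

const filter = mentionsAnyKey(['userId', 'sessionId']);

console.log(filter('{"userId": "1"}')); // true
console.log(filter('{"venueId": "1"}')); // false

// Assumed usage (requires crack-json):
// extractJson(payload, {filter});
```

This is a plain substring check, so it is cheap but coarse: a candidate that merely contains the text `userId` inside a value also passes and still has to be checked after parsing.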