
itemsapi / website-to-json

License: MIT
Converts a website to JSON using jQuery selectors

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to website-to-json

Ferret
Declarative web scraping
Stars: ✭ 4,837 (+12972.97%)
Mutual labels:  data-mining, scraper
blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
Stars: ✭ 100 (+170.27%)
Mutual labels:  data-mining, scraper
scraper
A web scraper starter project
Stars: ✭ 18 (-51.35%)
Mutual labels:  scraper, cheerio
Cheerio
Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Stars: ✭ 24,616 (+66429.73%)
Mutual labels:  scraper, cheerio
Instagram-Comments-Scraper
Instagram comment scraper using python and selenium. Save the comments into excel.
Stars: ✭ 73 (+97.3%)
Mutual labels:  data-mining, scraper
LeetCode
At present contains scraped data from around 1500 problems present on the site. More to follow....
Stars: ✭ 45 (+21.62%)
Mutual labels:  data-mining, scraper
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-54.05%)
Mutual labels:  scraper, cheerio
evine
Interactive CLI Web Crawler
Stars: ✭ 140 (+278.38%)
Mutual labels:  data-mining, scraper
Twitter Get Old Tweets Scraper
A data scraper for retrieving old tweets in Twitter using Python3.
Stars: ✭ 27 (-27.03%)
Mutual labels:  data-mining, scraper
pyitau
Unofficial client to access your Itaú bank data
Stars: ✭ 28 (-24.32%)
Mutual labels:  scraper
xforest
A super-fast and scalable Random Forest library based on fast histogram decision tree algorithm and distributed bagging framework. It can be used for binary classification, multi-label classification, and regression tasks. This library provides both Python and command line interface to users.
Stars: ✭ 20 (-45.95%)
Mutual labels:  data-mining
rose
Analyse all kinds of data for a TV series
Stars: ✭ 34 (-8.11%)
Mutual labels:  scraper
scrapeer
Essential PHP library that scrapes HTTP(S) and UDP trackers for torrent information.
Stars: ✭ 81 (+118.92%)
Mutual labels:  scraper
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+145.95%)
Mutual labels:  data-mining
sciblox
sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+29.73%)
Mutual labels:  data-mining
GChan
Scrape boards and threads from 4chan (8kun WIP). Downloads images, videos and HTML if desired.
Stars: ✭ 31 (-16.22%)
Mutual labels:  scraper
twpy
Twitter High level scraper for humans.
Stars: ✭ 58 (+56.76%)
Mutual labels:  scraper
xgboost-smote-detect-fraud
Can we predict accurately on the skewed data? What are the sampling techniques that can be used. Which models/techniques can be used in this scenario? Find the answers in this code pattern!
Stars: ✭ 59 (+59.46%)
Mutual labels:  data-mining
gHarvester
Proof of concept for a security issue (in my opinion) that I found in accounts.google.com
Stars: ✭ 20 (-45.95%)
Mutual labels:  scraper
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+43.24%)
Mutual labels:  scraper

Website to JSON converter (wtj)

This tool converts any website into structured JSON using jQuery selectors.

Installation

$ npm install website-to-json --save

Getting started

Examples

Stack Overflow

var wtj = require('website-to-json')
wtj.extractData('http://stackoverflow.com/questions/3207418/crawler-vs-scraper', {
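  // fields to include in the response
  fields: ['data'],
  // $ is a jQuery-style selector function for the fetched page
  parse: function($) {
    return {
      title: $("h1").text(),
      // collect the text of every tag link into a plain array
      keywords: $('.post-taglist a').map(function() {
        return $(this).text()
      }).get()
    }
  }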
})
.then(function(res) {
  console.log(JSON.stringify(res, null, 2));
})

Response

{
  "data": {
    "title": "crawler vs scraper",
    "keywords": [
      "web-crawler",
      "terminology",
      "scraper"
    ]
  }
}

IMDB

var trim = require('trim')
var wtj = require('website-to-json')

wtj.extractData('http://www.imdb.com/title/tt0111161', {
  fields: ['data'],
  parse: function($) {
    return {
      title: trim($(".title_wrapper h1").text()),
      image: $(".poster img").attr('src'),
      summary: trim($(".plot_summary .summary_text").text())
    }
  }
})
.then(function(res) {
  console.log(JSON.stringify(res, null, 2));
})

Response

{
  "data": {
    "title": "The Shawshank Redemption (1994)",
    "image": "https://images-na.ssl-images-amazon.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_UX182_CR0,0,182,268_AL_.jpg",
    "summary": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency."
  }
}
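The title in the response above carries the release year in parentheses. If the two are wanted as separate JSON fields, a small helper can be applied inside `parse`. The following is only a sketch: `splitTitle` is a hypothetical name, not part of website-to-json, and it assumes titles end in a four-digit year in parentheses.

```javascript
// Hypothetical helper (not part of website-to-json): splits "Name (YYYY)"
// into a name and a numeric year. Titles without a trailing four-digit
// year in parentheses are returned trimmed, with year set to null.
function splitTitle(title) {
  var match = /^(.*?)\s*\((\d{4})\)\s*$/.exec(title);
  if (!match) {
    return { name: title.trim(), year: null };
  }
  return { name: match[1].trim(), year: parseInt(match[2], 10) };
}

console.log(splitTitle('The Shawshank Redemption (1994)'));
```

Inside `parse`, this could be used as `splitTitle(trim($(".title_wrapper h1").text()))`, so the name and year end up as separate keys.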

IMDB, many URLs

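var wtj = require('website-to-json');
var trim = require('trim');
// .map() with a concurrency option comes from Bluebird, not from native promises
var Promise = require('bluebird');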

Promise.all([
  'http://www.imdb.com/title/tt0111161',
  'http://www.imdb.com/title/tt0137523',
  'http://www.imdb.com/title/tt0068646'
])
.map(function(url) {
  return wtj.extractUrl(url, {
    fields: ['data'],
    parse: function($) {
      return {
        title: trim($(".title_wrapper h1").text()),
        image: $(".poster img").attr('src')
      }
    }
  })
}, {concurrency: 1})
.then(function(res) {
  console.log(JSON.stringify(res, null, 2));
})
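The `.map(..., {concurrency: 1})` call in this example is a Bluebird feature; native promises have no `.map` method. A rough equivalent with native promises, processing one URL at a time, might look like the sketch below. `mapSeries` and `fakeExtract` are hypothetical names used for illustration only, and `fakeExtract` stands in for the real scraping call so the snippet runs without network access.

```javascript
// Hypothetical helper: runs `worker` on each item one at a time, in order,
// and resolves with the collected results.
async function mapSeries(items, worker) {
  const results = [];
  for (const item of items) {
    // Wait for the current item to finish before starting the next one.
    results.push(await worker(item));
  }
  return results;
}

// Stand-in for a real scraping call; it only echoes the URL it was given.
function fakeExtract(url) {
  return Promise.resolve({ data: { url: url } });
}

mapSeries(['http://example.com/a', 'http://example.com/b'], fakeExtract)
  .then(function (res) {
    console.log(JSON.stringify(res, null, 2));
  });
```

Running requests strictly in sequence is slower than running them in parallel, but it keeps the load on the target site low.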

Nightmare.js

CLI

$ sudo npm install website-to-json -g
$ wtj twitter.com/itemsapi