All Projects → zlurad → serp-parser

zlurad / serp-parser

Licence: MIT license
Nodejs lib to parse Google SERP html pages

Programming Languages

HTML
75241 projects
typescript
32286 projects

Projects that are alternatives of or similar to serp-parser

qBittorrent search engines
Search engines for qBittorrent
Stars: ✭ 35 (+25%)
Mutual labels:  search-engines
weltschmerz
Weltschmerz by age - "I am X years old and... [Google autocomplete]"
Stars: ✭ 23 (-17.86%)
Mutual labels:  google-search
ublacklist
Blocks specific sites from appearing in Google search results
Stars: ✭ 3,726 (+13207.14%)
Mutual labels:  google-search
google-search-results-nodejs
SerpApi client library for Node.js. Previously: Google Search Results Node.js.
Stars: ✭ 46 (+64.29%)
Mutual labels:  serp
SmartImage
Reverse image search tool (SauceNao, ImgOps, trace.moe, and more)
Stars: ✭ 346 (+1135.71%)
Mutual labels:  search-engines
svelte-seo
Optimize your website for search engines and social media with meta tags, Open Graph, and JSON-LD.
Stars: ✭ 257 (+817.86%)
Mutual labels:  search-engines
domains
World’s single largest Internet domains dataset
Stars: ✭ 461 (+1546.43%)
Mutual labels:  search-engines
seo-audits-toolkit
SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc ...
Stars: ✭ 311 (+1010.71%)
Mutual labels:  serp
Setheum
Setheum Network - Islamic Finance DeFi, Multi-Stablecoins, Payments, EVM, DeFi on Rockets to Heaven => 🚀ready for hacking🚀
Stars: ✭ 15 (-46.43%)
Mutual labels:  serp
Googler
🔍 Google from the terminal
Stars: ✭ 5,594 (+19878.57%)
Mutual labels:  google-search
php-google
Google search results crawler, get google search results that you need - php
Stars: ✭ 23 (-17.86%)
Mutual labels:  google-search
google-this
🔎 A simple yet powerful module to retrieve organic search results and much more from Google.
Stars: ✭ 88 (+214.29%)
Mutual labels:  google-search
Googlescraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Stars: ✭ 2,363 (+8339.29%)
Mutual labels:  search-engines
The Book Of Secret Knowledge
A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.
Stars: ✭ 55,582 (+198407.14%)
Mutual labels:  search-engines
dorkScanner
A typical search engine dork scanner scrapes search engines with dorks that you provide in order to find vulnerable URLs.
Stars: ✭ 93 (+232.14%)
Mutual labels:  search-engines
ftsb
Full Text Search Benchmark, a tool for comparing and evaluating full-text search engines.
Stars: ✭ 12 (-57.14%)
Mutual labels:  search-engines

SERP Parser

Build Status codecov

serp-parser is small lib writen in typescript used to extract search engine rank position from the html.

Instalation

npm i serp-parser
yarn add serp-parser

Usage - Google SERP extraction

GoogleSERP accepts both html that is extracted with any headless browser lib (puppeteer, phantomjs...) that have enabled javascript as well as html page structure from no-js-enabled requests from for example request lib. For full enabled js html we use GoogleSERP class, and for nojs pages GoogleNojsSERP class.

With html from headless browser we use full GoogleSERP parser

import { GoogleSERP } from 'serp-parser'

const parser = new GoogleSERP(html);

console.dir(parser.serp);

Or on es5 with request lib, we get nojs Google results, so we use GoogleNojsSERP parser that is separate class in the lib

var request = require("request")
var sp = require("serp-parser")

request('https://www.google.com/search?q=google', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    parser = new sp.GoogleNojsSERP(html);
    console.dir(parser.serp);
  }
});

It will return serp object with array of results with domain, position, title, url, cached url, similar url, link type, sitelinks and snippet

{
  "keyword: "google",
  "totalResults": 15860000000,
  "timeTaken": 0.61,
  "currentPage": 1,
  "pagination": [
    { page: 1,
      path: "" },
    { page: 2,
      path: "/search?q=google&safe=off&gl=US&pws=0&nfpr=1&ei=N1QvXKbhOLCC5wLlvLa4Dg&start=10&sa=N&ved=0ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQ8tMDCOwB" },
    ...
  ],
  "videos": [
    { title: "The Matrix YouTube Movies Science Fiction - 1999 $ From $3.99",
      sitelink: "https://www.youtube.com/watch?v=3DfOTKGvtOM",
      date: 2018-10-28T23:00:00.000Z,
      source: "YouTube",
      channel: "Warner Movies On Demand",
      videoDuration: "2:23" },
    ...
  ],
  "thumbnailGroups": [
      { "heading": "Organization software",
        "thumbnails:": [ {
          "sitelink": "/search?safe=off&gl=US&pws=0&nfpr=1&q=Microsoft&stick=H4sIAAAAAAAAAONgFuLUz9U3MDFNNk9S4gAzi8tMtGSyk630k0qLM_NSi4v1M4uLS1OLrIozU1LLEyuLVzGKp1n5F6Un5mVWJZZk5ucpFOenlZQnFqUCAMQud6xPAAAA&sa=X&ved=2ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQxA0wHXoECAQQBQ",
          "title": "Microsoft Corporation"
        },
        ...
      ]
    },
    ...
  ],
  "organic": [
    {
      "domain": "www.google.com",
      "position": 1,
      "title": "Google",
      "url": "https://www.google.com/",
      "cachedUrl": "https://webcache.googleusercontent.com/search?q=cache:y14FcUQOGl4J:https://www.google.com/+&cd=1&hl=en&ct=clnk&gl=us",
      "similarUrl": "/search?safe=off&gl=US&pws=0&nfpr=1&q=related:https://www.google.com/+google&tbo=1&sa=X&ved=2ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQHzAAegQIARAG",
      "linkType": "HOME",
      "sitelinks": [
        { "title": "Google Docs",
          "snippet": "Google Docs brings your documents to life with smart ...",
          "type": "card" },
        { "title": "Google News",
          "snippet": "Comprehensive up-to-date news coverage, aggregated from ...",
          "type": "card" },
        ...
      ],
      "snippet": "Settings Your data in Search Help Send feedback. AllImages. Account · Assistant · Search · Maps · YouTube · Play · News · Gmail · Contacts · Drive · Calendar."
    },
    {
      "domain": "www.google.org",
      "position": 2,
      "title": "Google.org: Home",
      "url": "https://www.google.org/",
      "cachedUrl": "https://webcache.googleusercontent.com/search?q=cache:Nm9ycLj-SKoJ:https://www.google.org/+&cd=24&hl=en&ct=clnk&gl=us",
      "similarUrl": "/search?safe=off&gl=US&pws=0&nfpr=1&q=related:https://www.google.org/+google&tbo=1&sa=X&ved=2ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQHzAXegQIDBAF",
      "linkType": "HOME",
      "snippet": "Data-driven, human-focused philanthropy powered by Google. We bring the best of Google to innovative nonprofits that are committed to creating a world that..."
    },
    ...
  ],
  "relatedKeywords": [
    { keyword: google search,
      path: "/search?safe=off&gl=US&pws=0&nfpr=1&q=google+search&sa=X&ved=2ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQ1QIoAHoECA0QAQ" },
    { keyword: google account,
      path: "/search?safe=off&gl=US&pws=0&nfpr=1&q=google+account&sa=X&ved=2ahUKEwjm2Mn2ktTfAhUwwVkKHWWeDecQ1QIoAXoECA0QAg" },
    ...
  ]
}

Usage - Bing SERP extraction

Note: Only BingNojsSerp is implemented so far.

BingSERP works the same as GoogleSerp. It accepts both html that is extracted with any headless browser lib (puppeteer, phantomjs...) that have enabled javascript as well as html page structure from no-js-enabled requests from for example request lib. For full enabled js html we use BingSERP class, and for nojs pages BingNojsSERP class.

With html from headless browser we use full BingSERP parser

import { BingNojsSERP } from 'serp-parser'

const parser = new BingNojsSERP(html);

console.dir(parser.serp);

Or on es5 with request lib, we get nojs Bing results, so we use BingNojsSERP parser that is separate class in the lib

var request = require("request")
var sp = require("serp-parser")

request('https://www.bing.com/search?q=bing', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    parser = new sp.BingNojsSERP(html);
    console.dir(parser.serp);
  }
});

Parse options - specify what features to parse

On each module you can specify the parse options as a second parameter to only parse SERP features that you want, for GoogleSERP here are all the options:

  // Options for GoogleSERP
  #DEF_OPTIONS = {
    organic: true,
    related: true,
    pagination: true,
    ads: true,
    hotels: true,
    videos: true,
    thumbnails: true,
    shop: true,
    stories: true,
    locals: true,
  };
  // Options for GoogleNojsSERP and BingNojsSERP
  #DEF_OPTIONS = {
    organic: true,
    related: true,
    ads: true,
    hotels: true,
  };

Usage: In this example only organic feature will be parsed

  import { GoogleSERP } from 'serp-parser'
  const parser = new GoogleSERP(html, {organic: true});
  console.dir(parser.serp);

Roadmap

We are working on enriching parsed data to grab all existing and new SERP features from Google search page results, as well as knowlege graph, ads, and any related info. Also we will add more Search engines along the way.

Anyone willing to help - please submit issues, feature requests, fork away and send PR's

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].