
bisoncorps / Search Engine Parser

License: MIT
Lightweight package to query popular search engines and scrape for result titles, links and descriptions

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives of or similar to Search Engine Parser

Best Of Python
🏆 A ranked list of awesome Python open-source libraries and tools. Updated weekly.
Stars: ✭ 1,869 (+765.28%)
Mutual labels:  cli, library, pypi
Sitedorks
Search Google/Bing/Ecosia/DuckDuckGo/Yandex/Yahoo for a search term with a default set of websites, bug bounty programs or a custom collection.
Stars: ✭ 221 (+2.31%)
Mutual labels:  bing, google, search
Jsearch
jSearch (聚搜) is a content-focused Chrome search extension that aggregates results from multiple platforms in a single search.
Stars: ✭ 193 (-10.65%)
Mutual labels:  bing, google, search
Search Deflector
A small program that forwards searches from Cortana to your preferred browser and search engine.
Stars: ✭ 620 (+187.04%)
Mutual labels:  bing, google, search
Idt
Image Dataset Tool (idt) is a cli tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Stars: ✭ 202 (-6.48%)
Mutual labels:  bing, search-engine, scraping
Xinahn Socket
An open-source, privacy-focused, self-hosted meta search engine. https://xinahn.com
Stars: ✭ 77 (-64.35%)
Mutual labels:  bing, google, search-engine
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+2139.35%)
Mutual labels:  cli, scraping, library
Mal
MAL: A MyAnimeList Command Line Interface [BROKEN: BLAME MyAnimeList]
Stars: ✭ 104 (-51.85%)
Mutual labels:  cli, anime, pypi
Search Engine Google
🕷 Google client for SERPS
Stars: ✭ 138 (-36.11%)
Mutual labels:  google, search-engine, scraping
Youtubeshop
Youtube autolike and autosubs script
Stars: ✭ 177 (-18.06%)
Mutual labels:  cli, google
Airanime
A lightweight anime meta-search tool.
Stars: ✭ 184 (-14.81%)
Mutual labels:  search-engine, anime
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ✭ 190 (-12.04%)
Mutual labels:  scraping, anime
Rusticsearch
Lightweight Elasticsearch compatible search server.
Stars: ✭ 171 (-20.83%)
Mutual labels:  search, search-engine
Bt
BitTorrent library and client with DHT, magnet links, encryption and more
Stars: ✭ 2,011 (+831.02%)
Mutual labels:  cli, library
Autoserver
Create a full-featured REST/GraphQL API from a configuration file
Stars: ✭ 188 (-12.96%)
Mutual labels:  cli, library
Passw0rd
🔑securely checks a password to see if it has been previously exposed in a data breach
Stars: ✭ 159 (-26.39%)
Mutual labels:  cli, library
Lolcate Rs
Lolcate -- A comically fast way of indexing and querying your filesystem. Replaces locate / mlocate / updatedb. Written in Rust.
Stars: ✭ 191 (-11.57%)
Mutual labels:  search, search-engine
Serpscrap
SEO Python scraper to extract data from major search engine result pages: URL, title, snippet, rich snippet and result type for given keywords. It can also detect ads, take automated screenshots, and fetch the text content of result URLs (or URLs you supply). Useful for SEO and business-related research tasks.
Stars: ✭ 153 (-29.17%)
Mutual labels:  search, scraping
Rpi Backlight
🔆 A Python module for controlling power and brightness of the official Raspberry Pi 7" touch display
Stars: ✭ 190 (-12.04%)
Mutual labels:  cli, library
Vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.
Stars: ✭ 195 (-9.72%)
Mutual labels:  search, search-engine

Search Engine Parser

"If it is a search engine, then it can be parsed" - some random guy

Demo

Supports Python 3.5, 3.6, 3.7 and 3.8. Released on PyPI under the MIT license.


search-engine-parser is a package that lets you query popular search engines and scrape for result titles, links, descriptions and more. It aims to scrape the widest range of search engines. View all supported engines here.

Popular Supported Engines

Popular search engines supported include:

  • Google
  • DuckDuckGo
  • GitHub
  • StackOverflow
  • Baidu
  • YouTube

View all supported engines here.

Installation

Install from PyPI:

    # install only package dependencies
    pip install search-engine-parser
    # also installs the `pysearch` CLI tool
    pip install "search-engine-parser[cli]"

or from master:

    pip install git+https://github.com/bisoncorps/search-engine-parser

Development

Clone the repository:

    git clone [email protected]:bisoncorps/search-engine-parser.git

Then create a virtual environment and install the required packages:

    mkvirtualenv search_engine_parser
    pip install -r requirements/dev.txt

Code Documentation

Code docs can be found on Read the Docs.

Running the tests

    pytest

Usage

Code

Query results can be scraped from popular search engines, as shown in the example snippet below.

  import pprint

  from search_engine_parser.core.engines.bing import Search as BingSearch
  from search_engine_parser.core.engines.google import Search as GoogleSearch
  from search_engine_parser.core.engines.yahoo import Search as YahooSearch

  search_args = ('preaching to the choir', 1)
  gsearch = GoogleSearch()
  ysearch = YahooSearch()
  bsearch = BingSearch()
  gresults = gsearch.search(*search_args)
  yresults = ysearch.search(*search_args)
  bresults = bsearch.search(*search_args)
  a = {
      "Google": gresults,
      "Yahoo": yresults,
      "Bing": bresults
      }

  # pretty print the result from each engine
  for k, v in a.items():
      print(f"-------------{k}------------")
      for result in v:
          pprint.pprint(result)

  # print first title from google search
  print(gresults["titles"][0])
  # print 10th link from yahoo search
  print(yresults["links"][9])
  # print 6th description from bing search
  print(bresults["descriptions"][5])

  # print first result containing links, descriptions and title
  print(gresults[0])

For localization, you can pass the `url` keyword with a localized URL. This queries and parses the localized URL using the same engine's parser:

  # Use google.de instead of google.com
  results = gsearch.search(*search_args, url="google.de")

If you need results in a specific language, you can pass the `hl` keyword with a two-letter language code (here's a handy list):

  # Use 'it' to receive italian results
  results = gsearch.search(*search_args, hl="it")

Cache

Results are cached automatically for engine searches. You can either bypass the cache by passing cache=False to the search or async_search method, or clear the engine's cache:

    from search_engine_parser.core.engines.github import Search as GitHub
    github = GitHub()
    # bypass the cache
    github.search("search-engine-parser", cache=False)

    # OR
    # clear cache before search
    github.clear_cache()
    github.search("search-engine-parser")

Proxy

To use a proxy, pass the proxy details to the search function:

    from search_engine_parser.core.engines.github import Search as GitHub
    github = GitHub()
    github.search("search-engine-parser", 
        # only HTTP proxies are supported
        proxy='http://123.12.1.0',
        proxy_auth=('username', 'password'))

Async

search-engine-parser supports async:

    results = await gsearch.async_search(*search_args)
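
Because the searches are coroutines, several engines can be queried concurrently with asyncio.gather. A minimal sketch of the pattern (using a stub coroutine in place of a real engine's async_search, since a real search needs network access; with the library installed you would gather gsearch.async_search, ysearch.async_search, etc. directly):

```python
import asyncio

# Stub standing in for an engine's async_search so the pattern runs offline.
async def fake_async_search(engine_name, query, page=1):
    await asyncio.sleep(0)  # yield control, as a real network request would
    return {"engine": engine_name, "query": query, "page": page}

async def search_all(query, page=1):
    # Run the engine searches concurrently instead of one after another
    return await asyncio.gather(
        fake_async_search("Google", query, page),
        fake_async_search("Yahoo", query, page),
        fake_async_search("Bing", query, page),
    )

results = asyncio.run(search_all("preaching to the choir"))
for r in results:
    print(r["engine"], r["page"])
```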

Results

The `SearchResult` object returned after searching:

  >>> results = gsearch.search("preaching to the choir", 1)
  >>> results
  <search_engine_parser.core.base.SearchResult object at 0x7f907426a280>
  # the object supports retrieving individual results by index or all values of one type (links, descriptions, titles)
  >>> results[0] # returns the first <SearchItem>
  >>> results[0]["description"] # gets the description of the first item
  >>> results[0]["link"] # gets the link of the first item
  >>> results["descriptions"] # returns a list of all descriptions from all results

It can be iterated like a normal list to return individual SearchItems.
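
The dual indexing described above (an integer returns one SearchItem, a plural field name returns that field across all results) can be illustrated with a tiny stand-in class; this is a hypothetical sketch for clarity, not the library's actual SearchResult implementation:

```python
# Hypothetical sketch of the indexing behaviour, not the real class.
class SearchResultSketch:
    def __init__(self, items):
        self._items = items  # list of dicts: {"title", "link", "description"}

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._items[key]  # results[0] -> first SearchItem
        # results["descriptions"] -> all descriptions; drop the plural "s"
        return [item[key.rstrip("s")] for item in self._items]

    def __iter__(self):
        return iter(self._items)  # iterate like a normal list

results = SearchResultSketch([
    {"title": "First", "link": "a.com", "description": "one"},
    {"title": "Second", "link": "b.com", "description": "two"},
])
print(results[0]["title"])      # first item's title
print(results["descriptions"])  # every description
```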

Command line

search-engine-parser comes with a CLI tool known as pysearch. You can use it as such:

pysearch --engine bing search --query "Preaching to the choir" --type descriptions

Result:

'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.

Demo

The CLI requires an engine argument, i.e. -e ENGINE, followed by one of two subcommands: search and summary.

usage: pysearch [-h] [-u URL] [-e ENGINE] {search,summary} ...

SearchEngineParser

positional arguments:
  {search,summary}      help for subcommands
    search              search help
    summary             summary help

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     A custom link to use as base url for search e.g
                        google.de
  -e ENGINE, --engine ENGINE
                        Engine to use for parsing the query e.g google, yahoo,
                        bing, duckduckgo (default: google)

summary returns the summary of the specified search engine:

pysearch --engine google summary

Full arguments for the search subcommand are shown below:

usage: pysearch search [-h] -q QUERY [-p PAGE] [-t TYPE] [-r RANK]

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Query string to search engine for
  -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
  -t TYPE, --type TYPE  Type of detail to return i.e full, links, descriptions
                        or titles (default: full)
  -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
  -cc, --clear_cache    Clear cache of engine before searching

Code of Conduct

Make sure to adhere to the code of conduct at all times.

Contribution

Before making any contributions, please read the contribution guide.

License (MIT)

This project is licensed under the MIT License, which allows very broad use for both academic and commercial purposes.

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Ed Luff

💻

Diretnan Domnan

🚇 ⚠️ 🔧 💻

MeNsaaH

🚇 ⚠️ 🔧 💻

Aditya Pal

⚠️ 💻 📖

Avinash Reddy

🐛

David Onuh

💻 ⚠️

Panagiotis Simakis

💻 ⚠️

reiarthur

💻

Ashokkumar TA

💻

Andreas Teuber

💻

mi096684

🐛

devajithvs

💻

Geg Zakaryan

💻 🐛

Hakan Boğan

🐛

NicKoehler

🐛 💻

ChrisLin

🐛 💻

Pietro

💻 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].