
howie6879 / php-google

Licence: other
Google search results crawler, get google search results that you need - php

Programming Languages

PHP
23972 projects - #3 most used programming language

Projects that are alternatives of or similar to php-google

Fast Lianjia Crawler
A blazing-fast crawler that fetches data directly from the Lianjia API, the fastest in the universe~~ 🚀
Stars: ✭ 247 (+973.91%)
Mutual labels:  crawler
google-this
🔎 A simple yet powerful module to retrieve organic search results and much more from Google.
Stars: ✭ 88 (+282.61%)
Mutual labels:  google-search
auto crawler ptt beauty image
Automatically crawls PTT Beauty board images on a Python schedule
Stars: ✭ 35 (+52.17%)
Mutual labels:  crawler
Magic google
Google search results crawler, get google search results that you need
Stars: ✭ 247 (+973.91%)
Mutual labels:  crawler
weltschmerz
Weltschmerz by age - "I am X years old and... [Google autocomplete]"
Stars: ✭ 23 (+0%)
Mutual labels:  google-search
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-34.78%)
Mutual labels:  crawler
Ppspider
Web spider built on Puppeteer; supports task queues and task scheduling via decorators, nedb/mongodb storage, and data visualization
Stars: ✭ 237 (+930.43%)
Mutual labels:  crawler
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-26.09%)
Mutual labels:  crawler
serp-parser
Nodejs lib to parse Google SERP html pages
Stars: ✭ 28 (+21.74%)
Mutual labels:  google-search
crawler
A simple and flexible web crawler framework for java.
Stars: ✭ 20 (-13.04%)
Mutual labels:  crawler
Polite
Be nice on the web
Stars: ✭ 253 (+1000%)
Mutual labels:  crawler
ublacklist
Blocks specific sites from appearing in Google search results
Stars: ✭ 3,726 (+16100%)
Mutual labels:  google-search
TaobaoAnalysis
A project for practicing NLP by analyzing Taobao reviews
Stars: ✭ 28 (+21.74%)
Mutual labels:  crawler
Weibopicdownloader
A crawler to download Weibo images without logging in
Stars: ✭ 247 (+973.91%)
Mutual labels:  crawler
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (+108.7%)
Mutual labels:  crawler
Strong Web Crawler
An advanced web crawler based on C#/.NET, PhantomJS, and Selenium. It can execute JavaScript, trigger various events, and manipulate the page DOM.
Stars: ✭ 238 (+934.78%)
Mutual labels:  crawler
Python3Webcrawler
🌈Python3 web crawling in practice: QQ Music songs, JD.com product info, Fang.com, cracking Youdao Translate, building a proxy pool, Douban Books, Baidu Images, cracking NetEase login, Bilibili simulated QR-code login, Xiaoe-tech, Lizhi Weike
Stars: ✭ 208 (+804.35%)
Mutual labels:  crawler
Sharingan
We will try to find as much of your visible social-media footprint as possible - 😤 more sites are coming soon
Stars: ✭ 13 (-43.48%)
Mutual labels:  crawler
sse-option-crawler
SSE 50 index options data crawler
Stars: ✭ 17 (-26.09%)
Mutual labels:  crawler
img-cli
An interactive Command-Line Interface Build in NodeJS for downloading a single or multiple images to disk from URL
Stars: ✭ 15 (-34.78%)
Mutual labels:  crawler

php-google

This is a simple Google search crawler that lets you extract whatever you need from the results page.

While crawling, be aware of Google's per-IP rate limits and the exceptions they trigger, so I suggest pausing the program between requests and using your own proxy IPs.

Python version: MagicGoogle

How to Use?

This project can be installed via Composer by requiring the howie6879/php-google package in your composer.json:

{
    "require": {
        "howie6879/php-google": "1.0"
    }
}
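Alternatively, assuming Composer is available on your PATH, the same dependency can be added from the command line:

```shell
composer require howie6879/php-google:1.0
```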

Once php-google is installed in your project, you can fetch the Google search results you need.

Example

# Add the bootstrap autoload file

require_once '../vendor/autoload.php';
use \howie6879\PhpGoogle\MagicGoogle;

# Pass a proxy URL, or use new MagicGoogle() for no proxy
$magicGoogle = new MagicGoogle('http://127.0.0.1:8118');

# The first page of results
$data = $magicGoogle->search_page('python');

# Get url
$data = $magicGoogle->search_url('python');

foreach ($data as $value) {
    var_dump($value);
}

/** Output
 * string(23) "https://www.python.org/"
 * string(33) "https://www.python.org/downloads/"
 * string(35) "https://docs.python.org/3/tutorial/"
 * string(44) "https://www.python.org/about/gettingstarted/"
 * string(43) "https://wiki.python.org/moin/BeginnersGuide"
 * string(41) "https://www.python.org/downloads/windows/"
 * string(24) "https://docs.python.org/"
 * string(59) "https://en.wikipedia.org/wiki/Python_(programming_language)"
 * string(39) "https://www.codecademy.com/learn/python"
 * string(25) "https://github.com/python"
 * string(38) "https://www.tutorialspoint.com/python/"
 * string(28) "https://www.learnpython.org/"
 * string(44) "https://www.programiz.com/python-programming"
 */
 
# Get {'title','url','text'}
$data = $magicGoogle->search('python', 'en', '1');

foreach ($data as $value) {
    var_dump($value);
}

/** Output
 * array(3) {
 * ["title"]=>
 * string(21) "Welcome to Python.org"
 * ["url"]=>
 * string(23) "https://www.python.org/"
 * ["text"]=>
 * string(54) "The official home of the Python Programming Language. "
 * }
 */

See sample.php for a complete example.

If you need to run a large number of queries but have only one IP address, I suggest waiting 5s ~ 30s between requests.
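A minimal sketch of that throttling as a helper function (the function name and default bounds are illustrative, not part of the library):

```php
<?php
// Illustrative throttle: return a random pause of 5-30 seconds
// to stay under Google's per-IP rate limits.
function throttleSeconds(int $min = 5, int $max = 30): int
{
    return rand($min, $max);
}

// Usage inside a query loop (MagicGoogle set up as in the example above):
//   foreach ($queries as $query) {
//       $data = $magicGoogle->search_url($query);
//       sleep(throttleSeconds());
//   }
```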

If the crawler keeps returning empty results, Google has likely blocked your IP and is responding with a redirect like this:

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://ipv4.google.com/sorry/index?continue=https://www.google.me/s****">here</A>.
</BODY></HTML>
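One way to detect this block page before parsing is a small check on the raw HTML (this helper is a hypothetical addition, not part of php-google):

```php
<?php
// Detect Google's "sorry" block page in a raw HTML response.
// Matches the "302 Moved" title and the google.com/sorry redirect target.
function isBlockedByGoogle(string $html): bool
{
    return stripos($html, '302 Moved') !== false
        || stripos($html, 'google.com/sorry') !== false;
}
```

When this returns true, back off for a while or switch to a different proxy IP before retrying.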
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].