
howie6879 / Magic_google

A Google search results crawler: get the Google search results you need.

Programming Languages

python

Projects that are alternatives to or similar to Magic google

Ok ip proxy pool
🍿 A crawler proxy IP pool (proxy pool) in Python 🍟, a decent-enough IP proxy pool.
Stars: ✭ 196 (-20.65%)
Mutual labels:  crawler, spider
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+6189.47%)
Mutual labels:  crawler, spider
Zhihuspider
A multi-threaded Zhihu user crawler, based on Python 3.
Stars: ✭ 201 (-18.62%)
Mutual labels:  crawler, spider
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (-23.08%)
Mutual labels:  crawler, spider
Chromium for spider
Dynamic crawler for web vulnerability scanners.
Stars: ✭ 220 (-10.93%)
Mutual labels:  crawler, spider
Google Group Crawler
Get (almost) original messages from google group archives. Your data is yours.
Stars: ✭ 190 (-23.08%)
Mutual labels:  google, crawler
Jssoup
JavaScript + BeautifulSoup = JSSoup
Stars: ✭ 203 (-17.81%)
Mutual labels:  crawler, spider
Ncov2019 data crawler
An epidemic data crawler: a 2019 novel coronavirus data repository with movement trajectories, co-passenger data, and news reports.
Stars: ✭ 175 (-29.15%)
Mutual labels:  crawler, spider
Sitemap Generator Cli
Creates an XML-Sitemap by crawling a given site.
Stars: ✭ 214 (-13.36%)
Mutual labels:  google, crawler
Jd mask robot
A JD.com mask stock monitoring crawler (no Selenium): QR-code login, price checking, add-to-cart, ordering, and flash purchases.
Stars: ✭ 216 (-12.55%)
Mutual labels:  crawler, spider
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (-24.7%)
Mutual labels:  crawler, spider
Ppspider
A web spider built on Puppeteer; supports task queues and task scheduling via decorators, nedb/mongodb persistence, and data visualization with user interaction.
Stars: ✭ 237 (-4.05%)
Mutual labels:  crawler, spider
Lianjia Beike Spider
A house price crawler for Lianjia and Beike, collecting housing price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports csv, MySQL, MongoDB, Excel, and json storage; supports Python 2 and 3; charts the data; richly commented. For learning and reference only; do not use it commercially.
Stars: ✭ 2,257 (+813.77%)
Mutual labels:  crawler, spider
Fooproxy
A robust and efficient scored, targeted IP proxy pool with an API service. You can plug in your own collectors to crawl proxy IPs, generating a database of valid proxies for each target site of your crawler. Supports MongoDB 4.0; uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added anytime.)
Stars: ✭ 195 (-21.05%)
Mutual labels:  crawler, spider
Zhihu Crawler People
A simple distributed crawler for zhihu && data analysis
Stars: ✭ 182 (-26.32%)
Mutual labels:  crawler, spider
Querylist
🕷️ The progressive PHP crawler framework!
Stars: ✭ 2,392 (+868.42%)
Mutual labels:  crawler, spider
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-30.77%)
Mutual labels:  crawler, spider
Spoon
🥄 A package for building specific Proxy Pool for different Sites.
Stars: ✭ 173 (-29.96%)
Mutual labels:  crawler, spider
Webvideobot
Web crawler.
Stars: ✭ 214 (-13.36%)
Mutual labels:  crawler, spider
Laravel Crawler Detect
A Laravel wrapper for CrawlerDetect - the web crawler detection library
Stars: ✭ 227 (-8.1%)
Mutual labels:  crawler, spider

magic_google

1. What's magic_google?

This is a simple Google search crawler that lets you extract whatever you need from the results pages.

While crawling, keep in mind that Google rate-limits requests per IP address and may answer with a warning instead of results, so I suggest pausing between runs and routing requests through your own proxy IPs.
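
A minimal sketch of the proxy side of that advice (the addresses below are placeholders; given a list like this, MagicGoogle is assumed to spread requests across the entries):

from magic_google import MagicGoogle

# Placeholder proxies: replace with proxy servers you actually control.
PROXIES = [
    {'http': 'http://192.168.2.207:1080', 'https': 'http://192.168.2.207:1080'},
    {'http': 'http://192.168.2.208:1080', 'https': 'http://192.168.2.208:1080'},
]

# Assumption: with several proxies configured, requests are distributed
# among them, so no single IP absorbs all of the query volume.
mg = MagicGoogle(PROXIES)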

A PHP version is also available: MagicGoogle (php).

2. How to Use?

Run

pip install magic_google
# Or
pip install git+https://github.com/howie6879/magic_google.git
# Or
git clone https://github.com/howie6879/magic_google.git
cd magic_google
vim google_search.py
# Or 
python setup.py install

Example

from magic_google import MagicGoogle
import pprint

# Or PROXIES = None
PROXIES = [{
    'http': 'http://192.168.2.207:1080',
    'https': 'http://192.168.2.207:1080'
}]

# Or MagicGoogle()
mg = MagicGoogle(PROXIES)

# Crawl the whole page
result = mg.search_page(query='python')

# Crawl result URLs
for url in mg.search_url(query='python'):
    pprint.pprint(url)
    
# Output
# 'https://www.python.org/'
# 'https://www.python.org/downloads/'
# 'https://www.python.org/about/gettingstarted/'
# 'https://docs.python.org/2/tutorial/'
# 'https://docs.python.org/'
# 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# 'https://www.codecademy.com/courses/introduction-to-python-6WeG3/0?curriculum_id=4f89dab3d788890003000096'
# 'https://www.codecademy.com/learn/python'
# 'https://developers.google.com/edu/python/'
# 'https://learnpythonthehardway.org/book/'
# 'https://www.continuum.io/downloads'

# Get {'title','url','text'}
for i in mg.search(query='python', num=1):
    pprint.pprint(i)
    
# Output
# {'text': 'The official home of the Python Programming Language.',
# 'title': 'Welcome to Python .org',
# 'url': 'https://www.python.org/'}

See google_search.py for a complete example.

If you need to run a large number of queries but have only one IP address, I suggest waiting between 5s and 30s between requests.
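
A minimal sketch of that pattern, assuming the API shown above:

import random
import time

from magic_google import MagicGoogle

mg = MagicGoogle()

for query in ['python', 'golang', 'rust']:
    for result in mg.search(query=query, num=1):
        print(result)
    # Sleep a random 5-30 seconds so one IP is less likely to be blocked.
    time.sleep(random.uniform(5, 30))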

If searches keep coming back empty, the reason is most likely that Google has redirected you to its rate-limit ("sorry") page:

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://ipv4.google.com/sorry/index?continue=https://www.google.me/s****">here</A>.
</BODY></HTML>
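
A minimal sketch for detecting this case, assuming search_page() returns the raw HTML of the results page as a string (or something falsy on failure):

import time

from magic_google import MagicGoogle

mg = MagicGoogle()

page = mg.search_page(query='python')
if not page or 'google.com/sorry' in page:
    # Google has rate-limited this IP: back off for a while,
    # or switch to a different proxy before retrying.
    time.sleep(300)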