
hanxweb / Scrapy-SearchEngines

Licence: other
A Bing, Google, and Baidu search-engine crawler, built with Python 3.6 and Scrapy.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Scrapy-SearchEngines

Image Downloader
Download images from Google, Bing, and Baidu.
Stars: ✭ 1,173 (+4089.29%)
Mutual labels:  bing, scrapy, baidu
Ecommercecrawlers
Gitee repository: AJay13/ECommerceCrawlers; GitHub repository: DropsDevopsOrg/ECommerceCrawlers; project showcase: http://wechat.doonsec.com
Stars: ✭ 3,073 (+10875%)
Mutual labels:  scrapy, baidu
Xinahn Socket
An open-source, privacy-focused, self-hosted meta search engine. https://xinahn.com
Stars: ✭ 77 (+175%)
Mutual labels:  bing, baidu
Jsearch
jSearch (聚搜) is a content-focused Chrome search extension that aggregates results from multiple platforms in a single search.
Stars: ✭ 193 (+589.29%)
Mutual labels:  bing, baidu
Translators
🌏🌍🌎Translators🌎🌍🌏 is a Python library that aims to bring free, diverse, and enjoyable translation to individuals and students.
Stars: ✭ 295 (+953.57%)
Mutual labels:  bing, baidu
Sitedorks
Search Google/Bing/Ecosia/DuckDuckGo/Yandex/Yahoo for a search term with a default set of websites, bug bounty programs or a custom collection.
Stars: ✭ 221 (+689.29%)
Mutual labels:  bing, baidu
ArticleSpider
Crawls Zhihu, Jobbole, and Lagou with Scrapy and uses Elasticsearch + Django to build a search-engine website; see README_zh.md (covering the implementation roadmap, distributed crawling, and anti-crawling countermeasures).
Stars: ✭ 34 (+21.43%)
Mutual labels:  scrapy
scrapy-kafka-redis
Distributed crawling/scraping: Kafka- and Redis-based components for Scrapy.
Stars: ✭ 45 (+60.71%)
Mutual labels:  scrapy
ty-baidu-textcensor
🗑 Adds Baidu text-content moderation to Typecho, filtering sensitive content out of comments.
Stars: ✭ 42 (+50%)
Mutual labels:  baidu
Scrape-Finance-Data
My code for scraping financial data in Vietnam
Stars: ✭ 13 (-53.57%)
Mutual labels:  scrapy
bing-daily-photo
A simple PHP class to fetch Bing's photo of the day.
Stars: ✭ 34 (+21.43%)
Mutual labels:  bing
Raspagem-de-dados-para-iniciantes
Data scraping for beginners using Scrapy and other basic libraries.
Stars: ✭ 113 (+303.57%)
Mutual labels:  scrapy
Inventus
Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.
Stars: ✭ 80 (+185.71%)
Mutual labels:  scrapy
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (+228.57%)
Mutual labels:  scrapy
bing-wallpaper
Python script that sets the daily www.bing.com picture as the desktop wallpaper.
Stars: ✭ 21 (-25%)
Mutual labels:  bing
mpapi
🐤 A mini-program API compatibility plugin: write once, run on multiple platforms. Supports WeChat, Alipay, Baidu Smart Program, and ByteDance mini-programs.
Stars: ✭ 40 (+42.86%)
Mutual labels:  baidu
easypoi
A simple, free, and efficient tool for collecting and analyzing Baidu Maps POI data.
Stars: ✭ 87 (+210.71%)
Mutual labels:  scrapy
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+339.29%)
Mutual labels:  scrapy
scrapy-mysql-pipeline
A Scrapy MySQL pipeline.
Stars: ✭ 47 (+67.86%)
Mutual labels:  scrapy
hupu spider
A crawler for Hupu's 步行街 (Buxingjie) forum.
Stars: ✭ 22 (-21.43%)
Mutual labels:  scrapy

seCrawler (Search Engine Crawler)

A Scrapy project that crawls search results from Google, Bing, and Baidu.

Adapted from https://github.com/xtt129/seCrawler, with minor changes for Python 3.6 compatibility.

Thanks to the original author for sharing.

prerequisites

Python 3.6 and Scrapy are required.
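
If Scrapy isn't installed yet, it can be installed with pip (this assumes a working Python 3.6 environment; the Scrapy version the project was tested against isn't specified):

pip install scrapy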

commands

Run one of the commands below to fetch 50 pages of search results for a keyword; the result URLs are saved to "urls.txt" in the current directory. A sketch of how such a spider might be structured follows the commands.

Bing:

scrapy crawl keywordSpider -a keyword=Spider-Man -a se=bing -a pages=50

Baidu:

scrapy crawl keywordSpider -a keyword=Spider-Man -a se=baidu -a pages=50

Google:

scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50
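
For orientation, a spider accepting these -a arguments could look roughly like the sketch below. This is an assumption based on the commands above, not the project's actual code; in particular the URL templates, the 10-results-per-page offset, and the link selector are hypothetical.

import scrapy

class KeywordSpider(scrapy.Spider):
    # The spider name matches the "scrapy crawl keywordSpider" commands above.
    name = "keywordSpider"

    def __init__(self, keyword=None, se="bing", pages=50, *args, **kwargs):
        # Scrapy passes each "-a name=value" argument as a keyword argument here.
        super().__init__(*args, **kwargs)
        self.keyword = keyword
        self.pages = int(pages)
        # Hypothetical query templates; the real project's URLs and
        # pagination parameters may differ.
        templates = {
            "bing": "https://www.bing.com/search?q={kw}&first={offset}",
            "baidu": "https://www.baidu.com/s?wd={kw}&pn={offset}",
            "google": "https://www.google.com/search?q={kw}&start={offset}",
        }
        self.url_template = templates[se]

    def start_requests(self):
        # One request per result page, assuming 10 results per page.
        for page in range(self.pages):
            url = self.url_template.format(kw=self.keyword, offset=page * 10)
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Append every absolute link found on the result page to urls.txt,
        # mirroring the output file described above.
        with open("urls.txt", "a") as f:
            for href in response.css("a::attr(href)").getall():
                if href.startswith("http"):
                    f.write(href + "\n")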

limitation

The project doesn't provide any workaround for anti-spider measures such as CAPTCHAs or IP ban lists.

To reduce the chance of triggering them, we recommend setting DOWNLOAD_DELAY=10 in the settings.py file, which adds a delay (in seconds) between the crawl of two pages; see the Scrapy settings documentation for details.
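
As a minimal sketch, the relevant lines in settings.py could look like the following. DOWNLOAD_DELAY is the setting recommended above; RANDOMIZE_DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED are standard Scrapy settings shown here for context, not something the project prescribes.

# settings.py -- throttling to reduce the chance of CAPTCHAs and IP bans
DOWNLOAD_DELAY = 10               # wait 10 seconds between two page downloads
RANDOMIZE_DOWNLOAD_DELAY = True   # scale each delay by 0.5x-1.5x so timing looks less mechanical
AUTOTHROTTLE_ENABLED = True       # let Scrapy adapt the delay to the server's response times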

