
s045pd / Sharingan

License: MIT
We will try to find your visible basic footprint from social media as much as possible - 😤 more sites are coming soon

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Sharingan

Aioredis Py
asyncio (PEP 3156) Redis support
Stars: ✭ 2,003 (+15307.69%)
Mutual labels:  asyncio, python38
Hproxy
hproxy - Asynchronous IP proxy pool, aims to make getting a proxy as convenient as possible. (Asynchronous crawler proxy pool)
Stars: ✭ 62 (+376.92%)
Mutual labels:  crawler, asyncio
hypercorn-fastapi-docker
Docker image with Hypercorn for FastAPI apps in Python 3.7, 3.8, 3.9. Ready for HTTP2 and HTTPS
Stars: ✭ 18 (+38.46%)
Mutual labels:  asyncio, python38
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+10407.69%)
Mutual labels:  crawler, asyncio
antirobot aiogram
Telegram bot for blocking spam
Stars: ✭ 26 (+100%)
Mutual labels:  asyncio, python38
aioredis-cluster
Redis Cluster support extension for aioredis
Stars: ✭ 21 (+61.54%)
Mutual labels:  asyncio, python38
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+4384.62%)
Mutual labels:  crawler, asyncio
Gain
Web crawling framework based on asyncio.
Stars: ✭ 2,002 (+15300%)
Mutual labels:  crawler, asyncio
Fooproxy
A robust, efficient, score-based, targeted IP proxy pool + API service. You can plug in your own collectors to crawl proxy IPs and generate a database of valid proxies for one or more target sites of your crawler. Supports MongoDB 4.0, uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added anytime)
Stars: ✭ 195 (+1400%)
Mutual labels:  crawler, asyncio
duckpy
A simple Python library for searching on DuckDuckGo.
Stars: ✭ 20 (+53.85%)
Mutual labels:  asyncio, httpx
prisma-client-py
Prisma Client Python is an auto-generated and fully type-safe database client designed for ease of use
Stars: ✭ 739 (+5584.62%)
Mutual labels:  asyncio, python38
sse-option-crawler
SSE 50 index options data crawler
Stars: ✭ 17 (+30.77%)
Mutual labels:  crawler
TaobaoAnalysis
A project for practicing NLP by analyzing Taobao reviews
Stars: ✭ 28 (+115.38%)
Mutual labels:  crawler
aiodogstatsd
An asyncio-based client for sending metrics to StatsD with support of DogStatsD extension
Stars: ✭ 26 (+100%)
Mutual labels:  asyncio
social.ui
Basic UI for typical social network application
Stars: ✭ 46 (+253.85%)
Mutual labels:  social-network
aiolimiter
An efficient implementation of a rate limiter for asyncio.
Stars: ✭ 121 (+830.77%)
Mutual labels:  asyncio
automate-home
Yet another python home automation (iot) project. Because a smart light is more than just on or off.
Stars: ✭ 59 (+353.85%)
Mutual labels:  asyncio
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (+15.38%)
Mutual labels:  crawler
sanic-url-shortener
Example of how to use Sanic and asyncpg (PostgreSQL)
Stars: ✭ 16 (+23.08%)
Mutual labels:  asyncio
Python3Webcrawler
🌈 Python 3 web crawling in practice: QQ Music songs, JD product information, Fang.com, cracking Youdao Translate, building a proxy pool, Douban Books, Baidu Images, cracking NetEase login, simulated Bilibili QR-code login, Xiaoe-tech, Lizhi Weike
Stars: ✭ 208 (+1500%)
Mutual labels:  crawler

Sharingan

We will try to find as much of your visible social media footprint as possible.

Chinese version: Readme_cn

Environment

First, ensure that you have Python 3.8+ installed, then run the following commands.

git clone https://github.com/aoii103/Sharingan.git

cd Sharingan

python3 setup.py install

or install via pip

pip install sharingan

Usage

python3 -m sharingan blue

Add New Targets

I considered using JSON as the site configuration format, but ended up writing the configurations directly in extract.py.

What we need to do is add a method like the following under the Extractor class, where the upload call stores the basic configuration of the corresponding site.

For optional configurations, see models.py

    @staticmethod
    def __example() -> Generator:
        """
            1. <-- yield your config first
            2. --> then got your datas back
            3. <-- finally, yield the extracted data back
        """
        T = yield from upload(
            **{
                "url": "http://xxxx",
            }
        )

        T.name = T.html.pq('title').text()
        ...

        yield T
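To make the three docstring steps concrete, here is a minimal, self-contained sketch of the yield-based protocol. The `upload` helper and the driver code below are simplified stand-ins for what sharingan's scheduler does internally, not its actual implementation:

```python
from typing import Generator


def upload(**config) -> Generator:
    # Step 1: hand the site config to the scheduler;
    # step 2: the scheduler sends the fetched response back in.
    response = yield config
    return response


def example_site() -> Generator:
    T = yield from upload(url="http://example.com/{}")
    # Step 3: yield the extracted data back to the scheduler.
    yield T


# Drive the generator the way a scheduler would (simplified):
gen = example_site()
config = next(gen)                         # site method yields its config first
result = gen.send({"title": "demo page"})  # send fetched data in, get extraction back
print(config)                              # {'url': 'http://example.com/{}'}
print(result)                              # {'title': 'demo page'}
```

The real `upload` accepts more fields than `url` - for example `error_type` and `error_msg`, as seen in the generated templates further down.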

Single Test

Sometimes we need to test a single new site.

We can use the following command; for example, when the target is twitter:

python3 -m sharingan larry --singel=twitter

Create sites from sherlock

Run the following command first:

python3 -m sharingan.common

It will create a Python file named templates.py:

    @staticmethod
    def site_2Dimensions():
        T = yield from upload(url='''https://2Dimensions.com/a/{}''',)

        T.title = T.html.pq('title').text()
        yield T

    @staticmethod
    def site_3dnews():
        T = yield from upload(url='''http://forum.3dnews.ru/member.php?username={}''',error_type='text',error_msg='''Пользователь не зарегистрирован и не имеет профиля для просмотра.''',)

        T.title = T.html.pq('title').text()
        yield T

    ...

Then copy the generated methods into extract.py.

Options


Usage: __main__.py [OPTIONS] NAME

Options:
  --name TEXT        The username you need to search
  --proxy_uri TEXT   Proxy address to use when a proxy is needed
  --no_proxy         All connections will be directly connected
  --save_path TEXT   The storage location of the collected results
  --pass_history     Name the output file after the scan end time
  --singel TEXT      Commonly used for single target information acquisition or testing
  --debug            Debug mode
  --update           Do not overwrite the original data results
  --workers INTEGER  Number of concurrent workers
  --help             Show this message and exit.
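Putting the options together, a typical invocation might look like this (the username, proxy address, and save path are placeholders):

    # Search for "larry" through a local SOCKS5 proxy with 10 workers,
    # saving results under ./results
    python3 -m sharingan larry --proxy_uri=socks5://127.0.0.1:1080 --workers=10 --save_path=./results

    # Connect directly and test a single site in debug mode
    python3 -m sharingan larry --no_proxy --singel=twitter --debug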

TODO

  • Formatted output

📝 License

This project is MIT licensed.


If you find this script useful, don't forget to give it a star 🐶. Inspired by ❤️ sherlock
