
qzcool / Tianyancha

Licence: mit

A pip-installable scraper API for Tianyancha, the best Chinese business data and investigation platform. It saves the business registration information of one or more specified companies in Excel/JSON format in a single step.

Programming Languages

python, python3

Labels

data crawler pandas selenium scraper china business


Tianyancha 天眼查

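Enter a fuzzy or abbreviated name for the target company, and one line of code saves the specified business registration information, grouped by category, as an Excel/JSON file.

Simulated login: uses Selenium XPath to locate the login box and submit your personal account credentials. One login takes roughly 6-9 seconds.
Fuzzy keyword matching: relies on the fuzzy search already built into the Tianyancha search box, which helps when the user can supply only part of the company name.
Element locating: special tables (such as 'baseInfo') use the API provided by Selenium; see Locating Elements for details. Ordinary tables use the pandas read_html method.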

What is Tianyancha? Read this article to find out.

Installation

pip install tianyancha

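Usage

Replace the inputs with your own Tianyancha account, password, and search keyword. For examples of the generated result files, see 北京鸿智慧通实业有限公司.xlsx and 中信证券股份有限公司.json.

Running the sample code below does the following:

Single: "After user User logs in with password Password, scrape the business registration information (baseInfo) of the company matching the keyword Keyword. The result is returned as table_dict and saved as a JSON file."
Batch: "After user User logs in with password Password, the program scrapes multiple companies in batch according to the table names set for each one in input.xlsx. The results are returned as tuple_dicts, which is made up of multiple table_dict objects, and each is saved as an Excel file. Finally, the requested information for the first company is printed in the terminal."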

from tianyancha import Tianyancha
# Single company
table_dict = Tianyancha(username='User', password='Password').tianyancha_scraper(keyword='Keyword', table='baseInfo', export='json')
# Batch
tuple_dicts = Tianyancha(username='User', password='Password').tianyancha_scraper_batch(input_template='input.xlsx', export='xlsx')
tuple_dicts[0]
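
To make the JSON export step more concrete, here is a minimal standard-library sketch of what saving a result could look like. It assumes, hypothetically, that table_dict maps table names such as 'baseInfo' to JSON-serializable rows; the real return type of the package is not documented here, and save_table_dict_as_json is an illustrative helper, not part of tianyancha.

```python
import json
import tempfile
from pathlib import Path


def save_table_dict_as_json(table_dict, company_name, out_dir="."):
    """Write table_dict to <out_dir>/<company_name>.json and return the path.

    Hypothetical helper: assumes table_dict is JSON-serializable,
    e.g. {"baseInfo": [{"field": "value"}]}.
    """
    path = Path(out_dir) / f"{company_name}.json"
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    text = json.dumps(table_dict, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path


# Demo with made-up data written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = save_table_dict_as_json(
    {"baseInfo": [{"法定代表人": "Example Person"}]}, "demo", demo_dir
)
print(demo_path.read_text(encoding="utf-8"))
```

Writing with an explicit UTF-8 encoding matters here because company names and field labels are Chinese.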

Function Parameters

Tianyancha.tianyancha_scraper(keyword, table='all', use_default_exception=True, change_page_interval=2, export='xlsx'):

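Parameter	Type	Description	Example
keyword	string	Company name; fuzzy or partial matches are supported.	"北京鸿智慧通实业有限公司"
table	list or string, default 'all'	Tables to scrape. Names match the official element names; see the Chinese-English table name mapping chart.	['baseInfo', 'staff', 'invest']
use_default_exception	boolean, default True	Whether to use the default exception list, which speeds up scraping at the cost of skipping low-value tables.	False
change_page_interval	float, default 2	Interval in seconds between page turns when scraping multiple pages, to avoid having your IP address banned by the site for requesting too frequently.	1.5
export	string, default 'xlsx'	Output format: 'xlsx' or 'json'.	'json'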
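
The change_page_interval parameter implies a pause between page turns. The sketch below shows one way such throttling could work; scrape_pages, fetch_page, and the injectable sleep argument are hypothetical names used for illustration, and the package's actual paging loop may differ.

```python
import time


def scrape_pages(fetch_page, page_count, change_page_interval=2.0, sleep=time.sleep):
    """Call fetch_page(n) for pages 1..page_count, pausing between pages."""
    rows = []
    for page in range(1, page_count + 1):
        if page > 1:
            # Wait before every page turn except the first request.
            sleep(change_page_interval)
        rows.extend(fetch_page(page))
    return rows


# Demo with a fake page source; pauses are recorded instead of slept.
pauses = []
result = scrape_pages(
    lambda n: [f"row-{n}"],
    page_count=3,
    change_page_interval=1.5,
    sleep=pauses.append,
)
print(result, pauses)
```

Passing the sleep function in as an argument keeps the loop easy to test without real delays.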

Table Parameters Mapping Chart

Parameter names ending in "*" may be incorrect; please check them manually against the text that follows div._container_ on the page.

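Category	Name	Parameter	Description
Listed information	Stock quotes (股票行情)	volatilityNum
	Company profile (企业简介)	stockNum
	Senior executives (高管信息)	seniorPeople
	Shareholdings and controlled companies (参股控股)	holdingCompany
	Listing announcements (上市公告)	announcement
	Top ten shareholders (十大股东)	topTenNum
	Top ten tradable shareholders (十大流通)	tenTradableNum
	Issuance information (发行相关)	issuanceRelatedNum
	Share structure (股本结构)	shareStructure
	Equity changes (股本变动)	equityChange
	Dividends (分红情况)	bonus
	Rights issues (配股情况)	allotment
Company background	Business registration information (工商信息)	baseInfo	Basic business registration information, including the unified social credit code, registered capital, registration date, legal representative, and business scope.
	Tianyan risk (天眼风险)	riskInfo
	Equity penetration chart (股权穿透图)	graphTreeInfo
	Key personnel (主要人员)	staff
	Shareholders (股东信息)	holder
	Outbound investments (对外投资)	invest
	Ultimate beneficiaries (最终受益人)	humanholding
	Actual control (实际控制权)	companyholding
	Financial summary (财务简析)	financialAnalysis*	Paid content.
	Company relationships (企业关系)	graph
	Change records (变更记录)	changeinfo
	History timeline (历史沿革)	graphTimeInfo
	Annual reports (公司年报)	report*
	Branches (分支机构)	branch
Judicial risk	Court hearing announcements (开庭公告)	announcementCount
	Lawsuits (法律诉讼)	lawsuit
	Court announcements (法院公告)	court
	Dishonest judgment debtors (失信人信息)	dishonest
	Persons subject to enforcement (被执行人)	zhixing
	Judicial assistance (司法协助)
Operational risks	Abnormal operations (经营异常)	abnormal
	Administrative penalties (行政处罚)	punish, punishmentCreditchina
	Serious violations (严重违法)
	Equity pledges (股权出质)	equity
	Chattel mortgages (动产抵押)
	Tax arrears announcements (欠税公告)
	Judicial auctions (司法拍卖)	judicialSale
	Liquidation information (清算信息)
	Intellectual property pledges (知识产权出质)
	Public summons notices (公示催告)	publicnoticeItem
Company development	Financing history (融资历史)	rongzi
	Core team (核心团队)	teamMember
	Company businesses (企业业务)	firmProduct
	Investment events (投资事件)	touzi
	Competitors (竞品信息)	jingpin
Operation status	Job postings (招聘信息)	recruit
	Administrative licenses (行政许可)	licensing, licensingXyzg
	Tax credit rating (税务评级)	taxcredit
	Spot checks (抽查检查)	check
	Qualification certificates (资质证书)	certificate
	Bidding information (招投标信息)	bid
	Products (产品信息)	product
	WeChat official accounts (微信公众号)	wechat
	Import and export credit (进出口信用)	importAndExport
	Bonds (债券信息)	bond
	Land purchases (购地信息)	purchaselandV2
	Telecom licenses (电信许可)	permission
Intellectual property	Trademarks (商标信息)	tminfo
	Patents (专利信息)	patent
	Software copyrights (软件著作权)	copyright
	Work copyrights (作品著作权)	copyrightWorks
	Website ICP filings (网站备案)	icp
Past	Business registration information (工商信息)	pastICCount
	Shareholders (股东信息)	pastHolderCount
	Outbound investments (对外投资)	pastInvestCount
	Court hearing announcements (开庭公告)	pastAnnouncementCount
	Lawsuits (法律诉讼)	passtLawsuitCount
	Court announcements (法院公告)	pastCourtCount
	Dishonest judgment debtors (失信人信息)	pastDishonest
	Persons subject to enforcement (被执行人)	pastZhixing
	Administrative penalties (行政处罚)	pastPunishmentIC, pastPunishmentCreditCN
	Equity pledges (股权出质)	pastEquitycount
	Chattel mortgages (动产抵押)
	Administrative licenses (行政许可)	getPastLicenseCN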

Default Exception List

This is the list behind the use_default_exception parameter: when it is True, the tables named below are skipped.

list_exception = ['recruit', 'tmInfo', 'holdingCompany', 'invest', 'bonus', 'firmProduct', 'jingpin', \
                'bid', 'taxcredit', 'certificate', 'patent', 'copyright', 'product', 'importAndExport', \
                'copyrightWorks', 'wechat', 'icp', 'announcementcourt', 'lawsuit', 'court', \
                'branch', 'touzi', 'judicialSale', 'bond', 'teamMember', 'check']
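
One plausible reading of how table and use_default_exception combine is sketched below. resolve_tables, the shortened LIST_EXCEPTION, and the short ALL_TABLES list are assumptions made for illustration only. This README does not say whether an explicitly requested table is still skipped when it appears in the exception list; the sketch chooses to honor explicit requests.

```python
# Shortened, hypothetical copies of the real lists, for the demo only.
LIST_EXCEPTION = ['recruit', 'invest', 'branch', 'lawsuit']
ALL_TABLES = ['baseInfo', 'staff', 'holder', 'invest', 'lawsuit', 'branch', 'recruit']


def resolve_tables(table='all', use_default_exception=True):
    """Return the list of table names to scrape."""
    if table == 'all':
        if use_default_exception:
            # Skip the low-value tables to speed up scraping.
            return [t for t in ALL_TABLES if t not in LIST_EXCEPTION]
        return list(ALL_TABLES)
    # Accept either a single name or a list of names.
    return [table] if isinstance(table, str) else list(table)


print(resolve_tables())
print(resolve_tables(['staff', 'invest']))
```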

Dependencies

Chrome browser
Chrome-webdriver: move chromedriver.exe (Windows) or chromedriver.dmg (Mac) into your local Python installation directory.
1. Download from Baidu Netdisk
2. Official download (requires a proxy to access)
Requirements.txt
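
Before running the scraper, it can help to confirm that the driver is discoverable. The following is a small sketch using only the standard library; it assumes the Python installation directory (or wherever you placed chromedriver) is on your PATH, and find_driver is a hypothetical helper name.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None."""
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver not found on PATH; move it into a directory on PATH.")
else:
    print(f"chromedriver found at {driver_path}")
```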

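Donation

Donating is a virtue. ❤️💛💙

Funds

Please upvote the answer to the related Zhihu question "How do you fully crawl a site like Tianyancha?" (像天眼查这种网站怎么进行全爬虫？) to help more people benefit from this project.

New Feature Suggestions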


Stars: ✭ 206
