GoldArowana / Douyin Crawler

Douyin crawler. Crawls users' posts and users' likes through a phone proxy.

Programming Languages

java

Labels

crawler vertx

Projects that are alternatives of or similar to Douyin Crawler

Beian Domain

Crawler that fetches the latest list of domain names eligible for ICP filing (beian).

Stars: ✭ 9 (-72.73%)

Mutual labels: crawler

Pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

Stars: ✭ 14 (-57.58%)

Mutual labels: crawler

Toutiaocrawler

Example crawler for Toutiao accounts.

Stars: ✭ 30 (-9.09%)

Mutual labels: crawler

Vertx Web

HTTP web applications for Vert.x

Stars: ✭ 853 (+2484.85%)

Mutual labels: vertx

Sina Stock Crawler

Sina stock options crawler with CSV output (crawls Sina's SSE ETF options data)

Stars: ✭ 12 (-63.64%)

Mutual labels: crawler

Scrapy Azuresearch Crawler Samples

Scrapy as a Web Crawler for Azure Search Samples

Stars: ✭ 20 (-39.39%)

Mutual labels: crawler

Pic Gather

[ Closed ] 🎨 image collector, which supports custom acquisition source configuration and is compatible with MacOS and Windows operating systems.

Stars: ✭ 842 (+2451.52%)

Mutual labels: crawler

Vw Crawler

🐞 A simple, lightweight Java crawler framework. A little knowledge of regular expressions and CSS selectors is enough to collect data easily.

Stars: ✭ 32 (-3.03%)

Mutual labels: crawler

Axegrinder

Crawl websites for accessibility issues from the command line.

Stars: ✭ 12 (-63.64%)

Mutual labels: crawler

Java Vertx Web

OpenTracing instrumentation for Vert.x web package

Stars: ✭ 21 (-36.36%)

Mutual labels: vertx

Disec

Distributed Image Search Engine Crawler

Stars: ✭ 11 (-66.67%)

Mutual labels: crawler

Vertx React Example

Simple test of using Vert.x and React for server-side rendering

Stars: ✭ 11 (-66.67%)

Mutual labels: vertx

Papercrawler

Crawler used to crawl papers

Stars: ✭ 20 (-39.39%)

Mutual labels: crawler

Goods Crawling

Crawls product details from amazon/bestbuy/costco/6pm

Stars: ✭ 9 (-72.73%)

Mutual labels: crawler

Universityrecruitment Ssurvey

Uses serious data to answer the question: which kinds of companies recruit at which kinds of universities?

Stars: ✭ 30 (-9.09%)

Mutual labels: crawler

Symfony Crawler Bundle

Implements the crawler package into Symfony

Stars: ✭ 8 (-75.76%)

Mutual labels: crawler

Onion Crawler

Tor website crawler (specific for Alphabay at the time)

Stars: ✭ 15 (-54.55%)

Mutual labels: crawler

Leboncoin Crawler

Crawler for leboncoin.fr

Stars: ✭ 32 (-3.03%)

Mutual labels: crawler

Autocrawler

Google, Naver multiprocess image web crawler (Selenium)

Stars: ✭ 957 (+2800%)

Mutual labels: crawler

Vertx Eventbus Java

A Vert.x EventBus client written in Java, works on Android

Stars: ✭ 20 (-39.39%)

Mutual labels: vertx

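Douyin Crawler

Crawls the 'posts' and the 'likes' of specified users.

Browse Douyin on a phone that is routed through the proxy. Whenever you visit another user's profile page, that user's information, avatar, video covers, video MP4s, MP3s, and so on are crawled automatically.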

Languages and versions used in this project:

jdk: 14-preview (use the IDEA 2020.1 early-access build; otherwise the new JDK 14 syntax is not supported)
python: 3.8

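Modules:

douyin-scanner [python] Intercepts Douyin data through a mitmdump proxy and writes it to the database as a wide table, so that the douyin-downloader module can process it further.
douyin-downloader [java] Built on the vertx framework. Analyzes, regroups, and downloads the crawled information.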
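As a rough illustration of the "regroup" step, the sketch below takes flat wide-table rows and groups video URLs by user. It is only a sketch under stated assumptions: the class `WideTableSketch`, the helper `row`, and the column names `uid` and `video_url` are hypothetical, not taken from the project's schema or code, and it uses plain collections rather than vertx.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the column names "uid" and "video_url" are assumptions,
// not the project's actual schema.
final class WideTableSketch {

    /** Builds one denormalized row; a real wide row would carry many more columns. */
    static Map<String, String> row(String uid, String videoUrl) {
        Map<String, String> r = new HashMap<>();
        r.put("uid", uid);
        r.put("video_url", videoUrl);
        return r;
    }

    /** Regroups flat rows so that each user id maps to that user's video URLs, in input order. */
    static Map<String, List<String>> groupVideoUrlsByUser(List<Map<String, String>> wideRows) {
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        for (Map<String, String> r : wideRows) {
            byUser.computeIfAbsent(r.get("uid"), k -> new ArrayList<>()).add(r.get("video_url"));
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("u1", "a.mp4"));
        rows.add(row("u2", "b.mp4"));
        rows.add(row("u1", "c.mp4"));
        System.out.println(groupVideoUrlsByUser(rows)); // {u1=[a.mp4, c.mp4], u2=[b.mp4]}
    }
}
```

A wide table repeats the user columns on every row, so grouping by user id is the natural first step before downloading each user's files.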

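Technologies used in this project:

mitmdump + python for proxy interception
vertx as the main framework of the whole project
Writing raw SQL is inconvenient, so I implemented my own sqlBuilder to make assembling SQL easier. (At first I used a third-party sqlBuilder dependency, but it had many shortcomings, such as no support for limit, no support for prepare, and so on.)
I used reflection to implement an object-relational mapping utility class, com/aries/crawler/tools/Orm.java, which makes up for vertx's lack of an ORM. I like to call it: an ORM in a few dozen lines of code.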
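To make the last two ideas concrete, here is a minimal sketch of a SQL builder that supports limit and prepared-statement placeholders, plus a reflection-based row mapper. It is not the project's sqlBuilder or Orm.java: `CrawlerSketch`, `Select`, `User`, `mapRow`, and the table name `douyin_user` are all hypothetical, and the mapper assumes that field names match column names exactly.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the project's source.
final class CrawlerSketch {

    /** Minimal SELECT builder: collects clauses and keeps values as bind parameters. */
    static final class Select {
        private final String table;
        private final String columns;
        private final List<String> conditions = new ArrayList<>();
        private final List<Object> params = new ArrayList<>();
        private Integer limit;

        Select(String table, String... columns) {
            this.table = table;
            this.columns = columns.length == 0 ? "*" : String.join(", ", columns);
        }

        /** Adds a condition such as "uid = ?" and the value bound to its placeholder. */
        Select where(String condition, Object value) {
            conditions.add(condition);
            params.add(value);
            return this;
        }

        Select limit(int n) {
            this.limit = n;
            return this;
        }

        /** Renders the statement text; values stay out of the SQL string. */
        String sql() {
            StringBuilder sb = new StringBuilder("SELECT ").append(columns).append(" FROM ").append(table);
            if (!conditions.isEmpty()) {
                sb.append(" WHERE ").append(String.join(" AND ", conditions));
            }
            if (limit != null) {
                sb.append(" LIMIT ").append(limit);
            }
            return sb.toString();
        }

        List<Object> params() {
            return params;
        }
    }

    /** Example entity; field names are assumed to equal column names. */
    static final class User {
        long uid;
        String nickname;
    }

    /** Copies each row value into the declared field of the same name via reflection. */
    static <T> T mapRow(Map<String, Object> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (row.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(target, row.get(field.getName()));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Select q = new Select("douyin_user", "uid", "nickname").where("uid = ?", 42L).limit(10);
        System.out.println(q.sql()); // SELECT uid, nickname FROM douyin_user WHERE uid = ? LIMIT 10

        Map<String, Object> row = new HashMap<>();
        row.put("uid", 42L);
        row.put("nickname", "demo");
        User user = mapRow(row, User.class);
        System.out.println(user.uid + " " + user.nickname); // 42 demo
    }
}
```

Keeping values in a separate parameter list is what lets the statement be prepared. The mapper deliberately skips type conversion and column-name translation, and a null value for a primitive field would fail, so a real utility would need to handle those cases.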

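Why not spring and mybatis

I don't like having a dazzling pile of third-party dependencies in my own projects (take a look at this project's pom.xml: it currently contains only vertx-core, vertx-mysql, and junit for unit tests).
I don't like the habit of reaching for spring and mybatis without thinking. I often see Java engineers who, when starting a new project to write something, pull in a whole spring stack first without a second thought (wake up: you are a Java engineer, not a spring engineer). I don't think these frameworks are bad; I just don't think they are a universal solution, let alone a perfect one.
This is my first time using vertx, but not my first time not using spring. (netty/jFinal/play/akka are all very good frameworks.)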

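This project is for study and research only. It does not provide any anti-crawler or similar features. Please do not crawl maliciously. Anyone who uses this code maliciously bears the consequences themselves!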

Stars: ✭ 33
