tastyminerals / Ccrawl

Licence: mit

Simple CORPORA list crawler

Language: python

Labels: crawler

ccrawl

Simple CORPORA list crawler

The CORPORA list is open for information and questions about text corpora such as availability, aspects of compiling and using corpora, software, tagging, parsing, bibliography, conferences etc. The list is also open for all types of discussion with a bearing on corpora.

CORPORA list: http://clu.uni.no/corpora/welcome.html

Subscription page: http://clu.uni.no/corpora/sub.html
Archives (October 2004 - present): http://mailman.uib.no//public/corpora/
Older archives: http://www.hit.uib.no/corpora/old.html

Usage:

ccrawl is a Python script that you run with python2 ccrawl.py followed by arguments. Before using the script, you need to synchronize with the CORPORA list: python2 ccrawl.py --sync. Depending on the sync mode you choose, this operation might take anywhere from a few seconds to 20 minutes. ccrawl creates a local copy of the CORPORA list, .corpora_list.pickle, which is read each time you run the script.
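The local copy described above can be pictured as a plain pickle round trip. The following is only a minimal sketch, not the script's actual code: the file name comes from this README, while the function names and the list-of-(title, url) layout are assumptions made for illustration.

```python
import os
import pickle

# File name taken from the README; everything else here is hypothetical.
CACHE_NAME = ".corpora_list.pickle"


def save_cache(threads, path=CACHE_NAME):
    """Write the synced thread list to disk."""
    with open(path, "wb") as fh:
        # Protocol 2 can be read by both Python 2 and Python 3.
        pickle.dump(threads, fh, protocol=2)


def load_cache(path=CACHE_NAME):
    """Return the cached thread list, or None if no sync has been done yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Returning None when the file is missing gives the caller an easy way to tell the user to run the sync step first.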

To search CORPORA thread titles:

python2 ccrawl.py -f corpus

python2 ccrawl.py -f "chinese corpus"
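A title search like the one above could be as simple as a case-insensitive substring match over the cached thread list. This is a sketch under that assumption; the real script may match differently (per word, or with regular expressions), and the function name and sample threads below are made up for illustration.

```python
def find_threads(threads, query):
    """Return the (title, url) pairs whose title contains the query, ignoring case."""
    needle = query.lower()
    return [(title, url) for title, url in threads if needle in title.lower()]


# Invented sample data standing in for the synced thread list.
threads = [
    ("[Corpora-List] Chinese corpus of news text", "http://example.org/1"),
    ("[Corpora-List] CFP: parsing workshop", "http://example.org/2"),
]

for title, url in find_threads(threads, "chinese corpus"):
    print(title, url)
```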

To search CORPORA emails (available only if you performed a deep sync):

python2 ccrawl.py -df corpus

python2 ccrawl.py -df "chinese corpus"

To add older archives (1995-2004):

python2 ccrawl.py -old

To see help:

python2 ccrawl.py -h
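For orientation, the flags shown in this section could be declared with argparse roughly as follows. This is a hypothetical sketch: the flag spellings are taken from the commands above, but the dest names and help strings are assumptions, and the README does not say whether the script uses argparse at all.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags listed in this README."""
    parser = argparse.ArgumentParser(
        prog="ccrawl.py", description="Simple CORPORA list crawler"
    )
    parser.add_argument("--sync", action="store_true",
                        help="synchronize the local copy with the CORPORA list")
    parser.add_argument("-f", dest="find", metavar="QUERY",
                        help="search thread titles")
    parser.add_argument("-df", dest="deep_find", metavar="QUERY",
                        help="search email text (needs a deep sync)")
    parser.add_argument("-old", dest="old", action="store_true",
                        help="add the older archives (1995-2004)")
    return parser


args = build_parser().parse_args(["-df", "chinese corpus"])
print(args.deep_find)
```

argparse adds the -h option on its own, so it needs no explicit declaration.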

Install:

No installation is needed. Make sure python2 is installed on your system before running the script.

The script uses the requests and beautifulsoup4 libraries.
