ArchiveTeam / wget-lua
Licence: GPL-3.0
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52
Projects that are alternatives to or similar to wget-lua
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+29775%)
Mutual labels: scraper, spider, scraping, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+1417.31%)
Mutual labels: scraper, downloader, scraping, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+746.15%)
Mutual labels: scraper, spider, scraping, crawling
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers
Stars: ✭ 53 (+1.92%)
Mutual labels: scraper, spider, scraping, crawl
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+228.85%)
Mutual labels: scraper, spider, scraping, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+1.92%)
Mutual labels: scraper, scraping, crawling, crawl
bots-zoo
No description or website provided.
Stars: ✭ 59 (+13.46%)
Mutual labels: scraper, scraping, crawling
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+776.92%)
Mutual labels: scraper, scraping, crawling
fetchurls
A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
Stars: ✭ 97 (+86.54%)
Mutual labels: spider, wget, crawl
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+9763.46%)
Mutual labels: scraper, scraping, crawling
zcrawl
An open source web crawling platform
Stars: ✭ 21 (-59.62%)
Mutual labels: scraping, crawling, crawlers
Zeiver
A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-73.08%)
Mutual labels: scraper, downloader, scraping
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-57.69%)
Mutual labels: scraper, spider, scraping
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (+1207.69%)
Mutual labels: spider, archiving, crawl
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-26.92%)
Mutual labels: spider, scraping, crawling
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+432.69%)
Mutual labels: spider, scraping, crawling
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+1869.23%)
Mutual labels: scraper, spider, scraping
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+2296.15%)
Mutual labels: scraper, spider, scraping
-*- text -*-
GNU Wget
========

Current Web home: https://www.gnu.org/software/wget/

GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

It can follow links in HTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded HTML files to the local files for offline viewing.

Recursive downloading also works with FTP, where Wget can retrieve a hierarchy of directories and files.

With both HTTP and FTP, Wget can check whether a remote file has changed on the server since the previous run, and only download the newer files.

Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.

If you are behind a firewall that requires the use of a socks style gateway, you can get the socks library and compile wget with support for socks.

Most of the features are configurable, either through command-line options, or via initialization file .wgetrc. Wget allows you to install a global startup file (/usr/local/etc/wgetrc by default) for site settings.

Wget works under almost all Unix variants in use today and, unlike many of its historical predecessors, is written entirely in C, thus requiring no additional software, such as Perl. The external software it does work with, such as OpenSSL, is optional. As Wget uses the GNU Autoconf, it is easily built on and ported to new Unix-like systems. The installation procedure is described in the INSTALL file.
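The retry-and-resume behaviour described above (keep retrying, and ask the server to continue from the last received byte) can be illustrated with a small sketch. This is not Wget's actual implementation: `FlakySource`, `download_with_resume`, and the `max_tries` cap are hypothetical names chosen for illustration, and the "server" is simulated in memory so the example runs without a network. The `offset` argument stands in for an HTTP `Range: bytes=<offset>-` request.

```python
class FlakySource:
    """Simulated server (hypothetical) that honours byte offsets but
    cuts the first few responses short, like a dropped connection."""

    def __init__(self, payload: bytes, fail_after: int, failures: int):
        self.payload = payload
        self.fail_after = fail_after  # bytes delivered before a simulated drop
        self.failures = failures      # number of responses that get cut short

    def fetch(self, offset: int) -> bytes:
        # Equivalent in spirit to a request carrying "Range: bytes=<offset>-".
        chunk = self.payload[offset:]
        if self.failures > 0:
            self.failures -= 1
            return chunk[: self.fail_after]  # connection dropped early
        return chunk


def download_with_resume(source: FlakySource, total_size: int, max_tries: int = 20):
    """Retry until total_size bytes have arrived, resuming at the current offset.

    Returns the assembled bytes and the number of attempts used.
    max_tries is an arbitrary cap chosen for this sketch.
    """
    received = bytearray()
    for attempt in range(1, max_tries + 1):
        # Ask only for the part that is still missing.
        received += source.fetch(len(received))
        if len(received) >= total_size:
            return bytes(received), attempt
    raise IOError(f"gave up after {max_tries} tries with {len(received)} bytes")


payload = b"0123456789" * 10  # 100 bytes
data, tries = download_with_resume(
    FlakySource(payload, fail_after=30, failures=3), len(payload)
)
print(len(data), tries)  # 100 4
```

Because each retry starts at the current offset, the three interrupted attempts each contribute 30 bytes and the fourth delivers the remaining 10, so nothing is downloaded twice.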
As with other GNU software, the latest version of Wget can be found at the master GNU archive site ftp.gnu.org, and its mirrors. Wget resides at <ftp://ftp.gnu.org/pub/gnu/wget/>.

Please report bugs in Wget to <[email protected]>.

See the file `MAILING-LIST' for information about Wget mailing lists. Wget's home page is at <https://www.gnu.org/software/wget/>.

If you would like to contribute code for Wget, please read CONTRIBUTING.md.

Wget was originally written and maintained by Hrvoje Niksic. Please see the file AUTHORS for a list of major contributors, and the ChangeLogs for a detailed listing of all contributions.

Copyright (C) 1995-2022 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.

Additional permission under GNU GPL version 3 section 7

If you modify this program, or any covered work, by linking or combining it with the OpenSSL project's OpenSSL library (or a modified version of that library), containing parts covered by the terms of the OpenSSL or SSLeay licenses, the Free Software Foundation grants you additional permission to convey the resulting work. Corresponding Source for a non-source form of such a combination shall include the source code for the parts of OpenSSL used as well as that of the covered work.