
ArchiveTeam / wget-lua

License: GPL-3.0
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
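
As a sketch of how the Lua hooks are typically used: a script registers
callbacks that Wget-AT invokes at points in the crawl. The example
below is a hedged illustration, not authoritative documentation; the
--lua-script option and the wget.callbacks table follow common Wget-AT
grab scripts, but the exact callback names and signatures should be
checked against the project's documentation.

    $ cat hooks.lua
    -- Assumed callback: decide whether a discovered link is followed.
    wget.callbacks.download_child_p = function(urlpos, parent, depth,
        start_url_parsed, iri, verdict, reason)
      -- Skip logout links so the crawl does not end its own session.
      if string.match(urlpos["url"]["url"], "/logout") then
        return false
      end
      return verdict
    end
    $ ./wget --lua-script=hooks.lua --warc-file=example "https://example.com/"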

Programming Languages

C
50,402 projects - #5 most used programming language
Python
139,335 projects - #7 most used programming language
Perl
6,916 projects
Shell
77,523 projects
M4
1,887 projects

Projects that are alternatives to or similar to wget-lua

Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+29775%)
Mutual labels:  scraper, spider, scraping, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+1417.31%)
Mutual labels:  scraper, downloader, scraping, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+746.15%)
Mutual labels:  scraper, spider, scraping, crawling
crawler-chrome-extensions
Chrome extensions commonly used by crawler engineers
Stars: ✭ 53 (+1.92%)
Mutual labels:  scraper, spider, scraping, crawl
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+228.85%)
Mutual labels:  scraper, spider, scraping, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+1.92%)
Mutual labels:  scraper, scraping, crawling, crawl
bots-zoo
No description or website provided.
Stars: ✭ 59 (+13.46%)
Mutual labels:  scraper, scraping, crawling
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+776.92%)
Mutual labels:  scraper, scraping, crawling
fetchurls
A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
Stars: ✭ 97 (+86.54%)
Mutual labels:  spider, wget, crawl
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (+930.77%)
Mutual labels:  scraper, spider, crawl
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+9763.46%)
Mutual labels:  scraper, scraping, crawling
zcrawl
An open source web crawling platform
Stars: ✭ 21 (-59.62%)
Mutual labels:  scraping, crawling, crawlers
Zeiver
A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-73.08%)
Mutual labels:  scraper, downloader, scraping
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+9201.92%)
Mutual labels:  scraper, scraping, crawling
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-57.69%)
Mutual labels:  scraper, spider, scraping
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (+1207.69%)
Mutual labels:  spider, archiving, crawl
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-26.92%)
Mutual labels:  spider, scraping, crawling
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+432.69%)
Mutual labels:  spider, scraping, crawling
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+1869.23%)
Mutual labels:  scraper, spider, scraping
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+2296.15%)
Mutual labels:  scraper, spider, scraping
                                                          -*- text -*-
GNU Wget
========
                  Current Web home: https://www.gnu.org/software/wget/

GNU Wget is a free utility for non-interactive download of files from
the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.
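
For example (example.com and the proxy address below are
placeholders), a single file can be fetched directly or through a
proxy taken from the standard environment variables:

    $ wget https://example.com/file.tar.gz
    $ https_proxy=http://proxy.example.com:3128/ \
      wget https://example.com/file.tar.gz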

It can follow links in HTML pages and create local versions of remote
web sites, fully recreating the directory structure of the original
site.  This is sometimes referred to as "recursive downloading."
While doing that, Wget respects the Robot Exclusion Standard
(/robots.txt).  Wget can be instructed to convert the links in
downloaded HTML files to point to the local files for offline viewing.
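
For instance, the following command (with a placeholder URL) mirrors a
site two levels deep, fetches the images and stylesheets each page
needs, and rewrites links for offline viewing:

    $ wget --recursive --level=2 --page-requisites --convert-links \
          https://example.com/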

Recursive downloading also works with FTP, where Wget can retrieve a
hierarchy of directories and files.
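
For example:

    $ wget --recursive ftp://ftp.example.com/pub/some/directory/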

With both HTTP and FTP, Wget can check whether a remote file has
changed on the server since the previous run, and only download the
newer files.
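
This is the --timestamping (-N) option; re-running the same command
fetches only files that are newer on the server than the local copies:

    $ wget --timestamping https://example.com/data.csv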

Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved.  If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
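
For example, the following keeps retrying a large download up to
twenty times and resumes a partially transferred file instead of
starting over:

    $ wget --tries=20 --waitretry=10 --continue https://example.com/big.iso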

If you are behind a firewall that requires the use of a socks style
gateway, you can get the socks library and compile wget with support
for socks.

Most of the features are configurable, either through command-line
options, or via the initialization file .wgetrc.  Wget allows you to
install a global startup file (/usr/local/etc/wgetrc by default) for
site settings.
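
A minimal ~/.wgetrc might make the retry and resume behaviour shown
above the default (these are standard .wgetrc settings; see the manual
for the full list):

    $ cat ~/.wgetrc
    tries = 20
    continue = on
    timestamping = on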

Wget works under almost all Unix variants in use today and, unlike
many of its historical predecessors, is written entirely in C, thus
requiring no additional software, such as Perl.  The external software
it does work with, such as OpenSSL, is optional.  As Wget uses GNU
Autoconf, it is easily built on and ported to new Unix-like systems.
The installation procedure is described in the INSTALL file.
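
In short, the usual GNU steps apply; --with-ssl selects the TLS
backend (see INSTALL for the full set of configure options):

    $ ./configure --with-ssl=openssl
    $ make
    $ make install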

As with other GNU software, the latest version of Wget can be found at
the master GNU archive site ftp.gnu.org, and its mirrors.  Wget
resides at <ftp://ftp.gnu.org/pub/gnu/wget/>.

Please report bugs in Wget to <[email protected]>.

See the file `MAILING-LIST' for information about Wget mailing lists.
Wget's home page is at <https://www.gnu.org/software/wget/>.

If you would like to contribute code for Wget, please read
CONTRIBUTING.md.

Wget was originally written and maintained by Hrvoje Nikšić.  Please see
the file AUTHORS for a list of major contributors, and the ChangeLogs
for a detailed listing of all contributions.


Copyright (C) 1995-2022 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
USA.

Additional permission under GNU GPL version 3 section 7

If you modify this program, or any covered work, by linking or
combining it with the OpenSSL project's OpenSSL library (or a
modified version of that library), containing parts covered by the
terms of the OpenSSL or SSLeay licenses, the Free Software Foundation
grants you additional permission to convey the resulting work.
Corresponding Source for a non-source form of such a combination
shall include the source code for the parts of OpenSSL used as well
as that of the covered work.