
ArchiveTeam / Archivebot

License: MIT
ArchiveBot, an IRC bot for archiving websites

Programming Languages

javascript
python
ruby
haxe

Projects that are alternatives of or similar to Archivebot

irc-docs
Collected IRC protocol documentation
Stars: ✭ 47 (-78.44%)
Mutual labels:  irc, archiving
Bitlbee Discord
Bitlbee plugin for Discord (http://discordapp.com)
Stars: ✭ 204 (-6.42%)
Mutual labels:  irc
Go Twitch Irc
go irc client for twitch.tv
Stars: ✭ 155 (-28.9%)
Mutual labels:  irc
Pdf Archiver
A tool for tagging files and archiving tasks.
Stars: ✭ 182 (-16.51%)
Mutual labels:  archiving
Wikipedia Mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Stars: ✭ 160 (-26.61%)
Mutual labels:  archiving
Weechat
The extensible chat client.
Stars: ✭ 2,349 (+977.52%)
Mutual labels:  irc
Mojo Webqq
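[Important notice: WebQQ will stop service on January 1, 2019, and this project is no longer maintained. Thanks to everyone for four years of support.] A smartqq/webqq client framework (non-GUI) written in Perl (no Perl knowledge required) that can expose an HTTP-based API through plugins for other languages or systems to call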
Stars: ✭ 1,755 (+705.05%)
Mutual labels:  irc
Irccloud Desktop
IRCCloud Desktop App
Stars: ✭ 215 (-1.38%)
Mutual labels:  irc
Irssi
The client of the future
Stars: ✭ 2,431 (+1015.14%)
Mutual labels:  irc
Kvirc
The KVIrc IRC Client
Stars: ✭ 179 (-17.89%)
Mutual labels:  irc
Tenyks
The Tenyks IRC bot.
Stars: ✭ 171 (-21.56%)
Mutual labels:  irc
Lisp Chat
An experimental minimal chat written in Common Lisp
Stars: ✭ 160 (-26.61%)
Mutual labels:  irc
Irc3
Pluggable IRC client library based on Python's asyncio with DCC and SASL support
Stars: ✭ 189 (-13.3%)
Mutual labels:  irc
Xchataqua
An IRC client, OS X native front-end for XChat ( http://itunes.apple.com/app/id447521961 )
Stars: ✭ 157 (-27.98%)
Mutual labels:  irc
Whapp Irc
whatsapp web <-> irc gateway
Stars: ✭ 208 (-4.59%)
Mutual labels:  irc
Znc
Official repository for the ZNC IRC bouncer
Stars: ✭ 1,851 (+749.08%)
Mutual labels:  irc
Ircdotnet
IRC.NET is a complete IRC (Internet Relay Chat) client library for .NET.
Stars: ✭ 166 (-23.85%)
Mutual labels:  irc
Tc
A desktop chat client for Twitch
Stars: ✭ 182 (-16.51%)
Mutual labels:  irc
Hexchat
GTK+ IRC client
Stars: ✭ 2,608 (+1096.33%)
Mutual labels:  irc
Twitch4j
Modular Async/Sync/Reactive Twitch API Client / IRC Client
Stars: ✭ 209 (-4.13%)
Mutual labels:  irc
  1. ArchiveBot

    Coders, I have a question. Or, a request, etc. I spent some time with xmc discussing something we could do to make things easier around here. What we came up with is a trigger for a bot, which can be triggered by people with ops. You tell it a website. It crawls it. WARC. Uploads it to archive.org. Boom. I can supply machines as needed. Obviously there are some sanitation issues, and it is root all the way down or nothing. I think that would help a lot for smaller sites: sites where it's 100 pages, or even 1000 pages, are pretty simple. And just being able to go "bot, get a sanity dump".
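The "crawls it. WARC." step means each fetched page ends up as a record in a WARC file. For a rough idea of what one record looks like, here is a minimal sketch that serializes a single `response` record using only the Python standard library. It follows the WARC 1.0 record layout (ISO 28500) but leaves out much of what a real crawler writes, such as `WARC-Payload-Digest`, `warcinfo` and `request` records, and per-record gzip compression. The function name `warc_response_record` is hypothetical and is not taken from ArchiveBot or wpull.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Serialize a raw HTTP response (status line, headers, body) as one WARC/1.0 response record.

    Hypothetical helper for illustration; omits digests and other optional headers.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Content-Length counts only the record block (the HTTP response), not the WARC headers.
        ("Content-Length", str(len(http_response))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Every record is terminated by two CRLFs after its block.
    return head.encode("utf-8") + http_response + b"\r\n\r\n"
```

A WARC file is a sequence of such records, typically gzip-compressed one record at a time so that a reader can seek directly to an individual record.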

  2. More info

ArchiveBot has two major backend components: the control node, which runs the IRC interface and bookkeeping programs, and the crawlers, which do all the Web crawling. ArchiveBot users communicate with ArchiveBot by issuing commands in an IRC channel.
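To make the control-node/crawler split concrete, here is a hypothetical sketch of the control node's side. It parses one line from the IRC channel, accepts an archive command only from a channel operator (as the original request asks), and puts a job on a queue for a crawler to consume. The `!archive` spelling, the `Job` type, `handle_message`, and the in-memory `queue.Queue` are all assumptions made for illustration. The real command set is described in the user's guide, and the real control node and crawlers talk over shared infrastructure rather than a single Python queue.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    ident: str      # short job identifier reported back to the channel
    url: str        # seed URL to crawl
    requester: str  # IRC nick that issued the command


# Hypothetical stand-in for whatever shared queue the crawlers pull jobs from.
pending_jobs: "queue.Queue[Job]" = queue.Queue()


def handle_message(nick: str, is_op: bool, text: str) -> Optional[str]:
    """Handle one channel message; return a reply to post, or None to stay silent."""
    parts = text.split()
    if not parts or parts[0] != "!archive":
        return None  # not a command this sketch understands
    if not is_op:
        return f"{nick}: only channel operators can start jobs"
    if len(parts) != 2 or not parts[1].startswith(("http://", "https://")):
        return f"{nick}: usage: !archive <url>"
    job = Job(ident=uuid.uuid4().hex[:8], url=parts[1], requester=nick)
    pending_jobs.put(job)  # a crawler would take the job from here
    return f"{nick}: queued {job.url} as job {job.ident}"
```

Keeping the IRC handler this thin, so that it only validates and enqueues, mirrors the separation the paragraph describes: the bot stays responsive in the channel while the slow crawling happens elsewhere.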

User's guide: http://archivebot.readthedocs.org/en/latest/

Control node installation guide: INSTALL.backend

Crawler installation guide: INSTALL.pipeline

  3. Local use

ArchiveBot was originally written as a set of separate programs for deployment on a server. This means it has a poor distribution story. However, Ivan Kozik (@ivan) has taken the ArchiveBot pipeline, dashboard, ignores, and control system and created a package intended for personal use. You can find it at https://github.com/ArchiveTeam/grab-site.

  4. License

Copyright 2013 David Yip; made available under the MIT license. See LICENSE for details.

  5. Acknowledgments

Thanks to Alard (@alard), who added WARC generation and Lua scripting to GNU Wget. Wget+lua was the first web crawler used by ArchiveBot.

Thanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current web crawler.

Thanks to Ivan Kozik (@ivan) for maintaining ignore patterns and tracking down performance problems at scale.

Other thanks go to the following projects:

  6. Special thanks

Dragonette, Barnaby Bright, Vienna Teng, NONONO.

The memory hole of the Web has gone too far. Don't look down, never look away; ArchiveBot's like the wind.

