albertz / RandomFtpGrabber

Licence: MIT license
Random FTP grabber - downloads all the interesting stuff

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to RandomFtpGrabber

Vscode Remote Workspace
Multi protocol support for handling remote files like local ones in Visual Studio Code.
Stars: ✭ 197 (+258.18%)
Mutual labels:  ftp
Autoarchive
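A Jenkins-based automated build system for iOS and Android. It automates as much as possible to make your iOS and Android packaging workflows more efficient. The project includes detailed explanations of the implementation, so you can use it to solve most problems related to client builds and distribution, and open that capability up to others to improve development efficiency.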
Stars: ✭ 248 (+350.91%)
Mutual labels:  ftp
kafka-connect-fs
Kafka Connect FileSystem Connector
Stars: ✭ 107 (+94.55%)
Mutual labels:  ftp
Ftp Php
FTP Wrapper Class for PHP 5
Stars: ✭ 207 (+276.36%)
Mutual labels:  ftp
Android Upload Service
Easily upload files (Multipart/Binary/FTP out of the box) in the background with progress notification. Support for persistent upload requests, customizations and custom plugins.
Stars: ✭ 2,593 (+4614.55%)
Mutual labels:  ftp
Sharex
ShareX is a free and open source program that lets you capture or record any area of your screen and share it with a single press of a key. It also allows uploading images, text or other types of files to many supported destinations you can choose from.
Stars: ✭ 18,143 (+32887.27%)
Mutual labels:  ftp
Importexportfree
Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel , mapping of any format, Google Sheet, data and price modification, improved speed and a lot more!
Stars: ✭ 160 (+190.91%)
Mutual labels:  ftp
flysystem-curlftp
Flysystem Adapter for the FTP with cURL implementation
Stars: ✭ 36 (-34.55%)
Mutual labels:  ftp
Typo3 Docker Boilerplate
🍲 TYPO3 Docker Boilerplate project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)
Stars: ✭ 240 (+336.36%)
Mutual labels:  ftp
RB-libcURL
A Realbasic and Xojo binding to libcurl
Stars: ✭ 19 (-65.45%)
Mutual labels:  ftp
Brutedum
BruteDum - Brute Force attacks SSH, FTP, Telnet, PostgreSQL, RDP, VNC with Hydra, Medusa and Ncrack
Stars: ✭ 212 (+285.45%)
Mutual labels:  ftp
Sftpgo
Fully featured and highly configurable SFTP server with optional HTTP, FTP/S and WebDAV support - S3, Google Cloud Storage, Azure Blob
Stars: ✭ 3,534 (+6325.45%)
Mutual labels:  ftp
php-ftp-client
📦 Provides helper classes and methods to manage FTP files in an OOP way.
Stars: ✭ 81 (+47.27%)
Mutual labels:  ftp
Cakephp File Storage
Abstract file storage and upload plugin for CakePHP. Write to local disk, FTP, S3, Dropbox and more through a single interface. It's not just yet another uploader but a complete storage solution.
Stars: ✭ 202 (+267.27%)
Mutual labels:  ftp
EOSFTPServer
A project to create a complete, standard compliant, multi-user, Objective-C (Mac OS X / iOS) FTP server.
Stars: ✭ 35 (-36.36%)
Mutual labels:  ftp
Bigfile
Bigfile -- a file transfer system that supports http, rpc and ftp protocol https://bigfile.site
Stars: ✭ 186 (+238.18%)
Mutual labels:  ftp
Ftp Srv
📮 Modern FTP Server
Stars: ✭ 253 (+360%)
Mutual labels:  ftp
mpv-scripts
A collection of scripts for mpv player
Stars: ✭ 138 (+150.91%)
Mutual labels:  ftp
simple-ftp-deploy
This package for Sublime Text 3 give you possibility to auto upload file to FTP server when you save local file.
Stars: ✭ 16 (-70.91%)
Mutual labels:  ftp
gftp
gFTP is a free multithreaded file transfer client for *NIX based machines. 56 language translations available.
Stars: ✭ 81 (+47.27%)
Mutual labels:  ftp

Random FTP grabber

Situation: You have various file servers with interesting stuff, more than you can possibly download. You have never heard of most of it, so you cannot tell how interesting it is, but you still want to download a good set of files.

(A common case is a hacker conference such as the Chaos Communication Congress/Camp.)

A totally random sampling might already be a good enough representation, but we might be able to improve slightly.

The tricky case is multi-part files that belong together: they should be grabbed together.
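One way to think about the multi-part case is to group filenames by a shared base name before sampling, so that a whole group is picked or skipped as a unit. The following is only a sketch: the naming patterns and the helper names `group_key` and `group_parts` are assumptions for illustration and are not taken from the project.

```python
import re
from collections import defaultdict

# Hypothetical patterns for common multi-part naming schemes; extend as needed.
_PART_PATTERNS = [
    re.compile(r"^(?P<base>.+)\.part\d+\.rar$", re.IGNORECASE),      # name.part01.rar
    re.compile(r"^(?P<base>.+)\.(?:rar|r\d{2,3})$", re.IGNORECASE),  # name.rar, name.r00
    re.compile(r"^(?P<base>.+)\.\d{3}$"),                            # name.001
]


def group_key(filename):
    """Return a key shared by all parts of the same multi-part set."""
    for pattern in _PART_PATTERNS:
        match = pattern.match(filename)
        if match:
            return match.group("base").lower()
    # Not recognized as a part: the file forms its own group.
    return filename


def group_parts(filenames):
    """Map each group key to the list of filenames that belong to it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[group_key(name)].append(name)
    return dict(groups)
```

A sampler could then choose randomly among the groups instead of among individual files and queue every member of the chosen group.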

Usage

Go into the directory where you want to download to.

echo "ftp://bla/blub1" >> sources.txt
echo "ftp://blub/bla2" >> sources.txt
mkdir downloads
RandomFtpGrabber/main.py

It will create some *.db files, e.g. index.db, where it saves its current state. When you kill and restart it, it should resume everything: all running downloads and the lazy indexing.
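To illustrate how state can survive a kill and restart, here is a minimal sketch that stores state as a readable Python expression and reads it back. It is not the project's Persistence module: the function names `save_state` and `load_state`, the use of `repr` with `ast.literal_eval`, and the atomic-replace step are all assumptions chosen for the example.

```python
import ast
import os
import tempfile


def save_state(path, state):
    """Write `state` as a Python literal, replacing the old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(repr(state))
            f.write("\n")
        # os.replace is atomic, so a kill mid-write never leaves a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path, default):
    """Read the state back, or return `default` on the first run."""
    try:
        with open(path, encoding="utf-8") as f:
            # literal_eval only accepts literals, so the file cannot run code.
            return ast.literal_eval(f.read())
    except FileNotFoundError:
        return default
```

This only handles plain literals (dicts, lists, strings, numbers); state containing custom objects would need its own readable representation.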

Details

  • Python 3.
  • Downloads via wget.
  • Provide a list of source URLs in the file ./sources.txt.
  • Lazy, randomly sampled indexing of the files. It doesn't build a full index up front; instead it randomly browses the given sources and randomly selects files for download. See RandomFileQueue for details on the random walk algorithm. If you run it long enough, it will still end up with a full file index.
  • FTP indexing via Python ftplib. HTTP via urllib3 and BeautifulSoup.
  • Resumes later on temporary problems (connection timeout, FTP error 4xx), skips dirs/files with unrecoverable problems (file not found anymore or so, FTP error 5xx).
  • Multiple worker threads and a task system with a work queue. See TaskSystem for details on the implementation.
  • Serializes current state (as readable Python expressions) and will recover it on restart, thus it will resume all current actions such as downloads. See Persistence for details on the implementation.
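The lazy indexing idea from the list above can be sketched as a random walk that lists a directory only when it first steps into it and caches the result. This is not the RandomFileQueue implementation: the class name `LazyRandomWalker`, the `list_dir` callback (a stand-in for an FTP or HTTP listing), and the restart-on-dead-end rule are assumptions for illustration.

```python
import random


class LazyRandomWalker:
    """Random walk over a directory tree, listing each directory at most once."""

    def __init__(self, list_dir, rng=None):
        # list_dir(path) -> (subdirs, files); hypothetical listing callback.
        self._list_dir = list_dir
        self._rng = rng if rng is not None else random.Random()
        self._index = {}  # path -> (subdirs, files), filled in lazily

    def _listing(self, path):
        # Only contact the server the first time a directory is visited.
        if path not in self._index:
            self._index[path] = self._list_dir(path)
        return self._index[path]

    def pick_file(self, root, max_restarts=100):
        """Walk down from `root` at random until a file is hit."""
        for _ in range(max_restarts):
            path = root
            while True:
                subdirs, files = self._listing(path)
                entries = [(d, True) for d in subdirs] + [(f, False) for f in files]
                if not entries:
                    break  # empty directory: restart from the root
                name, is_dir = self._rng.choice(entries)
                child = path.rstrip("/") + "/" + name
                if not is_dir:
                    return child
                path = child
        return None

    @property
    def indexed_dirs(self):
        """Directories listed so far; grows toward the full index over time."""
        return set(self._index)
```

Because every visited directory is cached, repeated calls to `pick_file` gradually cover the tree, which matches the remark that a long enough run ends up with a full index.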

Plan

For each file found, it should run some detection to decide whether to download it, or how to prioritize it relative to other files.

Via the Python module guessit, we can extract useful information just from the filename - works well for movies, episodes or music.

We can then use IMDb to get some more information for movies. The Python module IMDbPY might be useful for this case (although it doesn't support Python 3 yet - see here). Then, also this is relevant.

Some movie recommendation engine can then be useful.

There could also be a movie blacklist. I don't want to download movies that I have already seen.

There could be other filters.

Scraping and web crawling might work better via Scrapy.

Contribute

Do you want to hack on it? You are very welcome!

About the plans, just contact me so we can do some brainstorming.

Want to support some new protocol? Modify FileSysIntf for the indexing and Downloader for the download logic. The download part might already work, because it just uses wget for everything.

Author

Albert Zeyer, [email protected].
