35 open source projects that are alternatives or similar to CommonCrawlDocumentDownload

khudro
Khudro is a very lightweight web server written in C.
Stars: ✭ 19 (-55.81%)
Mutual labels:  mime-types
mimer
A simple MIME type getter
Stars: ✭ 15 (-65.12%)
Mutual labels:  mime-types
ungoliant
🕷️ The pipeline for the OSCAR corpus
Stars: ✭ 69 (+60.47%)
Mutual labels:  commoncrawl
chatnoir-resiliparse
A robust web archive analytics toolkit
Stars: ✭ 26 (-39.53%)
Mutual labels:  warc
php-mimetyper
PHP mime type and extension mapping library: built with jshttp/mime-db, compatible with Symfony and Laravel
Stars: ✭ 21 (-51.16%)
Mutual labels:  mime-types
KeywordAnalysis
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
Stars: ✭ 49 (+13.95%)
Mutual labels:  commoncrawl
mimesniff
MIME Sniffing Standard
Stars: ✭ 89 (+106.98%)
Mutual labels:  mime-types
Yagmail
Send email from Python conveniently through Gmail using yagmail
Stars: ✭ 2,169 (+4944.19%)
Mutual labels:  mime-types
Fileio.jl
Main package for IO, loading all different kinds of files
Stars: ✭ 133 (+209.3%)
Mutual labels:  mime-types
Apaxy
A simple, customisable theme for your Apache directory listing
Stars: ✭ 1,672 (+3788.37%)
Mutual labels:  mime-types
Swime
🗂 Swift MIME type checking based on magic bytes
Stars: ✭ 119 (+176.74%)
Mutual labels:  mime-types
Mime
The Hoa\Mime library.
Stars: ✭ 100 (+132.56%)
Mutual labels:  mime-types
Filetype
Fast, dependency-free, small Go package to infer the binary file type based on the magic number signature
Stars: ✭ 1,278 (+2872.09%)
Mutual labels:  mime-types
Mog
A different take on the UNIX tool cat
Stars: ✭ 62 (+44.19%)
Mutual labels:  mime-types
Mime
Map filenames to MIME types
Stars: ✭ 21 (-51.16%)
Mutual labels:  mime-types
Sixarm ruby magic number type
SixArm.com » Ruby » MagicNumberType infers a data type from the data's leading bytes
Stars: ✭ 13 (-69.77%)
Mutual labels:  mime-types
Mime Types
The ultimate JavaScript content-type utility.
Stars: ✭ 865 (+1911.63%)
Mutual labels:  mime-types
Mime
Shared MIME-info database in D programming language
Stars: ✭ 7 (-83.72%)
Mutual labels:  mime-types
Mime Db
Media Type Database
Stars: ✭ 612 (+1323.26%)
Mutual labels:  mime-types
Mimetype
A fast golang library for MIME type and file extension detection, based on magic numbers
Stars: ✭ 452 (+951.16%)
Mutual labels:  mime-types
Ruby Mime Types
Ruby MIME type registry library
Stars: ✭ 288 (+569.77%)
Mutual labels:  mime-types
Filepicker
🔥🔥🔥 Android file and image picker; supports browsing by folder and searching by file type, with support for a custom camera
Stars: ✭ 265 (+516.28%)
Mutual labels:  mime-types
ruby-magic
Simple interface to libmagic for Ruby Programming Language
Stars: ✭ 23 (-46.51%)
Mutual labels:  mime-types
Mime
.NET wrapper for libmagic
Stars: ✭ 51 (+18.6%)
Mutual labels:  mime-types
mimetypes
Erlang MIME types library
Stars: ✭ 77 (+79.07%)
Mutual labels:  mime-types
transmat
Share data beyond the browser boundaries. Enable users to transfer data to external apps, and open your webapp to receive external data.
Stars: ✭ 453 (+953.49%)
Mutual labels:  mime-types
MimeTypesMap
A simple dictionary that provides a few methods to look up MIME types and extensions, generated from Apache's mime.types.
Stars: ✭ 25 (-41.86%)
Mutual labels:  mime-types
Heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Stars: ✭ 2,104 (+4793.02%)
Mutual labels:  warc
Archivebox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Stars: ✭ 12,383 (+28697.67%)
Mutual labels:  warc
warc
⚙️ A Rust library for reading and writing WARC files
Stars: ✭ 26 (-39.53%)
Mutual labels:  warc
node-warc
Parse and create Web ARChive (WARC) files with Node.js
Stars: ✭ 69 (+60.47%)
Mutual labels:  warc
warc
📇 Tools to Work with the Web Archive Ecosystem in R
Stars: ✭ 21 (-51.16%)
Mutual labels:  warc
mixnode-warcreader-php
Read Web ARChive (WARC) files in PHP.
Stars: ✭ 20 (-53.49%)
Mutual labels:  warc
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+20.93%)
Mutual labels:  warc
wail
🐋 One-Click User Instigated Preservation
Stars: ✭ 107 (+148.84%)
Mutual labels:  warc