ritiek / Scribd Downloader

Licence: mit
Download documents, books and audiobooks off Scribd

Projects that are alternatives of or similar to Scribd Downloader

Itext7 Dotnet
iText 7 for .NET is the .NET version of the iText 7 library, formerly known as iTextSharp, which it replaces. iText 7 represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 698 (+320.48%)
Mutual labels:  documents
Kotlin Reference Chinese
Chinese edition of the official Kotlin documentation (reference section)
Stars: ✭ 85 (-48.8%)
Mutual labels:  documents
Lexpredict Contraxsuite
LexPredict ContraxSuite
Stars: ✭ 140 (-15.66%)
Mutual labels:  documents
Itext7
iText 7 for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 913 (+450%)
Mutual labels:  documents
Simple
Sample documents for 字条网
Stars: ✭ 71 (-57.23%)
Mutual labels:  documents
Stevedore
search document dumps: ingest and explore in one extensible framework
Stars: ✭ 118 (-28.92%)
Mutual labels:  documents
Genji
Document-oriented, embedded SQL database
Stars: ✭ 636 (+283.13%)
Mutual labels:  documents
Templates
A set of standard document templates.
Stars: ✭ 1,953 (+1076.51%)
Mutual labels:  documents
Recursive Cnns
Implementation of my paper "Real-time Document Localization in Natural Images by Recursive Application of a CNN."
Stars: ✭ 80 (-51.81%)
Mutual labels:  documents
Zephyr Doc
Zephyr OS documentation, Chinese edition
Stars: ✭ 127 (-23.49%)
Mutual labels:  documents
Paperless
Scan, index, and archive all of your paper documents
Stars: ✭ 7,662 (+4515.66%)
Mutual labels:  documents
Word2html
a quick and dirty script to convert a Word (docx) document to html.
Stars: ✭ 44 (-73.49%)
Mutual labels:  documents
Uwazi
Uwazi is a web-based, open-source solution for building and sharing document collections
Stars: ✭ 121 (-27.11%)
Mutual labels:  documents
Peergos
A p2p, secure file storage, social network and application protocol
Stars: ✭ 895 (+439.16%)
Mutual labels:  documents
Svglib
Read SVG files and convert them to other formats.
Stars: ✭ 139 (-16.27%)
Mutual labels:  documents
Org Noter
Emacs document annotator, using Org-mode
Stars: ✭ 671 (+304.22%)
Mutual labels:  documents
Pyzh
📚 Write Python articles together, read Python articles together: a collection and translation of Python technical articles, built with readthedocs.
Stars: ✭ 1,387 (+735.54%)
Mutual labels:  documents
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (-0.6%)
Mutual labels:  documents
Mm Wiki
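MM-Wiki is a lightweight enterprise knowledge-sharing and team collaboration tool that can be used to quickly build an enterprise wiki and a team knowledge-sharing platform. It is easy to deploy and simple to use, and helps teams build a collaborative environment for information sharing and document management.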
Stars: ✭ 2,364 (+1324.1%)
Mutual labels:  documents
Etherpad Lite
Etherpad: A modern really-real-time collaborative document editor.
Stars: ✭ 11,937 (+7090.96%)
Mutual labels:  documents

Scribd-Downloader

|PyPi Version| |Build Status| |Coverage Status|

(I also found an online service, https://dlscrib.com/, created by `Erik Fong`_. It doesn't use this script, as some people seem to think!)

Current features:

+------------+-------------------------------------+-------------------------------------------+
| Type       | Downloadable without Scribd premium | Requires Scribd premium for full download |
+============+=====================================+===========================================+
| Documents  | Yes                                 | No                                        |
+------------+-------------------------------------+-------------------------------------------+
| Books      | Yes                                 | Yes                                       |
+------------+-------------------------------------+-------------------------------------------+
| Audiobooks | Yes                                 | Yes                                       |
+------------+-------------------------------------+-------------------------------------------+

Some information about Scribd documents:

There are two types of documents on Scribd:

  • Documents made up of a collection of images, and
  • Actual documents where the text can be selected, copied, etc.

This script takes a different approach to both of them:

  • Documents consisting of a collection of images are straightforward: this script simply downloads the individual images, which can be combined into a .pdf by passing the --pdf option to the tool.

  • Actual documents where the text can be selected are harder to tackle. If we feed such a document to this tool, only the text present in the document will be downloaded. Scribd seems to use JavaScript to somehow combine text and images. So far, I haven't been able to combine them with Python in a way that looks like the original document.
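As a rough illustration of the image case, the sketch below shows one way a set of downloaded page images could be merged into a single PDF with ImageMagick. It is not taken from this tool's source: the function names and the ``page-N.jpg`` file naming are assumptions, and it relies on ImageMagick's ``convert`` command being installed.

```python
import re
import shutil
import subprocess


def natural_key(name):
    """Sort key that orders 'page-10.jpg' after 'page-2.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def build_convert_command(image_paths, output_pdf):
    """Build an ImageMagick command that merges images into one PDF."""
    ordered = sorted(image_paths, key=natural_key)
    return ["convert", *ordered, output_pdf]


def images_to_pdf(image_paths, output_pdf):
    """Run the command, provided ImageMagick's `convert` is on PATH."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    subprocess.run(build_convert_command(image_paths, output_pdf), check=True)
```

Sorting with a natural key matters here because a plain string sort would place page 10 before page 2 and scramble the page order in the resulting PDF.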


Installation

Make sure you're using Python 3 (Python 2 is not supported by a few dependencies). Then run:

::

    $ pip install scribd-downloader

Or install the development version from a checkout of the repository with:

::

    $ python setup.py install

Usage

::

    usage: scribdl [-h] [-i] [-p] URL

    Download documents and books from scribd.com

    positional arguments:
      URL           scribd url to download

    optional arguments:
      -h, --help    show this help message and exit
      -i, --images  download url made up of images
      -p, --pdf     convert to pdf (*Nix: imagemagick)
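For readers who want to see how such a command line maps onto code, here is a minimal ``argparse`` sketch that accepts the same arguments as the help text above. It is an approximation written for illustration, not the project's actual parser; ``build_parser`` is a hypothetical name.

```python
import argparse


def build_parser():
    """Return a parser that mirrors the options shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="scribdl",
        description="Download documents and books from scribd.com",
    )
    parser.add_argument("URL", help="scribd url to download")
    parser.add_argument("-i", "--images", action="store_true",
                        help="download url made up of images")
    parser.add_argument("-p", "--pdf", action="store_true",
                        help="convert to pdf (*Nix: imagemagick)")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list so the example runs without a real CLI call.
    args = build_parser().parse_args(
        ["-i", "-p", "https://scribd.com/doc/17142797/Case-in-Point"]
    )
    print(args.URL, args.images, args.pdf)
```

Both flags are plain booleans, so they default to ``False`` when omitted and can be combined freely.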

Examples

Scribd Documents

Downloading text from a document containing selectable text:

::

    $ scribdl https://www.scribd.com/document/55949937/33-Strategies-of-War

(Text will be saved side by side in a .md file in the current working directory)

Downloading a document made up of images; use the --images option (the tool cannot figure this out on its own):

::

    $ scribdl -i https://scribd.com/doc/17142797/Case-in-Point

(Images will be saved in the current working directory)

Scribd Books

The command below will generate an .md file of the book in the current working directory:

::

    $ scribdl https://www.scribd.com/read/189087235/Confessions-of-a-Casting-Director-Help-Actors-Land-Any-Role-with-Secrets-from-Inside-the-Audition-Room

Pass the --pdf option to convert the generated output to a PDF.

This will only download the book content that is available without a premium Scribd account. See the section below for downloading full books if you own a premium Scribd account.

Scribd Audiobooks

This will download an .mp3 of the audiobook:

::

    $ scribdl https://www.scribd.com/audiobook/237606860/100-Ways-to-Motivate-Yourself-Change-Your-Life-Forever

This will only download the preview version of the audiobook. See the below section for downloading complete audiobooks if you own a premium Scribd account.
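The example URLs above differ in their first path segment (``document`` or ``doc``, ``read``, and ``audiobook``). The sketch below shows how a URL could be classified on that basis. The mapping is inferred only from the examples in this README, and ``classify_scribd_url`` is a hypothetical helper, not a function of this tool.

```python
from urllib.parse import urlparse

# First path segment -> content type, inferred from the README examples.
KIND_BY_PREFIX = {
    "document": "document",
    "doc": "document",
    "read": "book",
    "audiobook": "audiobook",
}


def classify_scribd_url(url):
    """Return 'document', 'book', 'audiobook', or None if unrecognised."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments:
        return None
    return KIND_BY_PREFIX.get(segments[0])
```

Note that a classification like this cannot distinguish image-based documents from text-based ones, which is consistent with the --images option having to be passed by hand.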


Downloading complete textual books and audiobooks

If you have a premium Scribd account, you can also download the full version of textual books and audiobooks by intercepting the network requests your browser makes. However, this also requires some experience on your side.

While logged into your premium Scribd account in the web browser, set up a network proxy such as Mitmproxy_ and install its SSL certificate so you can monitor the HTTPS traffic passing through the browser.

Now open any textual book URL (`example <https://www.scribd.com/read/189087235/Confessions-of-a-Casting-Director-Help-Actors-Land-Any-Role-with-Secrets-from-Inside-the-Audition-Room>`_) in your browser. The browser will automatically make network requests to a URL that looks something like https://www.scribd.com/read2/.../access_token. Inspect this network request and replace the values for headers and cookies in the code `here <https://github.com/ritiek/scribd-downloader/blob/master/scribdl/const.py>`_.

You should then be able to automatically download the full version of both textual books and audiobooks from Scribd by running the commands as usual.
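To make the idea of reusing captured headers and cookies concrete, the sketch below attaches such values to a request object using only the standard library. The layout of ``const.py`` is not shown in this README, so the dictionary names, keys, and placeholder values here are all hypothetical; substitute whatever your intercepted request actually contains.

```python
from urllib.request import Request

# Hypothetical placeholders; copy the real values from the intercepted request.
CAPTURED_HEADERS = {
    "User-Agent": "<copied from the intercepted request>",
}
CAPTURED_COOKIES = {
    "example_cookie_name": "<copied from the intercepted request>",
}


def build_request(url, headers, cookies):
    """Create a Request carrying the given headers and a Cookie header."""
    request = Request(url)
    for name, value in headers.items():
        request.add_header(name, value)
    if cookies:
        # Cookies are sent as a single header of 'name=value' pairs.
        cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
        request.add_header("Cookie", cookie_header)
    return request
```

The function only builds the request and sends nothing over the network, so it can be inspected safely before anything is actually fetched.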


Disclaimer

Downloading books from Scribd for free may be prohibited. This tool is meant for educational purposes only. Please support the authors by buying their titles.


License

The MIT License

.. |PyPi Version| image:: https://img.shields.io/pypi/v/scribd-downloader.svg
   :target: https://pypi.org/project/scribd-downloader

.. |Build Status| image:: https://travis-ci.org/ritiek/scribd-downloader.svg?branch=master
   :target: https://travis-ci.org/ritiek/scribd-downloader

.. |Coverage Status| image:: https://codecov.io/gh/ritiek/scribd-downloader/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/ritiek/scribd-downloader

.. _Mitmproxy: https://github.com/mitmproxy/mitmproxy

.. _Erik Fong: mailto:[email protected]

.. _BookURL: https://www.scribd.com/read/189087235/Confessions-of-a-Casting-Director-Help-Actors-Land-Any-Role-with-Secrets-from-Inside-the-Audition-Room

.. ConstantValues:
