
dufferzafar / Geeksforgeeks.pdf

Topic wise PDFs of Geeks for Geeks articles. (Last updated in October 2018)

Programming Languages

python

Projects that are alternatives to or similar to Geeksforgeeks.pdf

Tabula
Tabula is a tool for liberating data tables trapped inside PDF files
Stars: ✭ 5,420 (+1008.38%)
Mutual labels:  scraping, pdf
The Economist Ebooks
Free downloads and subscriptions (with Kindle delivery) for English-language magazines such as The Economist (with audio), The New Yorker, Nature, New Scientist, The Guardian, Scientific American, Wired, The Atlantic, Newsweek and National Geographic, in epub, mobi and pdf formats, updated weekly.
Stars: ✭ 3,471 (+609.82%)
Mutual labels:  pdf, download
Idt
Image Dataset Tool (idt) is a CLI tool designed to turn the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Stars: ✭ 202 (-58.69%)
Mutual labels:  scraping, download
Ibm Z Zos
The helpful and handy location for finding and sharing z/OS files, which are not included in the product.
Stars: ✭ 198 (-59.51%)
Mutual labels:  pdf, download
Languagepod101 Scraper
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Stars: ✭ 104 (-78.73%)
Mutual labels:  scraping, download
Educative.io Downloader
📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.
Stars: ✭ 139 (-71.57%)
Mutual labels:  scraping, pdf
Invoices
Generate PDF invoices for your customers in laravel
Stars: ✭ 298 (-39.06%)
Mutual labels:  pdf, download
Markdown Resume Js
Turn a simple markdown document into a resume in HTML and PDF
Stars: ✭ 449 (-8.18%)
Mutual labels:  pdf
Pdf2word
Multithreaded PDF-to-Word conversion in 60 lines of code
Stars: ✭ 467 (-4.5%)
Mutual labels:  pdf
React Pdf Highlighter
Set of React components for PDF annotation
Stars: ✭ 448 (-8.38%)
Mutual labels:  pdf
Tppdf
TPPDF is a simple-to-use PDF builder for iOS
Stars: ✭ 444 (-9.2%)
Mutual labels:  pdf
Rust Skia
Safe Skia Bindings for Rust
Stars: ✭ 450 (-7.98%)
Mutual labels:  pdf
Pdf2htmlex
Convert PDF to HTML without losing text or format.
Stars: ✭ 472 (-3.48%)
Mutual labels:  pdf
Weasyprint
The awesome document factory
Stars: ✭ 4,689 (+858.9%)
Mutual labels:  pdf
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+889.16%)
Mutual labels:  scraping
Pdftotext
Simple PDF text extraction
Stars: ✭ 445 (-9%)
Mutual labels:  pdf
Resume
👾 My resume
Stars: ✭ 482 (-1.43%)
Mutual labels:  pdf
Buttonprogressbar Ios
A small and flexible (well documented) UIButton subclass with animated loading progress, and completion animation.
Stars: ✭ 479 (-2.04%)
Mutual labels:  download
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (-5.11%)
Mutual labels:  scraping
Pdf To Text
Extract text from a pdf
Stars: ✭ 462 (-5.52%)
Mutual labels:  pdf

Geeks for Geeks PDFs

Table of Contents of the Dynamic Programming Book.

Download the PDFs from the releases page.

I started in 2015 from @gnijuohz's repo, but as of 2018 I've rewritten nearly every part of the process.

Dependencies

  • docopt: basic CLI in the scripts
  • requests & requests_cache: download pages and cache the results locally
  • lxml: clean the downloaded pages
  • pandoc & xelatex: convert the cleaned pages to PDF
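Before running the scripts, it can help to confirm that every dependency listed above is actually available. This is a minimal sketch, not part of the repository: it assumes the Python packages are importable under the module names `docopt`, `requests`, `requests_cache` and `lxml`, and that `pandoc` and `xelatex` are on `PATH`. The helper name `check_dependencies` is hypothetical.

```python
import importlib.util
import shutil

# Assumed module names for the Python dependencies listed above.
PYTHON_MODULES = ("docopt", "requests", "requests_cache", "lxml")
# External programs the pipeline relies on (assumed to be on PATH).
EXECUTABLES = ("pandoc", "xelatex")


def check_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the program is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    missing = [name for name, ok in check_dependencies().items() if not ok]
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```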

Running the code

  1. First, find a "topic URL" for what you want to download, e.g.:

    • https://www.geeksforgeeks.org/tag/samsung/
    • https://www.geeksforgeeks.org/category/dynamic-programming/
  2. Create a JSON file containing links to all posts on that topic:

    • python3.6 list_links.py https://www.geeksforgeeks.org/tag/samsung/

    • This JSON file can now be edited by hand to remove links, re-order them, etc.

  3. Fetch the actual posts:

    • python3.6 download_html.py JSON/Samsung.json
  4. Finally, convert the HTML to a PDF using pandoc:

    • python3.6 html_to_pdf.py HTML/Samsung.html
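Steps 2 and 3 can be sketched with the standard library alone. This is not the repository's `list_links.py` or `download_html.py`; it is a minimal illustration under several assumptions: post links on a topic page are ordinary `<a href>` tags pointing back to `www.geeksforgeeks.org`, links under `/tag/` and `/category/` are listing pages rather than posts, the JSON file is a plain list of URLs, and the combined output wraps each post in its own `<section>`. The names `LinkParser`, `collect_links`, `write_links` and `combine_posts` are hypothetical, and `fetch` stands for any function that takes a URL and returns its HTML.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical filtering rules; adjust to the pages actually being scraped.
DOMAIN = "www.geeksforgeeks.org"
LISTING_PREFIXES = ("/tag/", "/category/")


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_links(page_html, base_url):
    """Return unique post URLs from one listing page, in first-seen order."""
    parser = LinkParser()
    parser.feed(page_html)
    seen, links = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop "#comments"-style fragments.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(url)
        if parts.netloc != DOMAIN or parts.path in ("", "/"):
            continue  # external link or the home page
        if parts.path.startswith(LISTING_PREFIXES):
            continue  # another topic listing, not a post
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def write_links(links, json_path):
    """Save the links as a JSON list that is easy to edit by hand."""
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(links, fh, indent=2)


def combine_posts(json_path, fetch):
    """Fetch every URL in the (possibly hand-edited) JSON, in file order,
    and wrap each page in a <section> of a single HTML document."""
    with open(json_path, encoding="utf-8") as fh:
        links = json.load(fh)
    sections = ["<section>\n" + fetch(url) + "\n</section>" for url in links]
    return "<html><body>\n" + "\n".join(sections) + "\n</body></html>"
```

Because `combine_posts` reads the JSON in file order, removing or re-ordering entries by hand directly changes the final document. A real topic usually spans several listing pages, and this sketch handles one page of HTML at a time. In practice `fetch` is where `requests` and `requests_cache` fit in: with a cache installed, re-running the fetch step after editing the JSON does not re-download posts that were already fetched. The lxml cleaning step is left out here.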

Things will work only if you're really lucky. This project has taught me how fragile my HTML-to-PDF pipeline really is; there are just too many things that can go wrong.

What could go wrong

  • The PDF engine that pandoc calls may fail! In that case:
    • convert the HTML to TeX,
    • run pandoc on the .tex file in verbose mode,
    • and fix the .tex file by hand.
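The normal conversion and the fallback described above map onto a few pandoc invocations. This is a minimal sketch, assuming pandoc 2.x (where the PDF engine is chosen with `--pdf-engine`; older releases used `--latex-engine`); it is not necessarily what `html_to_pdf.py` runs, and the helper names `pdf_command`, `fallback_commands` and `run` are hypothetical. Commands are built as argument lists so they can be inspected or printed before running.

```python
import subprocess
from pathlib import Path


def pdf_command(html_path, engine="xelatex"):
    """Direct conversion: HTML -> PDF, letting pandoc drive the LaTeX engine."""
    src = Path(html_path)
    return ["pandoc", str(src), "--pdf-engine=" + engine,
            "-o", str(src.with_suffix(".pdf"))]


def fallback_commands(html_path, engine="xelatex"):
    """Fallback: HTML -> standalone .tex, then .tex -> PDF in verbose mode."""
    src = Path(html_path)
    tex = src.with_suffix(".tex")
    return [
        # 1. Stop at the LaTeX stage; the .tex file can now be edited by hand.
        ["pandoc", str(src), "--standalone", "-o", str(tex)],
        # 2. Re-run on the .tex file; --verbose prints more detail about the run.
        ["pandoc", str(tex), "--pdf-engine=" + engine, "--verbose",
         "-o", str(src.with_suffix(".pdf"))],
    ]


def run(cmd):
    """Run one command; print its stderr and return False if it fails.

    Raises FileNotFoundError if the program itself is not installed.
    """
    # stdout/stderr=PIPE and universal_newlines keep this Python 3.6 compatible.
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode == 0
```

For example, `run(pdf_command("HTML/Samsung.html"))` tries the direct route; if it returns `False`, run the first fallback command, fix the generated `HTML/Samsung.tex` by hand, then run the second.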

Topic URLs

List of topic URLs that I've fetched. You can download these from the releases page.

Algorithms

Data Structures

Companies
