dufferzafar / Geeksforgeeks.pdf

Topic wise PDFs of Geeks for Geeks articles. (Last updated in October 2018)

Programming Languages

python

Labels

pdf download scraping

Projects that are alternatives to or similar to Geeksforgeeks.pdf

Tabula

Tabula is a tool for liberating data tables trapped inside PDF files

Stars: ✭ 5,420 (+1008.38%)

Mutual labels: scraping, pdf

The Economist Ebooks

The Economist (with audio), The New Yorker, Nature, New Scientist, The Guardian, Scientific American, Wired, The Atlantic, Newsweek, National Geographic, and other English-language magazines: free download and subscription (Kindle delivery) in epub, mobi, and pdf formats, updated weekly.

Stars: ✭ 3,471 (+609.82%)

Mutual labels: pdf, download

Idt

Image Dataset Tool (idt) is a cli tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.

Stars: ✭ 202 (-58.69%)

Mutual labels: scraping, download

Ibm Z Zos

The helpful and handy location for finding and sharing z/OS files, which are not included in the product.

Stars: ✭ 198 (-59.51%)

Mutual labels: pdf, download

Languagepod101 Scraper

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

Stars: ✭ 104 (-78.73%)

Mutual labels: scraping, download

Educative.io Downloader

📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.

Stars: ✭ 139 (-71.57%)

Mutual labels: scraping, pdf

Invoices

Generate PDF invoices for your customers in laravel

Stars: ✭ 298 (-39.06%)

Mutual labels: pdf, download

Markdown Resume Js

Turn a simple markdown document into a resume in HTML and PDF

Stars: ✭ 449 (-8.18%)

Mutual labels: pdf

Pdf2word

Multithreaded PDF-to-Word conversion in 60 lines of code

Stars: ✭ 467 (-4.5%)

Mutual labels: pdf

React Pdf Highlighter

Set of React components for PDF annotation

Stars: ✭ 448 (-8.38%)

Mutual labels: pdf

Tppdf

TPPDF is a simple-to-use PDF builder for iOS

Stars: ✭ 444 (-9.2%)

Mutual labels: pdf

Rust Skia

Safe Skia Bindings for Rust

Stars: ✭ 450 (-7.98%)

Mutual labels: pdf

Pdf2htmlex

Convert PDF to HTML without losing text or format.

Stars: ✭ 472 (-3.48%)

Mutual labels: pdf

Weasyprint

The awesome document factory

Stars: ✭ 4,689 (+858.9%)

Mutual labels: pdf

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+889.16%)

Mutual labels: scraping

Pdftotext

Simple PDF text extraction

Stars: ✭ 445 (-9%)

Mutual labels: pdf

Resume

👾 My resume

Stars: ✭ 482 (-1.43%)

Mutual labels: pdf

Buttonprogressbar Ios

A small and flexible (well documented) UIButton subclass with animated loading progress, and completion animation.

Stars: ✭ 479 (-2.04%)

Mutual labels: download

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (-5.11%)

Mutual labels: scraping

Pdf To Text

Extract text from a pdf

Stars: ✭ 462 (-5.52%)

Mutual labels: pdf

Geeks for Geeks PDFs

Download the PDFs from the releases page.

I started in 2015 from @gnijuohz's repo, but now (in 2018) I've re-written pretty much every part of the process.

Dependencies

- docopt
  - Basic CLI in scripts
- requests & requests_cache
  - To download pages and cache the result locally
- lxml
  - Cleaning of the downloaded pages
- pandoc & xelatex
  - Convert the cleaned pages to PDF
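
The "download once, cache locally" idea can be illustrated with the standard library alone. This is only a minimal sketch, not the project's code: the scripts rely on requests and requests_cache, while `cached_fetch`, `fetch`, and `cache_dir` below are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page body for `url`, calling `fetch` only on a cache miss.

    `fetch` is any function that downloads a URL and returns its text
    (hypothetical; the real scripts use requests for this).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.html"
    if cache_file.exists():
        # Cache hit: reuse the stored copy and skip the download.
        return cache_file.read_text(encoding="utf-8")
    body = fetch(url)
    cache_file.write_text(body, encoding="utf-8")
    return body
```

Calling `cached_fetch` twice with the same URL downloads the page once and serves the second call from disk, which keeps repeated runs from hitting the site again.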

Running the code

- First, find a "topic URL" for what you want to download, e.g.:
  - https://www.geeksforgeeks.org/tag/samsung/
  - https://www.geeksforgeeks.org/category/dynamic-programming/
- Create a JSON file containing links to all posts on that topic
  - python3.6 list_links.py https://www.geeksforgeeks.org/tag/samsung/
  - This JSON file can now be edited by hand to remove some links, reorder them, etc.
- Now fetch the actual posts
  - python3.6 download_html.py JSON/Samsung.json
- Finally, convert the HTML to a PDF using Pandoc
  - python3.6 html_to_pdf.py HTML/Samsung.html
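
The three steps above could be chained from a small driver script. The following is a hedged sketch: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and the topic name (e.g. "Samsung") is passed in by the caller because how the scripts derive the `JSON/` and `HTML/` file names is not documented here.

```python
import subprocess
from typing import List


def pipeline_commands(topic_url: str, topic_name: str,
                      python: str = "python3.6") -> List[List[str]]:
    """Build the three commands from the steps above, in order.

    `topic_name` is an assumption: it must match the file name that
    list_links.py actually writes under JSON/.
    """
    return [
        [python, "list_links.py", topic_url],
        [python, "download_html.py", f"JSON/{topic_name}.json"],
        [python, "html_to_pdf.py", f"HTML/{topic_name}.html"],
    ]


def run_pipeline(topic_url: str, topic_name: str) -> None:
    """Run each step in turn, stopping at the first failure."""
    for command in pipeline_commands(topic_url, topic_name):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

For example, `run_pipeline("https://www.geeksforgeeks.org/tag/samsung/", "Samsung")` would run all three scripts back to back. Running them this way skips the optional hand-editing of the JSON file, so run the steps separately if you want to prune or reorder links first.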

Things will work only if you're really lucky. This project has taught me how fragile my HTML to PDF pipeline really is. There are just too many things that can go wrong.

What could go wrong

- The PDF engine that pandoc calls may fail!
  - In that case, convert the HTML to TeX
  - Then run pandoc on the TeX file in verbose mode
  - And manually fix the TeX file
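
The manual recovery route above might look like the following sketch. The helper names are hypothetical, and the flags assume pandoc 2.0 or later (where `--pdf-engine` replaced `--latex-engine`) with xelatex as the engine, as listed under Dependencies. The exact options that html_to_pdf.py passes to pandoc are not shown here, so treat these commands as a starting point.

```python
import os
from typing import List


def html_to_tex_command(html_path: str) -> List[str]:
    """Command that converts the HTML file to a standalone TeX file."""
    tex_path = os.path.splitext(html_path)[0] + ".tex"
    # --standalone makes pandoc emit a complete LaTeX document, not a fragment.
    return ["pandoc", "--standalone", html_path, "-o", tex_path]


def tex_to_pdf_command(tex_path: str) -> List[str]:
    """Command that builds the PDF from the TeX file with verbose output."""
    pdf_path = os.path.splitext(tex_path)[0] + ".pdf"
    # --verbose prints diagnostic output, which helps locate the failing input.
    return ["pandoc", tex_path, "-o", pdf_path,
            "--pdf-engine=xelatex", "--verbose"]
```

Each function only builds an argument list; pass the result to `subprocess.run(command, check=True)` to execute it, then edit the TeX file and rerun the second command until the engine succeeds.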

Topic URLs

List of topic URLs that I've fetched. You can download these from the releases page.

Algorithms

Data Structures

Companies

Stars: ✭ 489