kai-dg / hipposcraper

Licence: other
A Linux terminal tool for parsing and scraping Holberton project pages to automate repetitive tasks.


Hipposcraper - Python Scripts for Automating Holberton Projects

[STATUS] As of 6/22/2019, this repo is no longer maintained by Derrick Gee and Brennan D Baraban. If you are looking for an updated version of this scraper, ask around or on the Holberton Slack to find someone who maintains a fork of this repo.

The Hipposcraper automates file template creation for Holberton projects. The program takes a link to a Holberton School project, scrapes the webpage, and creates the corresponding directory and files. The Hipposcraper currently supports the following:

| System Engineering | Low-Level Programming | Higher-Level Programming |
|---|---|---|
| Bash script templates | .c templates | .py and .c templates |
| | Header file | Header file |
| | _putchar file | |
| | main.c test files | main.c/main.py test files |
| README.md | README.md | README.md |
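
The file-creation step described above can be pictured with a short sketch. It assumes the task file names have already been scraped from the project page. The `create_project` helper, the template strings, and the project name are hypothetical illustrations, not the Hipposcraper's actual code.

```python
import os
import tempfile

# Hypothetical starter contents per file extension (not the tool's real templates).
TEMPLATES = {
    ".c": "/* task file */\n",
    ".py": "# task file\n",
}


def create_project(parent_dir, project_name, file_names):
    """Create project_name under parent_dir with one starter file per task."""
    project_dir = os.path.join(parent_dir, project_name)
    if not os.path.isdir(project_dir):
        os.makedirs(project_dir)
    created = []
    for name in file_names:
        ext = os.path.splitext(name)[1]
        path = os.path.join(project_dir, name)
        with open(path, "w") as handle:
            # Files with an unknown (or no) extension are created empty.
            handle.write(TEMPLATES.get(ext, ""))
        created.append(path)
    # Every supported project type also gets a README.md.
    readme = os.path.join(project_dir, "README.md")
    with open(readme, "w") as handle:
        handle.write("# {}\n".format(project_name))
    created.append(readme)
    return created


# Demo in a throwaway directory; the project and file names are made up.
demo_root = tempfile.mkdtemp()
paths = create_project(demo_root, "0x00-example", ["0-task", "1-main.c"])
print([os.path.basename(p) for p in paths])
```

The real tool derives the directory name and file list from the scraped page; here they are passed in directly to keep the sketch self-contained.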

Getting Started 🔧

IMPORTANT: Make sure your version is up to date (the current version is shown at the top of the README). Running hippoproject or hipporead displays the version you have installed.

Follow these instructions to set up the Hipposcraper on your machine.

Prerequisites

The Hipposcraper relies on the Python packages Mechanize and BeautifulSoup4. Installation of these packages requires pip. If you are on a Debian-based Linux distribution:

sudo apt-get install python-pip

Once pip has been installed, install Mechanize and BeautifulSoup4 as follows:

pip install mechanize
pip install beautifulsoup4

Note that you may need to pass the --user option to pip when installing these packages.
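
To confirm that both packages are importable before running the scraper, a small check along these lines may help. This is a sketch: `missing_modules` is a hypothetical helper, not part of the Hipposcraper. Note that the beautifulsoup4 package is imported under the name `bs4`.

```python
import importlib

# Import names of the two dependencies; beautifulsoup4 is imported as "bs4".
REQUIRED_MODULES = ["mechanize", "bs4"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All prerequisites are installed.")
```

Run it with the same interpreter your aliases use, since packages installed for one Python version are not visible to another.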

Setup 🔑

Setting User Information

After cloning a local copy of the repository, enter your Holberton intranet username and password as well as your GitHub name, username, and profile link in the auth_data.json file.

  • Using setup.sh: Run ./setup.sh to set up the required information automatically.
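
If you fill in auth_data.json by hand, it can be worth checking that no field was left blank. The sketch below shows one way to do that. The key names are hypothetical placeholders chosen for illustration; look at your copy of auth_data.json for the keys the scraper actually expects.

```python
import json
import os
import tempfile

# Hypothetical key names for illustration only; the real file may use others.
REQUIRED_KEYS = [
    "intranet_username",
    "intranet_password",
    "github_name",
    "github_username",
    "github_profile_link",
]


def load_auth_data(path):
    """Load the JSON file at path and fail early if a field is missing or empty."""
    with open(path) as handle:
        data = json.load(handle)
    blank = [key for key in REQUIRED_KEYS if not data.get(key)]
    if blank:
        raise ValueError("auth_data.json is missing: " + ", ".join(blank))
    return data


# Demo with placeholder values written to a temporary file.
sample = dict((key, "placeholder") for key in REQUIRED_KEYS)
sample_path = os.path.join(tempfile.mkdtemp(), "auth_data.json")
with open(sample_path, "w") as handle:
    json.dump(sample, handle, indent=4)
print(sorted(load_auth_data(sample_path)))
```

Because the file holds your intranet password, keep it out of any commits you push to a public fork.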

Setting Aliases

The Hipposcraper defines two separate Python scripts - one (hippoproject.py) that creates projects, and a second (hipporead.py) that creates README.md files. To run both simultaneously, you'll need to define an alias to the script hipposcrape.sh.

First, open the script and enter the full pathname to the Hipposcraper directory where directed. Then, if you work in a Bash shell, define the following in your .bashrc:

alias hipposcrape='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hipposcrape.sh'

Alternatively, you can define separate aliases for each individual script. To define a project scraper alias:

alias hippoproject='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hippoproject.py'

And to define a README.md scraper alias:

alias hipporead='./ENTER_FULL_PATHNAME_TO_SCRAPER_DIRECTORY_HERE/hipporead.py'

NOTE: This program only works with Python 2; ensure that your aliases specify 'python2' (Mechanize is not supported by Python 3).
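
The alias lines can also be generated instead of typed by hand. The sketch below builds them from an install directory and, following the note about Python 2, prefixes the two Python scripts with `python2`. The `alias_line` helper and the directory `/home/user/hipposcraper` are assumptions made for illustration.

```python
import posixpath


def alias_line(alias_name, scraper_dir, script_name, interpreter=None):
    """Build a .bashrc alias line for a script inside scraper_dir."""
    target = posixpath.join(scraper_dir, script_name)
    command = "{} {}".format(interpreter, target) if interpreter else target
    return "alias {}='{}'".format(alias_name, command)


# Hypothetical install location; substitute the directory you cloned into.
scraper_dir = "/home/user/hipposcraper"

print(alias_line("hipposcrape", scraper_dir, "hipposcrape.sh"))
print(alias_line("hippoproject", scraper_dir, "hippoproject.py", "python2"))
print(alias_line("hipporead", scraper_dir, "hipporead.py", "python2"))
```

Append the printed lines to your .bashrc and open a new shell (or run `source ~/.bashrc`) for them to take effect. The sketch does not quote paths, so it assumes the directory name contains no spaces or single quotes.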


Usage 💻

After you have set up the proper aliases, you can run the Hipposcraper with the following command:

~$ hipposcrape project_link

Here, project_link is the URL of the Holberton School project to scrape.

Alternatively, to run only the project scraper:

~$ hippoproject project_link

Or only the README.md scraper:

~$ hipporead project_link
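
Each command takes the project URL as its single argument. If you wrap the commands in your own script, a quick sanity check on that argument can catch typos before any login attempt. The sketch below is an assumption about what a reasonable check looks like, not the scraper's own validation, and the host `intranet.example.com` is a made-up placeholder.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def is_project_link(link):
    """Return True if link is an http(s) URL with a host and a non-empty path."""
    parts = urlparse(link)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and len(parts.path) > 1
    )


print(is_project_link("https://intranet.example.com/projects/123"))
print(is_project_link("projects/123"))
```

The first call accepts a full URL; the second rejects a bare path with no scheme or host.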

The scraper also generates check.sh, which checks the formatting of all required files:

~$ ./check.sh

Repository Contents 📁

  • hipposcrape.sh

    • A Bash script for running the entire Hipposcraper at once.
  • hippoproject.py

    • Python script that scrapes Holberton intranet webpage to create project directories.
  • hipporead.py

    • Python script that scrapes Holberton intranet webpage to create project README.md.
  • auth_data.json

    • Stores user Holberton intranet and GitHub profile information.
  • scrapers

    • Folder of file-creation scrapers.
      • base_parse.py: Python script for parsing project pages.
      • sys_scraper.py: Python methods for creating Bash task files for system engineering projects.
      • low_scraper.py: Python methods for creating _putchar.c, task files, and header file for low-level programming projects.
      • high_scraper.py: Python methods for creating Python task files for higher-level programming projects.
      • test_file_scraper.py: Python methods for creating test files for all project types.
  • setup.sh: Script that sets up all variables and aliases.

  • autover.sh: Development tool for changing all version strings.
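
To give a feel for what parsing a project page involves, here is a minimal sketch that pulls task file names out of HTML. The Hipposcraper itself relies on BeautifulSoup4; this example uses the standard library's HTML parser as a stand-in so that it runs without extra packages. The page markup (`<li>File: <code>...</code></li>`), the class, and the sample file names are all assumptions, not taken from base_parse.py.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class FileNameParser(HTMLParser):
    """Collect file names from <code> tags that follow a "File:" label."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.file_names = []
        self._after_label = False  # saw "File:" in the current <li>
        self._in_code = False      # inside a <code> that follows the label

    def handle_starttag(self, tag, attrs):
        if tag == "code" and self._after_label:
            self._in_code = True

    def handle_endtag(self, tag):
        if tag == "code":
            self._in_code = False
        elif tag == "li":
            self._after_label = False

    def handle_data(self, data):
        if self._in_code:
            # One <code> element may list several comma-separated files.
            self.file_names.extend(
                name.strip() for name in data.split(",") if name.strip()
            )
        elif "File:" in data:
            self._after_label = True


def parse_file_names(html):
    """Return the file names found in the given HTML string, in page order."""
    parser = FileNameParser()
    parser.feed(html)
    parser.close()
    return parser.file_names


# Made-up sample markup standing in for a project page.
SAMPLE_PAGE = """
<ul>
  <li>Directory: <code>0x00-example</code></li>
  <li>File: <code>0-main.c, example.h</code></li>
  <li>File: <code>1-task.py</code></li>
</ul>
"""
print(parse_file_names(SAMPLE_PAGE))
```

The resulting list is the kind of input the file-creation scrapers would need: one entry per file to generate.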


Example of the C scraper

[Image: demo0]

Example of the README scraper

[Image: demo1]

Example of check.sh

[Image: demo2]


Author


Contributors
