DarshanDeshpande / Scrapera

License: MIT
A universal package of scraper scripts for humans

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes it fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

Images, Text, Audio, Videos, Miscellaneous

The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
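
To illustrate the general idea of scraping endpoints asynchronously rather than driving a browser, here is a minimal sketch. It is not Scrapera's implementation: `fetch_json`, `fetch_all`, and `run` are hypothetical names, and the network call is simulated with `asyncio.sleep` so that the example runs without any dependencies.

```python
import asyncio


async def fetch_json(endpoint: str) -> dict:
    # Hypothetical stand-in for an HTTP GET against a public API endpoint.
    # A real scraper would call an async HTTP client here instead of sleeping.
    await asyncio.sleep(0.01)
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints: list[str]) -> list[dict]:
    # Start all requests concurrently; gather returns results in input order.
    return await asyncio.gather(*(fetch_json(e) for e in endpoints))


def run(endpoints: list[str]) -> list[dict]:
    # Synchronous entry point that drives the event loop to completion.
    return asyncio.run(fetch_all(endpoints))
```

Calling `run(["endpoint-a", "endpoint-b"])` waits on both simulated requests at the same time, so total latency is close to that of the slowest request rather than the sum of all of them.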

DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

Getting Started

Prerequisites

Prerequisites can be installed separately from the requirements.txt file:

pip install -r requirements.txt

Installation

Scrapera is built with Python 3 and can be installed directly with pip:

pip install scrapera

Alternatively, to install the latest version directly from GitHub, run:

pip install git+https://github.com/DarshanDeshpande/Scrapera.git

Usage

To use any sub-module, import its scraper, instantiate it, and run it:

from scrapera.video.vimeo import VimeoScraper

scraper = VimeoScraper()
# Scrape the video at the given Vimeo URL, requesting the 540p quality
scraper.scrape('https://vimeo.com/191955190', '540p')

For more examples, refer to the test folders in the respective modules.
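
When scraping several URLs, it can help to wrap the call in a small loop that records failures instead of stopping at the first one. The sketch below assumes only the `scrape(url, quality)` call shape shown in the Vimeo example above; `scrape_many` and `StubScraper` are hypothetical helpers written for this illustration and are not part of Scrapera. Whether other Scrapera scrapers accept the same arguments is an assumption that should be checked against each module's tests.

```python
def scrape_many(scraper, urls, quality="540p"):
    """Call scraper.scrape(url, quality) for each URL.

    Returns a dict mapping each failed URL to the exception it raised,
    so one bad URL does not abort the whole batch.
    """
    failures = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # broad on purpose: record and continue
            failures[url] = exc
    return failures


class StubScraper:
    """Hypothetical stand-in so the sketch runs without Scrapera or a network."""

    def __init__(self):
        self.calls = []

    def scrape(self, url, quality):
        # Reject anything that is not an https URL to simulate a failure.
        if not url.startswith("https://"):
            raise ValueError(f"unsupported URL: {url}")
        self.calls.append((url, quality))
```

With the real package installed, the same helper could be called as `scrape_many(VimeoScraper(), urls)`; any URLs in the returned dict can then be retried or reported in an issue.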

Contributing

Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING.

License

Distributed under the MIT License. See LICENSE for more information.

Sponsors

Contact

Feel free to reach out with any issues or requests related to Scrapera.

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements
