simplecto / Screenshots

Simple Website Screenshots as a Service (Django, Selenium, Docker, Docker-compose)

Programming Languages

python

Projects that are alternatives of or similar to Screenshots

Csdnbot
CSDN resource downloader
Stars: ✭ 209 (+65.87%)
Mutual labels:  django, selenium
Screen Recorder
A Ruby gem to video record and take screenshots of your desktop or specific application window. Works on Windows, Linux, and macOS.
Stars: ✭ 135 (+7.14%)
Mutual labels:  screenshots, selenium
Python Spider
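Douban Movie Top 250; scraping JSON data and photos of women from Douyu; Taobao; Youyuan; using CrawlSpider to scrape basic profile information of matchmaking users from Hongniang, plus distributed scraping of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Dmall; building API endpoints with Django; scraping Youyuan profile information; simulated login to Zhihu, GitHub, and Tuchong; scraping the full Dmall store site; scraping the article history of WeChat official accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts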
Stars: ✭ 615 (+388.1%)
Mutual labels:  django, selenium
Reporting
Zebrunner Reporting Tool
Stars: ✭ 198 (+57.14%)
Mutual labels:  screenshots, selenium
Crawl
Asynchronous scraping of web page images with Selenium
Stars: ✭ 13 (-89.68%)
Mutual labels:  django, selenium
Django Trench
django-trench provides a set of REST API endpoints to supplement django-rest-framework with multi-factor authentication (MFA, 2FA). It supports both standard built-in authentication methods, as well as JWT (JSON Web Token).
Stars: ✭ 123 (-2.38%)
Mutual labels:  django
Impostor
Django app that enables staff to log in as other users using their own credentials.
Stars: ✭ 124 (-1.59%)
Mutual labels:  django
Django Classified
Django Classified
Stars: ✭ 122 (-3.17%)
Mutual labels:  django
Water Monitoring System
Water Monitoring System is an IOT based Liquid Level Monitoring system that has mechanisms to keep the user alerted in case of liquid overflow or when tank depletes.
Stars: ✭ 122 (-3.17%)
Mutual labels:  django
Django Init
Project template used at Fueled for scaffolding new Django based projects. 💫
Stars: ✭ 126 (+0%)
Mutual labels:  django
Wagtail
A Django content management system focused on flexibility and user experience
Stars: ✭ 11,387 (+8937.3%)
Mutual labels:  django
Django Jsoneditor
Django JSONEditor input widget to provide javascript online JSON Editor
Stars: ✭ 124 (-1.59%)
Mutual labels:  django
Django Project Skeleton
A skeleton aka. template for Django projects
Stars: ✭ 123 (-2.38%)
Mutual labels:  django
Django Fields
Fields pack for django framework.
Stars: ✭ 124 (-1.59%)
Mutual labels:  django
Tera
A template engine for Rust based on Jinja2/Django
Stars: ✭ 1,873 (+1386.51%)
Mutual labels:  django
Elasticstack
📇 Configurable indexing and other extras for Haystack (with ElasticSearch biases)
Stars: ✭ 125 (-0.79%)
Mutual labels:  django
Js Nightwatch Recorder
🌙 ⌚️ NightwatchJs recorder for Chrome
Stars: ✭ 122 (-3.17%)
Mutual labels:  selenium
Searchrestaurant
Apps are built using Google Maps SDK, Geocoding and Foursquare APIs
Stars: ✭ 124 (-1.59%)
Mutual labels:  django
Selenium Ide
Open Source record and playback test automation for the web.
Stars: ✭ 1,815 (+1340.48%)
Mutual labels:  selenium
Django Microservices
UNMAINTAINED
Stars: ✭ 124 (-1.59%)
Mutual labels:  django

Purpose

The purpose of this project is to explore and experiment with what it takes to make a website screen-shotting tool. At first it may seem like an easy task, but it becomes complex once you try.

NOTE: If you just want a tool that "just works" then I suggest you try any of the capable services linked below.

Common problems

  • JavaScript-heavy pages (almost all pages these days); many sites use JavaScript to load content after the page has downloaded into the browser. You therefore need a modern JavaScript engine to parse and execute those extra instructions and get the content as it was intended to be seen by humans.
  • Geography-restricted content; some sites in the US have blocked visitors from Europe because of GDPR. Do you accept this, or is there a way to work around it?
  • Bot and automation detection schemes; some sites use services to stop automated processes from collecting content. This includes taking screenshots.
  • Improperly configured domain names, SSL/TLS encryption certificates, and other network-related issues.
  • Nefarious website owners and hacked sites that attempt to exploit the web browser to mine cryptocurrencies. This puts an added load on your resources and can significantly slow your render times.
  • Taking too many screenshots at a time may overload the server and cause timeouts or failure to load pages.
  • Temporary network or website failure; if the problem is on the site's end, how will we know that and schedule another attempt later?
  • People using the service as a de facto proxy (e.g. pranksters downloading porn at their schools or in public places).
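
The question of scheduling another attempt after a temporary failure can be illustrated with exponential backoff. This is only a sketch of one common approach, not something this project is known to implement: the function names `next_attempt_delay` and `schedule_retry`, the base delay, the cap, and the attempt limit are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tuning values; nothing in this project prescribes them.
BASE_DELAY_SECONDS = 60
MAX_DELAY_SECONDS = 15 * 60
MAX_ATTEMPTS = 8


def next_attempt_delay(failed_attempts: int) -> Optional[timedelta]:
    """Return how long to wait before retrying, or None to give up.

    The delay doubles after each failure (60s, 120s, 240s, ...) and is
    capped at MAX_DELAY_SECONDS.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts >= MAX_ATTEMPTS:
        return None
    seconds = BASE_DELAY_SECONDS * 2 ** (failed_attempts - 1)
    return timedelta(seconds=min(seconds, MAX_DELAY_SECONDS))


def schedule_retry(failed_attempts: int, now: datetime) -> Optional[datetime]:
    """Return the time of the next attempt, or None when retries are exhausted."""
    delay = next_attempt_delay(failed_attempts)
    return None if delay is None else now + delay
```

Backing off like this gives a site time to recover, and the attempt limit keeps a permanently broken URL from being retried forever.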

Requirements

My development environment is on macOS, so Homebrew and PyCharm are my friends here.

  • Python 3.x (stable) in a virtual environment (this is the only version I'm working with)
  • Selenium/geckodriver/chrome-driver installed via Homebrew: brew install geckodriver
  • Docker
  • Postgres installed via Homebrew.

I don't use Docker on my development machine because I have not figured out how to get PyCharm's awesome debugger working well inside Docker containers. If you can, ping me.

Getting started

  1. Check out the repo.
  2. Create a local virtual environment: python -m venv venv/
  3. Jump into the venv with: source venv/bin/activate
  4. Install the requirements: pip install -r requirements.txt
  5. Create the Postgres database for the project: CREATE DATABASE screenshots;
  6. Copy env.sample to env in the root source folder.
  7. Check and, if needed, update the values in the env file.
  8. Install the Selenium geckodriver for your platform: brew install geckodriver
  9. Migrate the database: cd src && ./manage.py migrate
  10. Create the cache table: cd src && ./manage.py createcachetable
  11. Create the superuser: cd src && ./manage.py createsuperuser
  12. Start the worker: cd src && ./manage.py screenshot_worker_ff
  13. Finally, start the webserver: cd src && ./manage.py runserver 0.0.0.0:8000

Open a browser onto http://localhost:8000 and see the screenshot app in all its glory.

System Architecture

[Figure: system architecture]

Web process

Django runs as usual in either development mode or inside gunicorn (for production).

Worker Processes

One or more workers run as independent processes in parallel with the webserver process. They connect to the database and poll for new work on an interval. This pattern obviates the need for Celery, Redis, RabbitMQ, or other complicated moving parts in the system.

The worker processes work like this:

  1. poll database for new screenshots to make
  2. find a screenshot, mark it as pending
  3. launch Selenium and take a screenshot of the resulting page (up to a 60-second time limit)
  4. save the screenshot to the database
  5. shut down the Selenium browser
  6. sleep
  7. repeat
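
The steps above can be sketched as a small polling loop. This is a minimal illustration under stated assumptions, not the project's actual worker: it uses the standard-library `sqlite3` module as a stand-in for Postgres and the Django ORM, the table layout and the status values `new`, `done`, and `failed` are hypothetical (only `pending` comes from the list above), and the browser work of steps 3 and 5 is left to a caller-supplied `take_screenshot` function.

```python
import sqlite3
import time
from typing import Callable, Optional, Tuple

# Hypothetical schema; the project's real models may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS screenshot (
    id     INTEGER PRIMARY KEY,
    url    TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'new',
    image  BLOB
)
"""


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Steps 1-2: find one new screenshot and mark it as pending."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, url FROM screenshot WHERE status = 'new' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Re-checking the status in WHERE keeps two workers from claiming one row.
        claimed = conn.execute(
            "UPDATE screenshot SET status = 'pending' WHERE id = ? AND status = 'new'",
            (row[0],),
        )
        return (row[0], row[1]) if claimed.rowcount == 1 else None


def run_once(conn: sqlite3.Connection, take_screenshot: Callable[[str], bytes]) -> bool:
    """Process at most one job; return True if a job was claimed."""
    job = claim_next(conn)
    if job is None:
        return False
    job_id, url = job
    try:
        image = take_screenshot(url)  # steps 3 and 5 belong inside this callable
        status = "done"
    except Exception:
        image, status = None, "failed"
    with conn:  # step 4: save the result next to its metadata
        conn.execute(
            "UPDATE screenshot SET status = ?, image = ? WHERE id = ?",
            (status, image, job_id),
        )
    return True


def worker_loop(conn: sqlite3.Connection, take_screenshot: Callable[[str], bytes],
                poll_seconds: float = 5.0) -> None:
    """Steps 6-7: sleep, then poll again, forever."""
    while True:
        run_once(conn, take_screenshot)
        time.sleep(poll_seconds)
```

With Postgres and Django, the claim step could instead lean on row locking, for example `select_for_update(skip_locked=True)` inside a transaction; whether this project does so is not stated here.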

But where are images stored?

In the database! Now, before you lose it -- I know what many of you will say about storing images in the database. I have linked to the Stack Overflow discussion here:

My rationale is this:

  • All content lives in the database, so there are no syncing issues between the data (screenshots) and the metadata (database).
  • Images will be smallish because they are compressed screenshots of no more than 1 MB (often far less). But we will need to run many iterations and save as much metadata about the screenshots as we can to really know.
  • Thumbnails will be stored in the cache (also a database table), but get purged after 30 days.
  • Today's compute, network, and storage capacities are so big that 1 TB is no longer considered unreasonable. This means that if we build up a screenshot database of 1 TB, then that is a good problem to have and we can re-architect from there.

Note: This is a hypothesis, and I am willing to change my mind if this does not work out.
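
To put the 1 TB figure in perspective, here is a rough back-of-the-envelope calculation. It is only an estimate: it uses decimal units, ignores row, index, and thumbnail overhead, and the 250 kB average is a made-up value used solely to illustrate "often far less".

```python
MB = 1_000_000  # decimal megabyte, in bytes
TB = 1_000_000_000_000  # decimal terabyte, in bytes


def screenshots_per_budget(budget_bytes: int, bytes_per_screenshot: int) -> int:
    """Return how many screenshots of a given size fit in a storage budget."""
    if bytes_per_screenshot <= 0:
        raise ValueError("bytes_per_screenshot must be positive")
    return budget_bytes // bytes_per_screenshot


# Worst case from the rationale: every screenshot is a full 1 MB.
worst_case = screenshots_per_budget(TB, 1 * MB)
# Hypothetical average of 250 kB, not a measured value.
assumed_average = screenshots_per_budget(TB, 250_000)

print(f"{worst_case:,} screenshots at 1 MB each")
print(f"{assumed_average:,} screenshots at 250 kB each")
```

Even in the worst case, 1 TB holds about a million screenshots, which supports treating that size as a problem to solve later.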

Recommended reading on the subject

Alternative Services

Thank-yous

Contributing

Please fork and submit pull requests if you are inspired to do so. Issues are open as well.
