Netflix-Skunkworks / Sketchy

Licence: apache-2.0
A task-based API for taking screenshots and scraping text from websites.

----DEPRECATED----

Sketchy

Overview

What is Sketchy?

Sketchy is a task-based API for taking screenshots and scraping text from websites.

What is the Output of Sketchy?

Sketchy's capture model contains all of the information associated with screenshotting, scraping, and storing HTML files from a provided URL. Screenshots (sketches), text scrapes, and HTML files can be stored either locally or in an S3 bucket. Optionally, token auth can be configured for creating and retrieving captures. Sketchy can also perform callbacks if required.

How Does Sketchy Do It?

Sketchy uses PhantomJS with lazy rendering to ensure that Ajax-heavy sites are captured correctly. Sketchy also uses the Celery task management system, which lets users scale Sketchy as needed and manage time-intensive captures.

Release History

Version 1.1.2 - January 27, 2016

This minor release fixes a bug and adds a new configuration option:

  • A default timeout of 5 seconds was added to the check_url task. This should prevent workers from hanging (#26).
  • You can now specify a cookie store via the environment variable 'phantomjs_cookies', which PhantomJS will use. This variable simply needs to be a string of key/value cookie pairs.
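
As an illustration of the first item, here is a minimal sketch of a URL check with a 5-second default timeout. It is not Sketchy's actual check_url task: the function body, the use of the standard library's urllib, and the `opener` parameter (added only so the function can be exercised without network access) are all assumptions.

```python
import socket
import urllib.error
import urllib.request

DEFAULT_TIMEOUT = 5.0  # seconds, matching the default named in the release note


def check_url(url, timeout=DEFAULT_TIMEOUT, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if it cannot be reached.

    Hypothetical stand-in for a URL check: without a timeout, a server that
    never answers would block the calling worker indefinitely.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # DNS failure, refused connection, or no answer within the timeout.
        return None
```

Returning None on a timeout lets the caller mark the capture as failed instead of leaving a worker stuck.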

Version 1.1.1 - June 16, 2015

This minor release fixes a few bugs and adds some new configuration features:

  • A new configuration option, PHANTOMJS_TIMEOUT, sets how long to wait for a capture to render before terminating the subprocess.
  • Celery retry functionality was added for cases where PhantomJS fails to render a screenshot before the PhantomJS timeout occurs.
  • An incremental PhantomJS timeout was introduced to improve PhantomJS's success at generating very large screenshots. Each time PhantomJS retries rendering a screenshot, 5 seconds are added to the previous PHANTOMJS_TIMEOUT value.
  • A number of typos have been fixed and comments have been added.
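
The incremental timeout described above can be sketched as follows. This is an illustration under stated assumptions, not Sketchy's implementation: the function names and the `max_retries` parameter are hypothetical, and a plain loop stands in for Celery's retry mechanism.

```python
import subprocess

TIMEOUT_INCREMENT = 5  # seconds added on each retry, per the release note


def timeout_for_retry(base_timeout, retry_count):
    """Timeout for a given attempt: the base value plus 5 seconds per retry."""
    return base_timeout + TIMEOUT_INCREMENT * retry_count


def render_with_retries(command, base_timeout, max_retries=3):
    """Run command, allowing a longer timeout on every retry.

    Returns the number of retries that were needed. The command is a
    placeholder for whatever subprocess produces the screenshot.
    """
    for retry_count in range(max_retries + 1):
        timeout = timeout_for_retry(base_timeout, retry_count)
        try:
            subprocess.run(command, timeout=timeout, check=True)
            return retry_count
        except subprocess.TimeoutExpired:
            # The subprocess was killed; try again with 5 more seconds.
            continue
    raise RuntimeError("command did not finish within any allowed timeout")
```

With a base timeout of 35 seconds (an arbitrary example value), successive attempts would be allowed 35, 40, 45, and 50 seconds.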

Version 1.1 - December 4, 2014

A number of improvements and bug fixes have been made:

  • A new model and API endpoint called "Static" was created. This allows users to send Sketchy a static HTML file for text scraping and screenshotting. See the Wiki for usage information.
  • A new PhantomJS script called 'static.js' creates screenshots of static HTML files.
  • Creation of a new endpoint, api/v1.0/capture/last, which shows the last capture that was taken.
  • Creation of a new endpoint, api/v1.0/static/last, which shows the last static capture that was taken.
  • The API list view is now reverse sorted, so the most recent capture is listed at the top of the page.
  • For callback requests, the capture status is now updated.
  • Task retry has been optimized to retry only on ConnectionErrors. Errors that would never succeed on a retry should now fail faster.
  • A new configuration setting "SSL_HOST_VALIDATION" can be set to scrape/screenshot webpages with SSL errors.
  • A new configuration setting "CAPTURE_ERRORS" can be used to scrape/screenshot webpages that have 4xx or 5xx HTTP status codes.
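
The retry optimization mentioned in the list above follows a common pattern: retry only on errors that might be transient, and let everything else fail immediately. A minimal sketch, assuming Python's built-in ConnectionError (Sketchy's own tasks may catch a different ConnectionError class, such as one from an HTTP library) and a hypothetical `run_with_retry` helper in place of Celery's retry machinery:

```python
import time


def run_with_retry(task, max_retries=3, delay=0.0):
    """Call task(), retrying only when it raises ConnectionError.

    Any other exception propagates on the first failure, since repeating
    the call would not change the outcome.
    """
    attempt = 0
    while True:
        try:
            return task()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the connection error
            attempt += 1
            time.sleep(delay)
```

A task that raises, say, ValueError is called exactly once, while a task that hits a dropped connection is tried up to `max_retries` more times.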

Documentation

Documentation is maintained in the GitHub Wiki.

Docker

Sketchy is also available as a Docker container.
