edgi-govdata-archiving / web-monitoring-processing

License: GPL-3.0
Tools for accessing, "diff"-ing, and analyzing archived web pages

web-monitoring-processing

A component of the EDGI Web Monitoring Project.

Overview of this component's tasks

This component is intended to hold various backend tools serving different tasks:

  1. Query external sources of captured web pages (e.g. Internet Archive, Page Freezer, Sentry), and formulate a request for importing their version and page metadata into web-monitoring-db.
  2. Query web-monitoring-db for new Changes, analyze them in an automated pipeline to assign priority and/or filter out uninteresting ones, and submit this information back to web-monitoring-db.
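The second task (scoring changes and filtering out uninteresting ones) can be pictured with a minimal sketch. Nothing here is taken from this package: `Change`, `priority`, `triage`, and the 0.05 threshold are hypothetical, and plain text similarity from the standard library stands in for whatever analysis the real pipeline performs.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical stand-in for a change between two versions of a page."""
    page_url: str
    old_text: str
    new_text: str


def priority(change: Change) -> float:
    """Score a change from 0.0 (identical text) to 1.0 (nothing in common)."""
    matcher = difflib.SequenceMatcher(None, change.old_text, change.new_text)
    return 1.0 - matcher.ratio()


def triage(changes, threshold=0.05):
    """Drop changes scoring below the threshold; return the rest, highest first."""
    scored = [(priority(c), c) for c in changes]
    kept = [(score, c) for score, c in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

In a real pipeline the resulting scores would be submitted back to web-monitoring-db rather than kept in memory; that step is omitted here because it depends on the database API.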

Development status

Working and Under Active Development:

  • A Python API to the web-monitoring-db Rails app in web_monitoring.db
  • Python functions and a command-line tool for importing snapshots from the Internet Archive into web-monitoring-db.
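To give a sense of what querying the Internet Archive for snapshots involves, the sketch below builds a URL for the Wayback Machine's public CDX search endpoint. It is not this package's import code: `build_cdx_query` and its arguments are hypothetical names chosen for illustration, and the package's own functions may work quite differently.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, from_date=None, to_date=None, limit=None):
    """Build a CDX search URL listing captures of `url`.

    Dates are timestamp prefixes such as "2017" or "20170101".
    """
    params = {"url": url, "output": "json"}
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    if limit:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URL; fetching it would require network access.
    print(build_cdx_query("epa.gov/climate", from_date="20170101"))
```

Each row of the JSON response describes one capture, which an importer would then translate into version and page metadata for web-monitoring-db.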

Legacy projects that may be revisited:

Installation Instructions

  1. Get Python 3.7 or later. This package makes use of modern Python features and requires Python 3.7+. If you don't have Python 3.7, we recommend using conda to install it. (You don't need admin privileges to install or use it, and it won't interfere with any other installations of Python already on your system.)

  2. Install libxml2 and libxslt. (This package uses lxml, which requires your system to have the libxml2 and libxslt libraries.)

    On macOS, use Homebrew:

    brew install libxml2
    brew install libxslt

    On Debian Linux:

    apt-get install libxml2-dev libxslt-dev

    On other systems, the packages might have slightly different names.

  3. Install the package.

    pip install -r requirements.txt
    python setup.py develop

  4. Copy the script .env.example to .env and supply any local configuration info you need. (Only some of the package's functionality requires this.) Apply the configuration:

    source .env

  5. See module comments and docstrings for more usage information. Also see the command line tool wm, which is installed with the package. For help, use

    wm --help

  6. To run the tests or build the documentation, first install the development requirements.

    pip install -r requirements-dev.txt

  7. To build the docs:

    cd docs
    make html

  8. To run the tests:

    python run_tests.py

    Any additional arguments are passed through to py.test.
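Step 4 applies the configuration by sourcing .env in the shell. When a shell is not available (for example, in a notebook), a similar effect can be had from Python. The sketch below assumes the file holds simple KEY=VALUE lines, optionally prefixed with `export`; the actual contents of .env.example are not shown here, and `parse_env_lines` and `load_env_file` are hypothetical helpers, not part of this package.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines (optionally prefixed with 'export ') into a dict."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        # Remove surrounding whitespace and quote characters from the value.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def load_env_file(path=".env", environ=os.environ):
    """Read `path` and add its settings to `environ`, keeping existing values."""
    with open(path, encoding="utf-8") as f:
        settings = parse_env_lines(f)
    for key, value in settings.items():
        environ.setdefault(key, value)
    return settings
```

Unlike `source`, this does not expand shell variables or run commands inside the file, so it only suits plain assignments.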

Releases

We try to make sure the code in this repo’s main branch is always in a stable, usable state, but occasionally coordinated functionality may be written across multiple commits. If you are depending on this package from another Python program, you may wish to install from the release branch instead:

$ pip install git+https://github.com/edgi-govdata-archiving/web-monitoring-processing@release

You can also list the git+https: URL above in a pip requirements file.

We usually create merge commits on the release branch that note the PRs included in the release or any other relevant notes (e.g. Release #302 and #313.).

Code of Conduct

This repository falls under EDGI's Code of Conduct.

Contributors

This project wouldn’t exist without a lot of amazing people’s help. Thanks to the following for all their contributions! See our contributing guidelines to find out how you can help.

Contributions Name
💻 ⚠️ 🚇 📖 💬 👀 Dan Allan
💻 Vangelis Banos
💻 📖 Chaitanya Prakash Bapat
💻 ⚠️ 🚇 📖 💬 👀 Rob Brackett
💻 Stephen Buckley
💻 📖 📋 Ray Cha
💻 ⚠️ Janak Raj Chadha
💻 Autumn Coleman
💻 Luming Hao
🤔 Mike Hucka
💻 Stuart Lynn
💻 ⚠️ Julian Mclain
💻 Allan Pichardo
📖 📋 Matt Price
💻 Mike Rotondo
📖 Susan Tan
💻 ⚠️ Fotis Tsalampounis
📖 📋 Dawn Walker

(For a key to the contribution emoji or more info on this format, check out “All Contributors.”)

License & Copyright

Copyright (C) 2017-2021 Environmental Data and Governance Initiative (EDGI)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.0.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the LICENSE file for details.
