italia / developers-italia-backend

License: AGPL-3.0
publiccode.yml crawler for the Open Source software catalog of Developers Italia

Description

Developers Italia provides a catalog of Free and Open Source software aimed at Public Administrations.

This crawler retrieves the publiccode.yml files from the repositories of organizations that publish software and have registered through the onboarding procedure.

The generated YAML files are then used by the developers.italia.it build to generate its static pages.

Setup and deployment processes

The crawler can either be run manually on the target machine or be deployed as a Docker container with its Helm chart in Kubernetes.

Elasticsearch is used to store the data and must be ready to accept connections before the crawler is started.

Manually configure and build the crawler

  1. Save the auth tokens to domains.yml.

  2. Rename config.toml.example to config.toml and set the variables.

    NOTE: The application also accepts environment variables in place of the config.toml file. When a setting is defined in both, the environment variable takes priority over the value in the configuration file.

  3. Build the crawler binary with make.

Docker

The repository has a Dockerfile, used to build the production image, and a docker-compose.yml file to set up the development environment.

  1. Copy the .env.example file to .env and edit the environment variables to suit your setup. .env.example has detailed descriptions for each variable.

    cp .env.example .env

  2. Save your auth tokens to domains.yml:

    cp crawler/domains.yml.example crawler/domains.yml
    editor crawler/domains.yml

  3. Start the environment:

    docker-compose up
    

Run the crawler

Crawl mode: bin/crawler crawl publishers.*.yml

Reads the list of publishers from publishers.*.yml and starts crawling their repositories.

If it finds a blacklisted repository, it removes it from Elasticsearch if it is present there.

It also generates:

One mode (single repository URL): bin/crawler one [repo url] publishers.*.yml

In this mode, a single repository is evaluated at a time. If the organization is present, its iPA code is matched against the ones in the publishers' file; otherwise it is set to null and the slug ends with a random code instead of the iPA code.

Furthermore, the iPA code validation, which is a simple check against the publishers' file to ensure that the code belongs to the selected publisher, is skipped.
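
The slug fallback described above can be sketched as follows. The real slug format is not documented here, so everything in this example is an assumption: `slugFor` is a hypothetical function, the `name-suffix` layout is invented for illustration, and an empty string stands in for a null iPA code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// slugFor (hypothetical) appends the iPA code to the repository name, or a
// random 8-character hex code when no iPA code is available.
func slugFor(repoName, ipaCode string) (string, error) {
	if ipaCode != "" {
		return repoName + "-" + ipaCode, nil
	}
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return repoName + "-" + hex.EncodeToString(buf), nil
}

func main() {
	// "example_code" is a placeholder, not a real iPA code.
	withCode, _ := slugFor("my-repo", "example_code")
	withoutCode, _ := slugFor("my-repo", "")
	fmt.Println(withCode)
	fmt.Println(withoutCode) // suffix differs on every run
}
```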

If it finds a blacklisted repository, it will exit immediately.

Other commands

  • bin/crawler updateipa downloads iPA data and writes it into Elasticsearch

  • bin/crawler delete [URL] deletes software from Elasticsearch using its code hosting URL specified in publiccode.url

  • bin/crawler download-publishers downloads organizations and repositories from the onboarding portal repository and saves them to a publishers YAML file.

Crawler blacklists

Blacklists are needed to exclude individual repositories that are not in line with our guidelines.

You can set BLACKLIST_FOLDER in config.toml to point to a directory where blacklist files are located. Blacklisting is currently supported by the one and crawl commands.

See also

Authors

Developers Italia is a project by AgID and the Italian Digital Team, which developed the crawler and maintains this repository.
