ruzickap / action-my-broken-link-checker

Licence: Apache-2.0 license
A GitHub Action for checking broken links

Programming languages: Shell, Dockerfile, HTML

GitHub Actions: My Broken Link Checker

This is a GitHub Action that checks for broken links in your static files or web pages. It uses muffet for the URL checking.

See the basic GitHub Action example to run periodic checks (weekly) against google.com:

on:
  schedule:
    - cron: '0 0 * * 0'

name: Check markdown links
jobs:
  my-broken-link-checker:
    name: Check broken links
    runs-on: ubuntu-latest
    steps:
      - name: Check
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://www.google.com
          cmd_params: "--one-page-only --max-connections=3 --color=always"  # Check just one page

This action can be combined with static site generators (Hugo, MkDocs, Gatsby, GitBook, mdBook, etc.). The following examples expect the web pages to be stored in the ./build directory. During the tests a caddy web server is started; it uses the hostname from the URL parameter and serves the local web pages (see entrypoint.sh for details).

- name: Check
  uses: ruzickap/action-my-broken-link-checker@v2
  with:
    url: https://www.example.com/test123
    pages_path: ./build/
    cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --timeout=20'  # muffet parameters
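
Because the local web server is said to take its hostname from the URL parameter, it may help to see how the host part of a URL can be isolated. The following is a minimal sketch using plain shell parameter expansion; the function name url_to_host is hypothetical and the code is not taken from entrypoint.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not copied from entrypoint.sh:
# print the host part of a URL using parameter expansion only.
url_to_host() {
  local rest="${1#*://}"        # strip the scheme, e.g. "https://"
  rest="${rest%%/*}"            # strip the path, e.g. "/test123"
  printf '%s\n' "${rest%%:*}"   # strip an optional ":port"
}

url_to_host "https://www.example.com/test123"
url_to_host "https://my-testing-domain.com:443"
```

With the URL from the example above, the sketch prints www.example.com, which is the hostname the local pages would be served under.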

Do you want to skip the docker build step? OK, the script mode is also available:

- name: Check
  env:
    INPUT_URL: https://www.example.com/test123
    INPUT_PAGES_PATH: ./build/
    INPUT_CMD_PARAMS: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --skip-tls-verification'  # --skip-tls-verification is a mandatory parameter when using https together with INPUT_PAGES_PATH
  run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

Parameters

Environment variables used by the ./entrypoint.sh script:

| Variable | Default | Description |
| --- | --- | --- |
| INPUT_CMD_PARAMS | --buffer-size=8192 --max-connections=10 --color=always --verbose | Command-line parameters for the URL checker muffet |
| INPUT_DEBUG | false | Enable debug mode for the ./entrypoint.sh script (set -x) |
| INPUT_PAGES_PATH | | Relative path to the directory with local web pages |
| INPUT_URL | (Mandatory / Required) | URL which will be checked |
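
To make the defaults above concrete, here is a small sketch of how a script could apply them. The function name resolve_inputs and the error message are hypothetical; the real entrypoint.sh may handle its inputs differently:

```shell
#!/usr/bin/env bash
# Sketch only: applies the documented defaults.
# resolve_inputs is a hypothetical name; entrypoint.sh may differ.
DEFAULT_CMD_PARAMS="--buffer-size=8192 --max-connections=10 --color=always --verbose"

resolve_inputs() {
  # INPUT_URL is the only mandatory variable.
  if [ -z "${INPUT_URL:-}" ]; then
    echo "INPUT_URL is mandatory" >&2
    return 1
  fi
  echo "url=${INPUT_URL}"
  echo "cmd_params=${INPUT_CMD_PARAMS:-$DEFAULT_CMD_PARAMS}"
  echo "debug=${INPUT_DEBUG:-false}"
  echo "pages_path=${INPUT_PAGES_PATH:-}"
}

# Example: only the mandatory variable is set, so the defaults apply.
(
  export INPUT_URL="https://www.example.com"
  unset INPUT_CMD_PARAMS INPUT_DEBUG INPUT_PAGES_PATH
  resolve_inputs
)
```

Setting INPUT_CMD_PARAMS replaces the whole default string, so any default flag you still want has to be repeated in your own value.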

Example of Periodic checks

Pipeline for periodic link checks:

name: periodic-broken-link-checks

on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/periodic-broken-link-checks.yml
  schedule:
    - cron: '3 3 * * 3'

jobs:
  broken-link-checker:
    runs-on: ubuntu-latest
    steps:

      - name: Get GH Pages URL
        id: gh_pages_url
        uses: actions/github-script@v5
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            let result = await github.request('GET /repos/:owner/:repo/pages', {
              owner: context.repo.owner,
              repo: context.repo.repo
            });
            console.log(result.data.html_url);
            return result.data.html_url
          result-encoding: string

      - name: Check broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: ${{ steps.gh_pages_url.outputs.result }}
          cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --timeout=20'

Full example

GitHub Action example:

name: Checks

on:
  push:
    branches:
      - main

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Create web page
        run: |
          mkdir -v public
          cat > public/index.html << EOF
          <!DOCTYPE html>
          <html>
            <head>
              My page which will be stored on my-testing-domain.com domain
            </head>
            <body>
              Links:
              <ul>
                <li><a href="https://my-testing-domain.com">https://my-testing-domain.com</a></li>
                <li><a href="https://my-testing-domain.com:443">https://my-testing-domain.com:443</a></li>
              </ul>
            </body>
          </html>
          EOF

      - name: Check links using script
        env:
          INPUT_URL: https://my-testing-domain.com
          INPUT_PAGES_PATH: ./public/
          INPUT_CMD_PARAMS: '--skip-tls-verification --verbose --color=always'
          INPUT_DEBUG: true
        run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--skip-tls-verification --verbose --color=always'
          debug: true

Best practices

Let's try to automate the creation of the web pages as much as possible.

Ideally, follow a repository naming convention in which the name of the GitHub repository matches the URL where the site will be hosted.

GitHub Pages with custom domain

The mandatory part is the repository name awsug.cz, which is the same as the domain.

The web pages will be stored as GitHub Pages on its own domain.

The GitHub Actions workflow file may look like this:

name: hugo-build

on:
  pull_request:
    types: [opened, synchronize]
  push:

jobs:
  hugo-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Checkout submodules
        shell: bash
        run: |
          auth_header="$(git config --local --get http.https://github.com/.extraheader)"
          git submodule sync --recursive
          git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --force --recursive --depth=1

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: '0.62.0'

      - name: Build
        run: |
          hugo --gc
          cp LICENSE README.md public/
          echo "${{ github.event.repository.name }}" > public/CNAME

      - name: Check broken links
        env:
          INPUT_URL: https://${{ github.event.repository.name }}
          INPUT_PAGES_PATH: public
          INPUT_CMD_PARAMS: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --exclude="(mylabs.dev|linkedin.com)"'
        run: |
          wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --exclude="(mylabs.dev|linkedin.com)"'
          debug: true

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: public
          force_orphan: true

This example uses Hugo.

GitHub Pages with github.io domain

The mandatory part is the repository name k8s-harbor, which is the directory part at the end of the ruzickap.github.io URL.

In this example the web pages use GitHub's github.io domain:

name: vuepress-build-check-deploy

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json
  push:
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json

jobs:
  vuepress-build-check-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install Node.js 12
        uses: actions/setup-node@v1
        with:
          node-version: 12.x

      - name: Install VuePress and build the document
        run: |
          npm install
          npm run build
          cp LICENSE docs/.vuepress/dist
          sed -e "s@(part-@(https://github.com/${GITHUB_REPOSITORY}/tree/main/docs/part-@" -e 's@.\/.vuepress\/public\/@./@' docs/README.md > docs/.vuepress/dist/README.md
          ln -s docs/.vuepress/dist ${{ github.event.repository.name }}

      - name: Check broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}
          pages_path: .
          cmd_params: '--exclude=mylabs.dev --max-connections-per-host=5 --rate-limit=5 --timeout=20 --header="User-Agent:curl/7.54.0" --skip-tls-verification'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: ./docs/.vuepress/dist
          force_orphan: true

In this case I'm using VuePress to create my page.
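
The build step in the workflow above links docs/.vuepress/dist to a directory named after the repository, and the check step passes pages_path: ., apparently so that the path component of the github.io URL resolves to the built site. The sketch below reproduces that layout in a throwaway directory. The repository name k8s-harbor comes from the text; the temporary directory and the placeholder index.html are illustrative assumptions:

```shell
#!/usr/bin/env bash
set -eu
# Illustration only: reproduce the symlink layout in a throwaway directory.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the VuePress build output.
mkdir -p docs/.vuepress/dist
echo '<html><body>placeholder</body></html>' > docs/.vuepress/dist/index.html

# Same idea as "ln -s docs/.vuepress/dist <repository name>" in the workflow.
repo_name="k8s-harbor"
ln -s docs/.vuepress/dist "$repo_name"

# Served from ".", the path /k8s-harbor/index.html now maps to the build output.
ls -l "$repo_name/index.html"
```

A web server rooted at the working directory would then answer requests for /k8s-harbor/ with the generated pages, matching the public URL layout.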

Both examples can be used as a generic template, and you do not need to change them for your projects.
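
The templates stay generic because the checked URL is derived from repository metadata instead of being hard-coded. As a rough sketch of the two naming conventions, the functions below build both URL styles from an "owner/repo" string, the format GitHub Actions provides in GITHUB_REPOSITORY. The function names and the owner "example-owner" are placeholders, not part of the action:

```shell
#!/usr/bin/env bash
# Sketch only: build both URL styles from an "owner/repo" string,
# the format GitHub Actions uses for GITHUB_REPOSITORY.

# Custom domain: the repository name is the domain itself.
custom_domain_url() {
  printf 'https://%s\n' "${1#*/}"
}

# github.io: the repository name is the path under <owner>.github.io.
github_io_url() {
  printf 'https://%s.github.io/%s\n' "${1%%/*}" "${1#*/}"
}

custom_domain_url "example-owner/awsug.cz"   # "example-owner" is a placeholder
github_io_url "ruzickap/k8s-harbor"
```

Inside a workflow the same functions could be called with "$GITHUB_REPOSITORY" as the argument.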

Running locally

It's possible to use the checking script locally. It will install caddy and muffet binaries if they are not already installed on your system.

export INPUT_URL="https://google.com"
export INPUT_CMD_PARAMS="--ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
./entrypoint.sh

Output:

*** INFO: [2019-12-30 14:53:54] Start checking: "https://google.com"
https://www.google.com/
        200     http://www.google.cz/history/optout?hl=cs
        200     http://www.google.cz/intl/cs/services/
        200     https://accounts.google.com/ServiceLogin?hl=cs&passive=true&continue=https://www.google.com/
        200     https://drive.google.com/?tab=wo
        200     https://mail.google.com/mail/?tab=wm
        200     https://maps.google.cz/maps?hl=cs&tab=wl
        200     https://news.google.cz/nwshp?hl=cs&tab=wn
        200     https://play.google.com/?hl=cs&tab=w8
        200     https://www.google.com/advanced_search?hl=cs&authuser=0
        200     https://www.google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png
        200     https://www.google.com/intl/cs/about.html
        200     https://www.google.com/intl/cs/ads/
        200     https://www.google.com/intl/cs/policies/privacy/
        200     https://www.google.com/intl/cs/policies/terms/
        200     https://www.google.com/language_tools?hl=cs&authuser=0
        200     https://www.google.com/preferences?hl=cs
        200     https://www.google.com/setprefdomain?prefdom=CZ&prev=https://www.google.cz/&sig=K_WmKyDZc24PJiXFyTjsUeLLrG-P4%3D
        200     https://www.google.com/textinputassistant/tia.png
        200     https://www.google.cz/imghp?hl=cs&tab=wi
        200     https://www.google.cz/intl/cs/about/products?tab=wh
        200     https://www.youtube.com/?gl=CZ&tab=w1
*** INFO: [2019-12-30 14:53:55] Checks completed...

You can also take advantage of the container to run the checks locally without touching your system:

export INPUT_URL="https://google.com"
export INPUT_CMD_PARAMS="--ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
docker run --rm -t -e INPUT_URL -e INPUT_CMD_PARAMS peru/my-broken-link-checker

Another example checks a web page stored locally on your disk. In this case I'm using the web page created in the ./tests/ directory of this git repository:

export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose --color=always"
./entrypoint.sh

Output:

*** INFO: Using path "/home/pruzicka/git/action-my-broken-link-checker/tests/" as domain "my-testing-domain.com" with URI "https://my-testing-domain.com"
*** INFO: [2019-12-30 14:54:22] Start checking: "https://my-testing-domain.com"
https://my-testing-domain.com/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
https://my-testing-domain.com:443/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
*** INFO: [2019-12-30 14:54:22] Checks completed...

The same example as above, but in this case I'm using the container:

export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose"
docker run --rm -t -e INPUT_URL -e INPUT_CMD_PARAMS -e INPUT_PAGES_PATH -v "$INPUT_PAGES_PATH:$INPUT_PAGES_PATH" peru/my-broken-link-checker
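
The two container invocations differ only in the extra variable and the volume mount needed for local pages. A hypothetical wrapper, shown here as a sketch that prints the command rather than running it, makes that difference explicit. The name checker_cmd is an assumption, and INPUT_CMD_PARAMS is left out for brevity:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print (not run) the docker command for a check.
# Usage: checker_cmd URL [PAGES_PATH]
checker_cmd() {
  local url="$1"
  local pages_path="${2:-}"
  set -- docker run --rm -t -e "INPUT_URL=$url"
  if [ -n "$pages_path" ]; then
    # Local pages must be mounted into the container at the same path.
    set -- "$@" -e "INPUT_PAGES_PATH=$pages_path" -v "$pages_path:$pages_path"
  fi
  echo "$@" peru/my-broken-link-checker
}

checker_cmd "https://google.com"
checker_cmd "https://my-testing-domain.com" "/tmp/site"
```

Mounting the directory at the same path inside the container means the path passed in INPUT_PAGES_PATH is valid on both sides.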

Examples

Some other examples of building and checking web pages using Static Site Generators and GitHub Actions can be found here: https://github.com/peaceiris/actions-gh-pages/

The following links contain real examples of My Broken Link Checker:
