sw-yx / gh-action-data-scraping

License: MIT
This shows how to use GitHub Actions to do periodic data scraping.

gh-action-data-scraping

This repo shows how to use GitHub Actions to do automated data scraping, with storage in git itself! Free git storage and scheduled updates!

2021 Update

You can read more in the Blog Writeup.

As of May 2021, Flat Data scraping is officially supported by GitHub; check it out.

Basic Idea

The workflow file looks like:

# .github/workflows/daily.yml
on:
  schedule:
    - cron: '0 8 * * *' # every day at 08:00 UTC (scheduled workflows run on UTC time)
name: Pull Data and Build
permissions:
  contents: write # lets the publish step push when the default token is read-only
jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Build
      run: npm install
    - name: Scrape
      run: npm run action
      # env:
      #   WHATEVER_TOKEN: ${{ secrets.YOU_WANT }}
    - uses: mikeal/publish-to-github-action@master
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # GitHub sets this for you
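
The workflow only says `npm run action`; what that script does is up to you. Here is a minimal sketch of such a script, assuming Node 18+ (for the built-in `fetch`) and a `package.json` entry like `"action": "node scrape.js run"`. The file name `scrape.js`, the URL, and the `data/latest.json` output path are hypothetical placeholders, not necessarily what this repo uses:

```javascript
// scrape.js (hypothetical file name): fetch some JSON and save it into the repo,
// so the publish step that follows has a changed file to commit.
const fs = require("node:fs/promises");
const path = require("node:path");

// Both values are placeholders; point them at whatever you want to track.
const SOURCE_URL = "https://example.com/data.json";
const OUT_FILE = path.join("data", "latest.json");

// Pretty-print with a trailing newline so git diffs stay line-based and readable.
function serialize(payload) {
  return JSON.stringify(payload, null, 2) + "\n";
}

// fetchImpl is injectable so the function can be exercised without network access.
async function scrape({ url = SOURCE_URL, outFile = OUT_FILE, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }
  const payload = await res.json();
  await fs.mkdir(path.dirname(outFile), { recursive: true });
  await fs.writeFile(outFile, serialize(payload));
  return outFile;
}

// Only hit the network when invoked as `node scrape.js run`.
if (process.argv[2] === "run") {
  scrape()
    .then((file) => console.log(`Wrote ${file}`))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1; // a non-zero exit marks the workflow step as failed
    });
}
```

Because the file is written inside the checked-out repo, the publish step that runs afterwards should pick it up as a local change. Requiring the explicit `run` argument is just one way to keep the file safe to import or test without triggering a request.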

How it should look

For people new to GitHub Actions, this is how my Actions tab of this very repo looks, if you need a reference point:

[image: screenshot of this repo's Actions tab]

Limits

You can do whatever you like with this, including taking screenshots of sites!
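
For the screenshot case, a rough sketch using Puppeteer could look like the following. It assumes `puppeteer` has been added to the project's dependencies. The `screenshots/` folder, the file naming scheme, and the function names are hypothetical, and depending on the runner image Chrome may need extra launch flags that are not shown here:

```javascript
// screenshot.js (hypothetical): save a dated screenshot of a page into the repo.
const path = require("node:path");

// Build a file name like screenshots/example.com-2021-05-01.png (date is in UTC).
function screenshotPath(url, date = new Date()) {
  const host = new URL(url).hostname;
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return path.join("screenshots", `${host}-${day}.png`);
}

async function takeScreenshot(url) {
  // Required lazily so the helper above works even without puppeteer installed.
  const puppeteer = require("puppeteer");
  const fs = require("node:fs/promises");

  const file = screenshotPath(url);
  await fs.mkdir(path.dirname(file), { recursive: true });

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    await page.screenshot({ path: file, fullPage: true });
  } finally {
    // Always close the browser, even if navigation or the screenshot fails.
    await browser.close();
  }
  return file;
}

module.exports = { screenshotPath, takeScreenshot };
```

Putting the date in the file name means each run adds a new image rather than overwriting the previous one, so keep an eye on repository size if you run this often.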

The limits I can think of are the usage limits of GitHub and GitHub Actions.

In addition to those limits, GitHub says Actions should not be used for:

  • Content or activity that is illegal or otherwise prohibited by their Terms of Service or Community Guidelines.
  • Cryptomining
  • Serverless computing
  • Activity that compromises GitHub users or GitHub services.
  • Any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used. In other words, be cool, don’t use GitHub Actions in ways you know you shouldn’t.

Be a good citizen, don't abuse it and F this up for the rest of us!

This is heavily based on
