gousiosg / Github Mirror

Licence: bsd-2-clause
Scripts to mirror Github in a cloudy fashion

Programming Languages

ruby

Projects that are alternatives of or similar to Github Mirror

Github Wrapped
Take a look back at all the contributions you as an individual made to the open-source community
Stars: ✭ 304 (-36.4%)
Mutual labels:  github-api
Gitter
Gitter for GitHub - possibly the best-looking GitHub WeChat mini-program client currently available
Stars: ✭ 3,498 (+631.8%)
Mutual labels:  github-api
Gittrends
A iOS and Android app to monitor the views and clones of your GitHub repos
Stars: ✭ 388 (-18.83%)
Mutual labels:  github-api
Octoprofile
A nicer look at GitHub profiles built with Next.js and the GitHub API
Stars: ✭ 310 (-35.15%)
Mutual labels:  github-api
Leethub
Automatically sync your leetcode solutions to your github account - top 5 trending GitHub repository
Stars: ✭ 316 (-33.89%)
Mutual labels:  github-api
Github Stats
Better GitHub statistics images for your profile, no external server required
Stars: ✭ 338 (-29.29%)
Mutual labels:  github-api
Simonw
https://simonwillison.net/2020/Jul/10/self-updating-profile-readme/
Stars: ✭ 297 (-37.87%)
Mutual labels:  github-api
Automerge Action
GitHub action to automatically merge pull requests that are ready
Stars: ✭ 446 (-6.69%)
Mutual labels:  github-api
Octokit.rb
Ruby toolkit for the GitHub API
Stars: ✭ 3,522 (+636.82%)
Mutual labels:  github-api
Profile Summary For Github
Tool for visualizing GitHub profiles
Stars: ✭ 19,489 (+3977.2%)
Mutual labels:  github-api
Astronomer
A tool to detect illegitimate stars from bot accounts on GitHub projects
Stars: ✭ 323 (-32.43%)
Mutual labels:  github-api
Octokit.swift
A Swift API Client for GitHub and GitHub Enterprise
Stars: ✭ 325 (-32.01%)
Mutual labels:  github-api
Ok.sh
A Bourne shell GitHub API client library focused on interfacing with shell scripts
Stars: ✭ 365 (-23.64%)
Mutual labels:  github-api
Gitify
GitHub notifications on your menu bar. Available on macOS, Windows & Linux.
Stars: ✭ 3,543 (+641.21%)
Mutual labels:  github-api
Octokat.js
Github API Client using Promises or callbacks. Intended for the browser or NodeJS.
Stars: ✭ 401 (-16.11%)
Mutual labels:  github-api
Github Rs
Pure Rust bindings to the Github API
Stars: ✭ 298 (-37.66%)
Mutual labels:  github-api
This Repo Has 350 Stars
Yes, it's true 💕 This repository has 350 stars.
Stars: ✭ 350 (-26.78%)
Mutual labels:  github-api
Hub
A command-line tool that makes git easier to use with GitHub.
Stars: ✭ 21,420 (+4381.17%)
Mutual labels:  github-api
Pygithub
Typed interactions with the GitHub API v3
Stars: ✭ 4,825 (+909.41%)
Mutual labels:  github-api
Hukum
An NPM module that displays Github Action progress in the terminal and aims to improve your development experience by printing status in realtime.
Stars: ✭ 375 (-21.55%)
Mutual labels:  github-api

ghtorrent: Mirror and index data from the Github API

A library and a collection of scripts used to retrieve data from the Github API and extract metadata into an SQL database, in a modular and scalable manner. The scripts are distributed as a Gem (ghtorrent), but they can also be run by checking out this repository.

GHTorrent can be used for a variety of purposes, such as:

  • Mirror the Github API event stream and follow links from events to actual data to gradually build a Github index
  • Create a queryable metadata database for a specific repository
  • Construct a data source for extracting process analytics (see for example those) for one or more repositories

Components

GHTorrent's components (which can be used individually) are:

  • APIClient: Knows how to query the Github API (both single entities and paginated results) and respect the API rate limit. Can be configured to override the default local IP address on multihomed hosts.
  • Retriever: Knows how to retrieve specific Github entities (users, repositories, watchers) by name. Uses an optional persister to avoid retrieving data that have not changed.
  • Persister: A key/value store that keeps Github JSON replies and queries them on request. It can be backed by a real key/value store, which must support arbitrary queries on the stored JSON objects.
  • GHTorrent: Knows how to extract information from the data retrieved by the retriever in order to update an SQL database (see schema) with metadata.
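Two chores of an API client like the APIClient above are following paginated results and pausing when the rate limit is exhausted. The sketch below assumes only the standard GitHub REST API response headers (`Link`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`); the helper names `next_page_url` and `seconds_to_wait` and the `reserve` margin are hypothetical and are not GHTorrent's actual APIClient methods.

```ruby
# Hypothetical helper: extract the URL tagged rel="next" from a GitHub
# `Link` response header, or return nil when there are no more pages.
def next_page_url(link_header)
  return nil if link_header.nil? || link_header.empty?

  link_header.split(',').each do |part|
    url, rel = part.split(';').map(&:strip)
    return url[1..-2] if rel == 'rel="next"' # strip the surrounding < >
  end
  nil
end

# Hypothetical helper: seconds to sleep before the next request, based on
# X-RateLimit-Remaining and X-RateLimit-Reset (a Unix timestamp).
# `reserve` is an assumed safety margin of requests left unused.
def seconds_to_wait(headers, now: Time.now.to_i, reserve: 0)
  remaining = headers.fetch('X-RateLimit-Remaining').to_i
  reset_at  = headers.fetch('X-RateLimit-Reset').to_i
  return 0 if remaining > reserve

  [reset_at - now, 0].max # never return a negative delay
end
```

A client loop would keep requesting `next_page_url(...)` until it returns nil, sleeping `seconds_to_wait(...)` between calls.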

Component Configuration

The Persister and GHTorrent components have configurable back ends:

  • Persister: Either uses MongoDB > 3.0 (mongo driver) or no persister (noop driver)
  • GHTorrent: GHTorrent is tested mainly with MySQL and SQLite, but can theoretically be used with any SQL database compatible with Sequel. Your mileage may vary.

For distributed mirroring, you also need RabbitMQ >= 3.3.
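The difference between a real persister and the noop driver, and how a retriever uses a persister to skip repeated API calls, can be sketched with an in-memory store standing in for MongoDB. Everything here (`MemoryPersister`, `NoopPersister`, `retrieve_user`, the `fetch` block) is hypothetical; the sketch also ignores freshness checks, so a cached document is simply reused. The id returned by `store` plays the role of the `ext_ref_id` pointer that SQL rows keep to the raw data.

```ruby
# Hypothetical in-memory stand-in for the MongoDB-backed persister.
class MemoryPersister
  def initialize
    @docs = {}   # id => JSON-like Hash
    @next_id = 0
  end

  # Store a document; the returned id is what an SQL row could keep
  # in its ext_ref_id column.
  def store(doc)
    @next_id += 1
    @docs[@next_id] = doc
    @next_id
  end

  # Return every stored document matching all key/value pairs in `query`.
  def find(query)
    @docs.values.select { |d| query.all? { |k, v| d[k] == v } }
  end
end

# Hypothetical equivalent of the noop driver: stores and finds nothing.
class NoopPersister
  def store(_doc)
    nil
  end

  def find(_query)
    []
  end
end

# Look in the persister first; call the (hypothetical) fetch block,
# which would hit the Github API, only on a cache miss.
def retrieve_user(login, persister, &fetch)
  cached = persister.find('login' => login).first
  return cached if cached

  doc = fetch.call(login)
  persister.store(doc)
  doc
end
```

With `NoopPersister`, every call reaches the API; with a real store, repeated lookups are served locally.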

Installation

1. Install GHTorrent

GHTorrent is written in Ruby (tested with Ruby > 2.0). To install it as a Gem do:

sudo gem install ghtorrent

2. Install Your Preferred Database

Depending on which SQL database you want to use, install the appropriate dependency gem.

sudo gem install mysql2 # or sqlite3

Configuration

Copy config.yaml.tmpl to a file in your home directory.

All provided scripts accept the -c option, which takes the location of the configuration file as a parameter.
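A script honouring the -c option could parse it with Ruby's standard `OptionParser` and load the YAML file it points to. This is a sketch under assumptions: `load_config` and the fallback path `~/config.yaml` are hypothetical, and the actual keys expected in the file are whatever config.yaml.tmpl defines.

```ruby
require 'optparse'
require 'yaml'

# Parse -c FILE from argv (removing it) and return the loaded YAML as a Hash.
def load_config(argv)
  path = nil
  OptionParser.new do |opts|
    opts.on('-c FILE', 'Path to the configuration file') { |f| path = f }
  end.parse!(argv)

  path ||= File.join(Dir.home, 'config.yaml') # hypothetical default location
  YAML.safe_load(File.read(path))
end
```

`parse!` strips the recognised option from `argv`, so any remaining positional arguments are left for the script itself.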

You can find more information on how to set up a cluster of machines that retrieve data in parallel on the Wiki.

Using GHTorrent

To mirror the event stream and capture all data:

  • ght-mirror-events.rb periodically polls Github's event queue (https://api.github.com/events), stores all new events in the configured persister, and posts them to the github exchange in RabbitMQ.

  • ght-data_retrieval.rb creates queues that route posted events to processor functions. The functions use the appropriate Github API call to retrieve the linked contents, extract metadata (for database storage), and store the retrieved data in the appropriate collection in the persister, to avoid duplicate API calls. Data in the SQL database contain pointers (the ext_ref_id field) to the "raw" data in the persister.
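The two steps above (keep only unseen events, then route each one to a type-specific processor) can be sketched without a network or a broker. The `evt.<type>` routing-key format, the `PROCESSORS` table and the helper names are hypothetical illustrations; only the event fields (`id`, `type`, `actor.login`, `repo.name`) come from the GitHub events API.

```ruby
require 'set'

# Keep events whose "id" has not been seen yet; Set#add? returns nil for
# ids already present, which also drops duplicates within one batch.
def new_events(events, seen_ids)
  events.select { |e| seen_ids.add?(e['id']) }
end

# Hypothetical routing-key scheme: one key per event type, so a queue bound
# to "evt.PushEvent" would receive only push events.
def routing_key(event)
  "evt.#{event['type']}"
end

# Hypothetical per-type processors; each returns a description of the
# API retrieval it would trigger instead of performing it.
PROCESSORS = {
  'PushEvent'  => ->(e) { "fetch commits for #{e['repo']['name']}" },
  'WatchEvent' => ->(e) { "fetch watcher #{e['actor']['login']}" }
}.freeze

# Dispatch an event to its processor; unknown types are ignored (nil).
def dispatch(event)
  handler = PROCESSORS[event['type']]
  handler ? handler.call(event) : nil
end
```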

To retrieve data for a repository or user:

  • ght-retrieve-repo retrieves all data for a specific repository
  • ght-retrieve-user retrieves all data for a specific user
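To give a feel for what "all data" for a repository or user involves, the sketch below lists a few standard GitHub REST API endpoints such a retrieval would touch. The selection is illustrative only and is not necessarily the exact set that ght-retrieve-repo or ght-retrieve-user query; `API_BASE`, `repo_endpoints` and `user_endpoints` are hypothetical names.

```ruby
API_BASE = 'https://api.github.com'

# Illustrative subset of per-repository endpoints (each may be paginated).
def repo_endpoints(owner, repo)
  base = "#{API_BASE}/repos/#{owner}/#{repo}"
  [base] + %w[commits issues pulls forks stargazers].map { |r| "#{base}/#{r}" }
end

# Illustrative subset of per-user endpoints.
def user_endpoints(login)
  base = "#{API_BASE}/users/#{login}"
  [base] + %w[followers following repos].map { |r| "#{base}/#{r}" }
end
```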

To perform maintenance:

  • ght-load loads selected events from the persister into the queue so that the ght-data-retrieval script can reprocess them
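The selection step behind reloading events can be sketched as a filter over stored events by type and creation time, producing (routing key, JSON payload) pairs ready to be published again. The `events_to_reload` helper, its keyword arguments and the `evt.<type>` routing-key format are hypothetical; `created_at` and `type` are standard fields of GitHub event objects.

```ruby
require 'json'
require 'time'

# Pick stored events of the given types created in [from, to) and turn them
# into [routing_key, json_payload] pairs for republishing.
def events_to_reload(stored, types:, from:, to:)
  stored.select do |e|
    t = Time.parse(e['created_at'])
    types.include?(e['type']) && t >= from && t < to
  end.map { |e| ["evt.#{e['type']}", JSON.generate(e)] }
end
```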

Data

The code in this repository is used to power the data collection process of the GHTorrent.org project. You can find all data collected by the project on the Downloads page.

There are two sets of data:

  • Raw events: Github's event stream. These are the roots for mirroring operations. The ght-data-retrieval crawler starts from an event and goes deep into the rabbit hole.
  • SQL dumps + Linked data: Data dumps from the SQL database and the corresponding MongoDB entities.

Bugs & Feature Requests

Please tell us about features you'd like or bugs you've discovered on our Issue Tracker.

Patches, bug fixes, etc. are welcome. Please fork the repository and create a pull request when you are done fixing the bug or implementing the new feature.

Citing GHTorrent in your Research

If you find GHTorrent and the accompanying datasets useful in your research, please consider citing the following paper:

Georgios Gousios and Diomidis Spinellis, "GHTorrent: GitHub’s data from a firehose," in MSR '12: Proceedings of the 9th Working Conference on Mining Software Repositories, June 2–3, 2012. Zurich, Switzerland.

Authors

License

2-clause BSD
