tenex / Opensourcecontributors

Find all contributions for a user through the GitHub Archive

Programming Languages

python

Labels

open-source data

OpenSourceContributo.rs

Note about name change: This project was formerly known as githubcontributions.io. GitHub requested that the name of the project be changed in order to avoid confusion about who owns and maintains this project.

This is a utility to find a list of all contributions a user has made to any public repository on GitHub from 2011-01-01 through yesterday.

Data from 2015-01-01 to the present comes from GitHub Archive. Data from before that date uses a different schema and was obtained from Google's BigQuery (see below).

As of 2015-08-28, it tracks a total of

% cd /github-archive/processed
% gzip -l *.json.gz | awk 'END{print $2}' | numfmt --to=iec-i --suffix=B --format="%3f"
93GiB
% zcat *.json.gz | wc -l
253027947

events.

db.contributions.stats():

{
  "ns" : "contributions.contributions",
  "count" : 284048099,
  "size" : 113714359272,
  "avgObjSize" : 400,
  "storageSize" : 47820357632,
  "capped" : false,
  "nindexes" : 4,
  "totalIndexSize" : 8810385408,
  "indexSizes" : {
    "_id_" : 2804744192,
    "_user_lower_1" : 2275647488,
    "_event_id_1" : 1029251072,
    "created_at_1" : 2700742656
  },
  "ok" : 1
}

(WiredTiger stats omitted)

Processing data archives

Processing the data archives involves 3 steps:

1. Download the raw events files from GitHub Archive into the events directory
2. Transform the events files by filtering non-contribution events (e.g., starring a repository) and adding necessary indexable keys (e.g., lowercased username)
3. Load the transformed data into MongoDB

The archive-processor tool in the util directory handles all of this.

The transformed data from step 2 is compressed and saved just in case we need to re-load the entire database (these files are much smaller than the raw data).
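The transform in step 2 can be sketched roughly as follows. This is a minimal illustration, not the actual archive-processor code. It assumes the Events API JSON layout (one JSON object per line, with "type", "id", and an "actor" object containing "login"), borrows the list of contribution event types from the BigQuery filter later in this README, and guesses the added key names (_user_lower, _event_id) from the index names in the stats output above. The function names are hypothetical.

```python
import json

# Assumption: the same event types as the BigQuery filter in this README.
CONTRIBUTION_TYPES = {
    "GollumEvent", "IssuesEvent", "PushEvent", "CommitCommentEvent",
    "ReleaseEvent", "PublicEvent", "MemberEvent", "IssueCommentEvent",
}


def transform_event(event):
    """Return an indexable copy of a contribution event, or None to drop it."""
    if event.get("type") not in CONTRIBUTION_TYPES:
        return None  # e.g. WatchEvent (starring a repository)
    actor = event.get("actor")
    login = actor.get("login") if isinstance(actor, dict) else None
    if not login:
        return None  # nothing to index the event under
    out = dict(event)
    out["_user_lower"] = login.lower()  # assumed key name
    out["_event_id"] = event.get("id")  # assumed key name
    return out


def transform_lines(lines):
    """Yield transformed JSON lines from an iterable of raw JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        transformed = transform_event(json.loads(line))
        if transformed is not None:
            yield json.dumps(transformed, sort_keys=True)
```

Because transform_lines accepts any iterable of lines, it could be fed from a file opened with gzip.open(path, "rt") and its output written to another gzip file, which matches the compressed .json.gz output described above.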

All of this can be done automatically by setting the correct environment variables, then running archive-processor process, or it can be invoked differently to separate the steps or change the working directories. Run archive-processor --help for details.

Environment Variable	Meaning
GHC_EVENTS_PATH	Contains data from 2015-01-01 to present (.json.gz)
GHC_TIMELINE_PATH	Contains data before 2015-01-01 (.csv.gz)
GHC_TRANSFORMED_PATH	Contains output of "transform" operation (.json.gz)
GHC_LOADED_PATH	Links to files in GHC_TRANSFORMED_PATH when loaded to DB
GHC_LOG_PATH	Each invocation of `archive-processor` logs to here
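
As a rough illustration of how a script might pick up these settings, the sketch below reads the five variables into a dictionary of paths. The variable names come from the table above; the dictionary keys, the load_paths function, and the decision to fail when any variable is missing are assumptions made for this example, and the real tool may behave differently.

```python
import os
from pathlib import Path

# Variable names are taken from the table; the short keys are illustrative.
PATH_VARIABLES = {
    "events": "GHC_EVENTS_PATH",
    "timeline": "GHC_TIMELINE_PATH",
    "transformed": "GHC_TRANSFORMED_PATH",
    "loaded": "GHC_LOADED_PATH",
    "log": "GHC_LOG_PATH",
}


def load_paths(environ=None):
    """Return a dict of Path objects, raising KeyError if any variable is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in PATH_VARIABLES.values() if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(sorted(missing)))
    return {key: Path(environ[name]) for key, name in PATH_VARIABLES.items()}
```

Passing a plain dictionary instead of os.environ makes the function easy to exercise without touching the real environment.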

BigQuery Data Sets

For the data from 2011-2014 (actually, 2008-08-25 01:07:06 to 2014-12-31 23:59:59), the GitHub Archive project recorded data from the (now deprecated) Timeline API. This is in a different format and has many more quirks than the new GitHub Events API. To obtain this data, the following BigQuery query was used (which took only 47.5s to run):

SELECT
  -- common fields
  created_at, actor, repository_owner, repository_name, repository_organization, type, url,
  -- specific to type
  payload_page_html_url,     -- GollumEvent
  payload_page_summary,      -- GollumEvent
  payload_page_page_name,    -- GollumEvent
  payload_page_action,       -- GollumEvent
  payload_page_title,        -- GollumEvent
  payload_page_sha,          -- GollumEvent
  payload_number,            -- IssuesEvent
  payload_action,            -- MemberEvent, IssuesEvent, ReleaseEvent, IssueCommentEvent
  payload_member_login,      -- MemberEvent
  payload_commit_msg,        -- PushEvent
  payload_commit_email,      -- PushEvent
  payload_commit_id,         -- PushEvent
  payload_head,              -- PushEvent
  payload_ref,               -- PushEvent
  payload_comment_commit_id, -- CommitCommentEvent
  payload_comment_path,      -- CommitCommentEvent
  payload_comment_body,      -- CommitCommentEvent
  payload_issue_id,          -- IssueCommentEvent
  payload_comment_id         -- IssueCommentEvent
FROM (
  TABLE_QUERY(githubarchive:year,'true') -- All the years!
)
WHERE type IN (
  "GollumEvent",
  "IssuesEvent",
  "PushEvent",
  "CommitCommentEvent",
  "ReleaseEvent",
  "PublicEvent",
  "MemberEvent",
  "IssueCommentEvent"
)

If you actually want to use this data, there's no need to run that query; just ask me for the CSVs. When gzipped, they are about 19GB.

Erroneous data

There is a lot of data in the archives that just doesn't make sense. Where I can, I've worked around it, for example by parsing needed data out of the event's URL. Here are some issues:

BigQuery exports CSV nulls weird?

Example:

SELECT *
FROM [githubarchive:year.2014]
LIMIT 1000

In the results pane of Google's BigQuery page for this query, you will note that the string "null" appears where a real null value is meant. That string makes its way into the exported CSV. So you should export the table the real way, or you will have the string "null" for almost every value.
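
If a CSV has already been exported with those literal strings, one possible cleanup is to map them back to real nulls while reading. The sketch below is only an illustration: read_rows is a hypothetical helper, and it cannot tell a stand-in "null" apart from a field whose genuine value is the text "null" (a commit message, say), so it is a workaround rather than a substitute for a proper export.

```python
import csv
import io


def read_rows(csv_file, null_token="null"):
    """Yield each CSV row as a dict, replacing the literal null token with None."""
    for row in csv.DictReader(csv_file):
        yield {
            key: (None if value == null_token else value)
            for key, value in row.items()
        }


# Made-up sample row using column names from the query above.
sample = "actor,repository_name,payload_head\nhut8,null,abc123\n"
rows = list(read_rows(io.StringIO(sample)))
```

For the gzipped exports, the same function could be given a file opened with gzip.open(path, "rt", newline="").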

PushEvent with no repository name (Timeline API)

Example:

SELECT *
FROM [githubarchive:year.2014]
WHERE payload_head='8824ed4d86f587a2a556248d9abfac790a1cbd3f'
LIMIT 1

It seems like sometimes, the only way to get the real repository name (owner/project) is to parse it from the URL.
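
A minimal sketch of that URL parsing, assuming event URLs have the form https://github.com/owner/project/... (the exact URL shapes in the archive are not documented here, and repo_from_url is a hypothetical helper, not the project's own function):

```python
from urllib.parse import urlparse


def repo_from_url(url):
    """Return "owner/project" parsed from a github.com URL, or None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = parsed.path.split("/")
    # parts[0] is the empty string before the leading slash.
    if len(parts) < 3 or not parts[1] or not parts[2]:
        return None
    return parts[1] + "/" + parts[2]
```

A URL whose owner and project segments are both empty returns None, so rows like that can be flagged for manual inspection instead of being stored under a bogus repository name.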

PushEvent with no way of figuring out the repository (Timeline API)

Example:

SELECT *
FROM [githubarchive:year.2011]
WHERE payload_head='32b2177f05be005df3542c14d9a9985be2b553f7'
LIMIT 5

repository_url is https://github.com// and repository_name is / for each of these. They actually push to: https://github.com/Jiyambi/WoW-Pro-Guides but I only know that by reading the commit messages.

Credits

Created by @hut8 and maintained by Tenex Developers (@tenex).
