pilosa / getting-started

License: BSD-3-Clause
Code and Data for Getting Started documentation

Getting Started

This repository contains the dataset and sample code for the Getting Started section of Pilosa documentation.

The Dataset

The sample dataset contains stargazer and language data for GitHub projects returned by a search for the keyword "Go". To create other datasets, see the Generating the Dataset section below.

  • languages.txt: Language name to languageID mapping. The line number corresponds to the languageID.
  • language.csv: languageID, projectID
  • stargazer.csv: stargazerID, projectID, timestamp(starred)
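As an illustration of the layout described above, here is a minimal Python 3 sketch that parses the three files. The helper names (load_language_names, read_pairs, read_stars) and the sample rows are hypothetical and not part of this repository. Whether line numbers start at 0 or 1 is not stated here, so the starting ID is an explicit parameter, and the timestamp is kept as an opaque string because its format is not specified.

```python
import csv
import io


def load_language_names(lines, first_id):
    """Map languageID -> name; the ID is the line's position, counted from first_id."""
    return {first_id + i: line.strip() for i, line in enumerate(lines)}


def read_pairs(rows):
    """Yield (languageID, projectID) integer pairs from language.csv rows."""
    for row in csv.reader(rows):
        if row:
            yield int(row[0]), int(row[1])


def read_stars(rows):
    """Yield (stargazerID, projectID, timestamp) tuples from stargazer.csv rows.

    The timestamp is returned unparsed (assumption: its format is unknown).
    """
    for row in csv.reader(rows):
        if row:
            yield int(row[0]), int(row[1]), row[2].strip()


# Made-up sample rows for illustration only (not taken from the real dataset).
languages = io.StringIO("Go\nPython\nShell\n")
language_csv = io.StringIO("1,10\n2,10\n1,11\n")
stargazer_csv = io.StringIO("7,10,SOME-TIMESTAMP\n")

names = load_language_names(languages, first_id=1)  # first_id=1 is an assumption
pairs = list(read_pairs(language_csv))
stars = list(read_stars(stargazer_csv))
print(names[1], pairs[0], stars[0])
```

To use the sketch with the real files, pass open file objects (for example open("languages.txt")) instead of the in-memory samples.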

Usage

Docker

Run the Pilosa Docker image with Getting Started data using:

docker run -it --rm -p 10101:10101 pilosa/getting-started:latest
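Before moving on, it can help to confirm that the container is accepting connections on the published port. The following is a minimal Python sketch that only attempts a TCP connection to port 10101. It does not call any Pilosa API, and the helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 10101 is the port published by the docker run command above.
    print("Pilosa port reachable:", port_is_open("localhost", 10101))
```

A successful connection only shows that something is listening on the port; it does not verify that the Getting Started data has been loaded.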

Continue with Getting Started: Make Some Queries.

Without Docker

  1. Pilosa server should be running: Starting Pilosa
  2. The appropriate schema should be initialized: Create the Schema
  3. Finally, the data can be imported: Import Some Data

Continue with Getting Started: Make Some Queries.

Sample Projects

Generating the Dataset

Using a GitHub token is strongly recommended to avoid throttling. If you don't already have a token for the GitHub API, see Creating a personal access token for the command line.

A recent version of Python is required. We test the script with 2.7 and 3.5.

Set up the environment with the following steps:

  1. Create a virtual env:
    • Using Python 2.7: virtualenv getting-started
    • Using Python 3.5: python3 -m venv getting-started
  2. Activate the virtual env:
    • On Linux, macOS, and other UNIX systems: source getting-started/bin/activate
    • On Windows: getting-started\Scripts\activate
  3. Install requirements: pip install -r requirements.txt
  4. If you have a GitHub token, save it in a file named token in the root directory of the project.

To generate the CSV files:

The fetch.py script searches GitHub for a given keyword and creates the dataset described in The Dataset section.

Run the script: python fetch.py KEYWORD. KEYWORD is the term used to search repository names.
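For readers curious how a token and a keyword map onto the GitHub API, the sketch below builds (but does not send) a repository search request restricted to repository names. It is an illustration under stated assumptions, not the code in fetch.py: the helper names read_token and build_search_request are hypothetical, Python 3 is assumed, and the request uses GitHub's public search endpoint with the in:name qualifier.

```python
import os
import urllib.parse
import urllib.request


def read_token(path="token"):
    """Return the token stored in the file at path, or None if it is missing or empty."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip() or None


def build_search_request(keyword, token=None):
    """Build a GitHub repository-search request limited to repository names."""
    query = urllib.parse.urlencode({"q": keyword + " in:name"})
    request = urllib.request.Request(
        "https://api.github.com/search/repositories?" + query
    )
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        # Authenticated requests get a higher rate limit than anonymous ones.
        request.add_header("Authorization", "token " + token)
    return request


# Assumption: the token file sits in the current directory, as in step 4.
request = build_search_request("Go", read_token())
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the search. The sketch stops short of that so it can be run without network access.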

Creating the Docker Image

make docker VERSION=some-version