
GoogleCloudPlatform / LabelCat

Licence: Apache-2.0 license
Uses Google Prediction API to label GitHub Issues as they are created.

LabelCat

Note: LabelCat is in development.

Disclaimer: This is not an official Google product.

Organizing the issues in your GitHub repositories can be a different kind of animal; that's why you need LabelCat.

Installation

Clone and setup

  1. Install Node.js >= 8.x
  2. git clone https://github.com/GoogleCloudPlatform/LabelCat
  3. cd LabelCat
  4. npm install
  5. npm link .
  6. cp defaultsettings.json settings.json (settings.json is where you customize the app)
  7. Modify settings.json as necessary.

Configure your project environment

  1. In the GCP Console, go to the Manage Resources page and select or create a new project:

    Go to the Manage Resources Page

  2. Update settings.json to include your GCP Project ID and Compute Region.

  3. Make sure that billing is enabled for your project:

    Learn How to Enable Billing

  4. Enable the AutoML Natural Language APIs.

    Enable the APIs

  5. Install the gcloud command-line tool.

  6. Follow the instructions to create a service account and download a key file.

    Create Account and Download Key File

  7. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path to the Service Account key file that you downloaded when you created the Service Account. For example:

    export GOOGLE_APPLICATION_CREDENTIALS=key-file
    
  8. Give your new Service Account the AutoML Editor IAM role with the following commands:

    gcloud auth login
    gcloud config set project YOUR_PROJECT_ID
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member=serviceAccount:SERVICE_ACCOUNT_NAME \
        --role='roles/automl.editor'
    

    replacing YOUR_PROJECT_ID with your GCP project ID and SERVICE_ACCOUNT_NAME with the name of your new Service Account, for example [email protected].

  9. Allow the AutoML Natural Language service accounts to access your Google Cloud project resources:

    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member="serviceAccount:[email protected]" \
        --role="roles/storage.admin"
    

    replacing YOUR_PROJECT_ID with your GCP project ID.

  10. Create a Google Cloud Storage bucket to store the documents that you will use to train your custom model. The bucket name must be in the format YOUR_PROJECT_ID-lcm. Run the following command to create a bucket in the us-central1 region:

    gsutil mb -p YOUR_PROJECT_ID -c regional -l us-central1 gs://YOUR_PROJECT_ID-lcm/
    

    replacing YOUR_PROJECT_ID with your GCP project ID.
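The setup steps above leave three things that every command depends on: a project ID, a region, and the GOOGLE_APPLICATION_CREDENTIALS variable. The sketch below checks them in one place and derives the bucket name in the required YOUR_PROJECT_ID-lcm format. It is a hypothetical helper, not part of LabelCat, and it assumes settings.json stores the values under the keys `projectId` and `computeRegion`; check defaultsettings.json for the real key names.

```javascript
'use strict';

// Hypothetical pre-flight check for the setup steps (not LabelCat's code).
// ASSUMPTION: settings.json uses the keys `projectId` and `computeRegion`.
function checkConfig(settings, env) {
  const problems = [];
  if (!settings.projectId) {
    problems.push('settings.json is missing a GCP project ID');
  }
  if (!settings.computeRegion) {
    problems.push('settings.json is missing a compute region');
  }
  if (!env.GOOGLE_APPLICATION_CREDENTIALS) {
    problems.push('GOOGLE_APPLICATION_CREDENTIALS is not set');
  }
  // The training bucket must be named YOUR_PROJECT_ID-lcm.
  const bucket = settings.projectId ? `${settings.projectId}-lcm` : null;
  return { ok: problems.length === 0, bucket, problems };
}

// Example usage against the real settings file and environment:
// const settings = require('./settings.json');
// console.log(checkConfig(settings, process.env));

const result = checkConfig(
  { projectId: 'my-project', computeRegion: 'us-central1' },
  { GOOGLE_APPLICATION_CREDENTIALS: '/path/to/key-file.json' }
);
console.log(result.bucket); // my-project-lcm
```

The check only confirms that the variable is set; it does not open the key file or call any Google API.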

Usage

Run labelcat --help for usage information.

    labelcat <command>

Commands:
  labelcat retrieveIssues <repoDataFilePath> <issuesDataFilePath> <label>
      Retrieves issues from a .txt file of gitHub repositories. Options: -a
  labelcat createDataset <datasetName>
      Create a new Google AutoML NL dataset with the specified name. Options: -m
  labelcat importData <issuesDataPath> <datasetId>
      Import the GitHub issues data from Google Cloud Storage bucket into the
      Google AutoML NL dataset by specifying the file's path in the bucket and
      the dataset ID.

Options:
  --version  Show version number                                                                               [boolean]
  --help     Show help                                                                                         [boolean]

Examples:
  labelcat retrieveIssues repoData.txt issuesData.csv 'type: bug' -a 'bug' -a 'bugger'
      Retrieves issues with matching labels from list of repos in repoData.txt
      and saves the resulting information to issuesData.csv.
  labelcat createDataset Data
      Creates a new multilabel dataset with the specified name.
  labelcat importData gs://myproject/mytraindata.csv 1248102981
      Imports the GitHub issues data into the dataset by specifying the file of
      issues data and the dataset ID.
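In the first example above, the positional label ('type: bug') and each -a value act as alternatives: an issue is retrieved if it carries any one of them. A minimal sketch of that matching rule follows. The function name `matchesAnyLabel` is hypothetical, matching here is exact and case-sensitive (an assumption), and the issue objects are shaped like GitHub REST API issue responses, where `labels` is an array of objects with a `name` field.

```javascript
'use strict';

// Hypothetical sketch: keep an issue if it has the primary label or any
// alternative label passed with -a. Comparison is exact (case-sensitive).
function matchesAnyLabel(issue, primaryLabel, alternatives) {
  const wanted = new Set([primaryLabel].concat(alternatives || []));
  return (issue.labels || []).some((label) => wanted.has(label.name));
}

// Sample issues shaped like GitHub REST API responses.
const issues = [
  { number: 1, title: 'Crash on start', labels: [{ name: 'type: bug' }] },
  { number: 2, title: 'Add docs', labels: [{ name: 'docs' }] },
  { number: 3, title: 'Wrong output', labels: [{ name: 'bugger' }] },
];

const kept = issues.filter((issue) =>
  matchesAnyLabel(issue, 'type: bug', ['bug', 'bugger'])
);
console.log(kept.map((issue) => issue.number)); // [ 1, 3 ]
```

Using a Set keeps each lookup constant-time no matter how many -a alternatives are given.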

Retrieve Issues

  1. Create a repos.txt file with a single-column list of GitHub repositories from which to collect issue data, one per line in the format owner/repository:

    Example:

    GoogleCloudPlatform/google-cloud-node
    GoogleCloudPlatform/google-cloud-java
    GoogleCloudPlatform/google-cloud-python
    
  2. From the project folder, run the retrieveIssues command with the path to the repository list file, the path where the resulting .csv file should be saved, the desired issue label, and any optional alternative labels (each passed with -a):

    Example:

    labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"
    
  3. Upload the resulting .csv file to your Google Cloud Storage Bucket:

    Example:

    gsutil cp issues.csv gs://YOUR_PROJECT_ID-lcm/
    

    replacing YOUR_PROJECT_ID with your GCP project ID.
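The two files in these steps can be sketched as follows: reading repos.txt into owner/repository pairs, and writing one quoted CSV row per issue. These helpers are hypothetical, not LabelCat's code. The row layout (issue text, then label) is an assumption modeled on AutoML Natural Language's text-then-label CSV import format; check a real issues.csv for the columns LabelCat actually writes. Collapsing newlines to spaces is also a simplification chosen for this sketch.

```javascript
'use strict';

// Parse repos.txt: one owner/repository per line; blank lines are skipped.
function parseRepoList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const match = /^([\w.-]+)\/([\w.-]+)$/.exec(line); // loose format check
      if (!match) {
        throw new Error(`Expected owner/repository, got "${line}"`);
      }
      return { owner: match[1], repo: match[2] };
    });
}

// Quote one CSV field (RFC 4180 style): wrap in quotes, double inner quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// ASSUMPTION: one row per issue, the issue text followed by its label.
// Runs of whitespace (including newlines) are collapsed to single spaces.
function issueToCsvRow(issue, label) {
  const text = `${issue.title} ${issue.body || ''}`.replace(/\s+/g, ' ').trim();
  return [csvField(text), csvField(label)].join(',');
}

const repos = parseRepoList(
  'GoogleCloudPlatform/google-cloud-node\n\nGoogleCloudPlatform/google-cloud-java\n'
);
console.log(repos.length); // 2
console.log(issueToCsvRow({ title: 'Crash', body: 'Says "boom"\non start' }, 'bug'));
// "Crash Says ""boom"" on start","bug"
```

Quoting every field keeps commas and quotes inside issue bodies from splitting a row into extra columns.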

Create Dataset

  1. From the project folder, run the createDataset command with the name of the dataset to create.

    Example:

    labelcat createDataset TestData
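Behind createDataset is a request to the AutoML API. The sketch below builds that request in the shape used by the AutoML v1 API for a text classification dataset. The helper name is hypothetical, and LabelCat itself may target a different API version. The examples describe a multilabel dataset, so the sketch takes a flag that chooses between MULTILABEL (several labels per issue) and MULTICLASS (one label per issue).

```javascript
'use strict';

// Hypothetical sketch (not LabelCat's code): request body for AutoML v1
// createDataset with text classification metadata.
function buildCreateDatasetRequest(projectId, location, displayName, multilabel) {
  return {
    parent: `projects/${projectId}/locations/${location}`,
    dataset: {
      displayName,
      textClassificationDatasetMetadata: {
        // MULTILABEL allows several labels per item; MULTICLASS allows one.
        classificationType: multilabel ? 'MULTILABEL' : 'MULTICLASS',
      },
    },
  };
}

const request = buildCreateDatasetRequest('my-project', 'us-central1', 'TestData', true);
console.log(JSON.stringify(request, null, 2));

// Sending it, assuming the v1 client of the @google-cloud/automl package:
// async function createDataset(req) {
//   const { AutoMlClient } = require('@google-cloud/automl').v1;
//   const client = new AutoMlClient();
//   const [operation] = await client.createDataset(req); // long-running op
//   const [dataset] = await operation.promise();
//   return dataset;
// }
```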
    

List Datasets

  1. Run listDatasets to list all AutoML NL datasets in your Google Cloud Platform project.

    Example:

    labelcat listDatasets
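AutoML identifies each dataset by a full resource name of the form projects/PROJECT/locations/LOCATION/datasets/DATASET_ID, while importData and createModel in this README take only the dataset ID. The hypothetical helper below extracts the ID from such a name; the sample entry stands in for what a list call might return and is not real output.

```javascript
'use strict';

// Hypothetical helper: return the DATASET_ID segment of an AutoML dataset
// resource name, or throw if the name has a different shape.
function datasetIdFromName(name) {
  const match = /^projects\/[^/]+\/locations\/[^/]+\/datasets\/([^/]+)$/.exec(name);
  if (!match) {
    throw new Error(`Not a dataset resource name: ${name}`);
  }
  return match[1];
}

// Illustrative entry with only the fields used here.
const datasets = [
  { name: 'projects/my-project/locations/us-central1/datasets/123ABCD456789', displayName: 'TestData' },
];

for (const dataset of datasets) {
  console.log(`${dataset.displayName}\t${datasetIdFromName(dataset.name)}`);
}

// Fetching the real list, assuming the v1 client of @google-cloud/automl:
// const [datasets] = await client.listDatasets({
//   parent: 'projects/YOUR_PROJECT_ID/locations/us-central1',
// });
```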
    

Import Data

  1. Run importData with the Cloud Storage URI of the issue data .csv file, followed by the Dataset ID returned by the createDataset command.

    Example:

    labelcat importData gs://YOUR_PROJECT_ID-lcm/issues.csv 123ABCD456789
    

    replacing YOUR_PROJECT_ID with your GCP project ID.
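The two importData arguments map onto an AutoML import request: the dataset ID becomes part of the dataset's resource name, and the gs:// URI becomes the Cloud Storage input. A sketch in the shape of the AutoML v1 API follows. The helper name is hypothetical, and the gs://BUCKET/PATH.csv check is this sketch's own guard against passing a local path, not a rule stated by LabelCat.

```javascript
'use strict';

// Hypothetical sketch: validate the CSV URI and build an AutoML v1
// importData request for an existing dataset.
function buildImportDataRequest(projectId, location, datasetId, csvUri) {
  if (!/^gs:\/\/[^/]+\/.+\.csv$/.test(csvUri)) {
    throw new Error(`Expected gs://BUCKET/PATH.csv, got ${csvUri}`);
  }
  return {
    name: `projects/${projectId}/locations/${location}/datasets/${datasetId}`,
    inputConfig: { gcsSource: { inputUris: [csvUri] } },
  };
}

const request = buildImportDataRequest(
  'my-project', 'us-central1', '123ABCD456789', 'gs://my-project-lcm/issues.csv'
);
console.log(JSON.stringify(request, null, 2));

// Sending it, assuming the v1 client of @google-cloud/automl:
// const [operation] = await client.importData(request); // long-running op
// await operation.promise();
```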

Create Model

  1. Run createModel using the Dataset ID and the name of the model to be created.

    Example:

    labelcat createModel 123ABCD456789 firstModel
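createModel pairs a dataset ID with a name for the new model. The sketch below shows the corresponding request in the shape of the AutoML v1 API for text classification; the helper name is hypothetical and the client call is commented out because it needs credentials and a trained-data dataset. Training runs as a long-running operation and can take a long time to finish.

```javascript
'use strict';

// Hypothetical sketch: AutoML v1 createModel request for a text
// classification model trained on an existing dataset.
function buildCreateModelRequest(projectId, location, datasetId, modelName) {
  return {
    parent: `projects/${projectId}/locations/${location}`,
    model: {
      displayName: modelName,
      datasetId,
      textClassificationModelMetadata: {}, // no extra options in this sketch
    },
  };
}

const request = buildCreateModelRequest('my-project', 'us-central1', '123ABCD456789', 'firstModel');
console.log(JSON.stringify(request, null, 2));

// Sending it, assuming the v1 client of @google-cloud/automl:
// const [operation] = await client.createModel(request);
// const [model] = await operation.promise(); // resolves when training ends
```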
    

Contributing

See CONTRIBUTING.

License

Copyright 2018, Google, Inc.

Licensed under the Apache License, Version 2.0

See LICENSE.
