Getting Started with SETI@IBMCloud Data Analysis

This document serves as an introductory guide to accessing the SETI data available from SETI@IBMCloud.

Brief Introduction to SETI@IBMCloud Data

The SETI Institute uses the Allen Telescope Array (ATA) to search for radio signals from intelligent life beyond our Solar System. Nearly every night, the ATA observes radio frequencies in the ~1-10 GHz range from multiple locations in the sky.

Observation of a potential signal results in the following data:

two raw data files, either two compamp or two archive-compamp files, depending on signal classification
real-time analysis of the signal, stored as a row in the SignalDB table

On each ATA telescope, the horizontal and vertical polarization components of the radio signal are measured separately. For each polarization, the raw, time-series signals from the entire ATA array are digitized and combined into a single data file. Additionally, the time-series data have been bandpass filtered, meaning the frequencies observed in the data only cover a small range, called the bandwidth. The exact frequency range may be recovered from information found in the header of the raw data file. (The ibmseti python package will calculate this for you, along with reading the data file and providing the necessary signal processing to get you started.)
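As a rough illustration of how a center frequency and a bandwidth together define the observed frequency range, the sketch below computes the center of each frequency bin. The function name, its parameters, and the numeric values are assumptions made up for this example; they are not fields read from a real file header, and the ibmseti package performs the actual calculation.

```python
import numpy as np

def frequency_axis(center_mhz, bandwidth_mhz, n_bins):
    """Return the center frequency (MHz) of each of n_bins equal-width bins."""
    low_edge = center_mhz - bandwidth_mhz / 2.0
    bin_width = bandwidth_mhz / n_bins
    # Bin centers sit half a bin above each bin's lower edge.
    return low_edge + bin_width * (np.arange(n_bins) + 0.5)

# Hypothetical values, chosen only for illustration.
freqs = frequency_axis(center_mhz=1420.0, bandwidth_mhz=1.0, n_bins=4)
print(freqs)
```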

A SignalDB row contains the conditions and characteristics of the observation, such as the Right Ascension (RA) and Declination (DEC) celestial coordinates of the signal, an estimate of the power of the signal, primary carrier frequency, carrier frequency drift, signal classification, etc. The RA/DEC are the coordinates in the sky of the target being observed (referenced from the J2000 equinox).

The raw data for a signal that is categorized as a Candidate is stored as an archive-compamp file, while all other non-candidate signals are stored as compamp files.

Right now, we are only making the Candidate/archive-compamp files and their associated rows in the SignalDB available. There are 456717 archive-compamp files currently available, obtained during observations from 2013 to 2015.

Further updates will provide access to the non-candidate compamp files and other ways of retrieving the data.

Basic Intro to Data Access

SignalDB

The new endpoint /v1/aca/meta/all will return the entire SignalDB table in a single CSV file. You should use this file to select your data of interest. Loading this data into a dataframe will make data selection significantly easier and more flexible. Once you've found the subset of data that you find interesting, you can then make queries to find the associated raw data files.
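A minimal sketch of that workflow with pandas might look like the following. The `container` and `objectname` columns are the ones used later to locate raw files; the `SNR` and `SigClass` column names, the sample rows, and the helper function names are assumptions for illustration, so check the header of the downloaded CSV for the real column names.

```python
import io
import pandas as pd

# Endpoint named in this guide; pass it to load_signaldb to fetch the full table.
SIGNALDB_URL = "https://setigopublic.mybluemix.net/v1/aca/meta/all"

def load_signaldb(source):
    """Read SignalDB CSV data from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)

def select_rows(df, min_snr, sig_class):
    """Keep rows at or above min_snr with the given classification.

    'SNR' and 'SigClass' are assumed column names.
    """
    mask = (df["SNR"] >= min_snr) & (df["SigClass"] == sig_class)
    return df.loc[mask, ["container", "objectname"]]

# Tiny made-up sample so the example runs without network access.
sample_csv = io.StringIO(
    "container,objectname,SNR,SigClass\n"
    "c1,a.archive-compamp,12.5,Cand\n"
    "c1,b.archive-compamp,3.0,Cand\n"
    "c2,c.archive-compamp,20.0,RFI\n"
)
signals = load_signaldb(sample_csv)
subset = select_rows(signals, min_snr=10.0, sig_class="Cand")
print(subset)
```

To work with the real table, `load_signaldb(SIGNALDB_URL)` would replace the in-memory sample.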

Please see this Notebook for instructions on how to programmatically download, read and query SignalDB within Spark.

The SignalDB data includes signal classifications, metrics such as central frequency, central frequency drift rate, power, signal/noise, etc., with which you can already do some interesting visualization and analysis.

Raw Data Introduction

Before getting into the details of the raw signal data analysis, you can get a peek using just your browser. This is purely for demonstration, however. The HTTP API is designed to be consumed programmatically.

To get at the raw data, we ask that you first obtain an access token. We do this to track usage of the project. This will, for the moment, require you to create a 30-day free IBM Data Science Experience account. (A Bluemix account will also be created for you automatically during this process.) Even after your free trial period ends, you can still access the SETI data with your access token.

Get Access Token
- https://setigopublic.mybluemix.net/token

Get Raw Data temporary URL
- using a container and objectname found in the SignalDB
- and using your access_token
- Build URL: https://setigopublic.mybluemix.net/v1/data/url/{container}/{objectname}?access_token={access_token}
- Example:
  - container: setiCompAmp
  - objectname: 2013-03-14/act10/2013-03-14_20-37-32_UTC.act10.dx1000.id-0.R.archive-compamp
  - access_token: abcdefg1234567890
  - URL: https://setigopublic.mybluemix.net/v1/data/url/setiCompAmp/2013-03-14/act10/2013-03-14_20-37-32_UTC.act10.dx1000.id-0.R.archive-compamp?access_token=abcdefg1234567890

Get Raw Data for Candidate E.T. signal
- Use the URL returned in the previous step to get the raw archive-compamp file.
- Example:
  - https://dal.objectstorage.open.softlayer.com/v1/AUTH_cdbef52bdf7a449c96936e1071f0a46b/setiCompAmp/2013-03-14/act10/2013-03-14_20-37-32_UTC.act10.dx1000.id-0.R.archive-compamp?temp_url_sig=2e4e981c7a14b899394e4bde6a9d6d53e238f56b&temp_url_expires=1475256287
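The URL-building step above can be sketched in Python using only the standard library. The function name `build_data_url` is a hypothetical helper for this example, and the token is the placeholder from the example above, not a working credential. Requesting the resulting URL and following the temporary URL it returns are left out, since the response format is not described here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://setigopublic.mybluemix.net"

def build_data_url(container, objectname, access_token):
    """Build the request URL for a raw data file's temporary URL."""
    # Keep '/' in objectname, since it is part of the object's path.
    path = "/v1/data/url/{}/{}".format(
        quote(container, safe=""), quote(objectname, safe="/")
    )
    return BASE_URL + path + "?" + urlencode({"access_token": access_token})

url = build_data_url(
    container="setiCompAmp",
    objectname="2013-03-14/act10/2013-03-14_20-37-32_UTC.act10.dx1000.id-0.R.archive-compamp",
    access_token="abcdefg1234567890",  # placeholder token from the example
)
print(url)
```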

Raw Data Analysis

The following set of instructions and notebooks will introduce you to doing analysis with SETI@IBMCloud. We will use

the HTTP API at https://setigopublic.mybluemix.net
IBM Spark and Object Store services
the ibmseti python package

As previously noted, you must create an IBM Data Science Experience account in order to get an access_token to the data. The trial period lets you provision most services completely free, such as the Spark service and Object Store.

Data Science Experience Setup

When you signed up for a DSX account, the onboarding process of DSX should have automatically instantiated a Spark service, an Object Storage service and a new sample project space. This is all you need to get started.

Select Projects from the links at the top of the DSX landing page
If a Default Project does not exist for you:

Create a new project by clicking the button (or click here)
Provide a project name and select the Spark and Object Storage instances available to you.

Within your project, select "add notebook".
To directly import one of the tutorial notebooks, select From URL

Use the URL to a notebook in the GH repo, such as https://github.com/ibm-cds-labs/seti_at_ibm/blob/master/notebooks/ibmseti_get_data_tutorial.ipynb

Optionally, create a new notebook from scratch and copy/paste the python code from the tutorials.

Object Storage Credentials

To use the Object Storage instance that is automatically generated in your DSX account from your own code, you will need to obtain its credentials. For example, the second tutorial in this project steps you through the process of saving data to your Object Storage. There are two ways to obtain the credentials.

Feed Object Storage some data

From within any notebook, select the icon near the top right.
If you already have data in your Object Storage, you should see files listed in the side panel.

If there are no data, click the browse button to upload a file.
Choose a small text file for quick upload.
Click Apply

Select an open cell in your notebook.
Under one of your data files, select the Insert to code button and choose Insert Credentials.

Copy from Bluemix

Alternatively, the credentials are found in your Bluemix account, which was created for you automatically when you signed up for DSX.

Log in to https://bluemix.net
Scroll down and click the Object Storage instance you are using in DSX
Select the Service Credentials tab and View Credentials
Copy these into your notebooks where appropriate.

Introduction Notebooks

After you work through these notebooks, you will have used the HTTP API to access the SETI data, saved that data to your IBM Object Storage service, produced a spectrogram from the raw SETI data, and extracted some features, which may be used in a machine-learning analysis.
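To give a sense of what producing a spectrogram and extracting features involves, here is a minimal sketch using only NumPy on a synthetic signal. It does not read a real archive-compamp file and does not use the ibmseti package. The segment sizes, the synthetic tone, and the chosen features are all assumptions for illustration, and the notebooks cover the actual procedure.

```python
import numpy as np

def spectrogram(complex_samples, n_rows):
    """Split a 1-D complex time series into n_rows segments and
    return the power in each frequency bin of each segment."""
    n_cols = len(complex_samples) // n_rows
    frames = np.reshape(complex_samples[: n_rows * n_cols], (n_rows, n_cols))
    # FFT each time segment; fftshift puts zero frequency in the center column.
    spectra = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spectra) ** 2

# Synthetic stand-in for raw data: one steady tone plus weak noise.
rng = np.random.default_rng(0)
n_rows, n_cols = 32, 64
t = np.arange(n_rows * n_cols)
tone_bin = 10  # cycles per segment (hypothetical)
tone = np.exp(2j * np.pi * tone_bin * t / n_cols)
noise = 0.1 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
spec = spectrogram(tone + noise, n_rows)

# A few simple summary features of the spectrogram.
features = {
    "peak_column": int(np.argmax(spec.sum(axis=0))),
    "mean_power": float(spec.mean()),
    "std_power": float(spec.std()),
}
print(spec.shape, features)
```

Each row of `spec` is one time segment and each column is one frequency bin, so a steady tone appears as a bright vertical line.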

Contact

We intend for this SETI+IBM collaboration to extend to the general public. We want to empower citizen scientists to analyze this data and contribute their analysis and code back to SETI. You can influence future analysis and future observation planning! So, please contact us should you have an interesting analysis. We can provide you with direction, answers to questions, and access to more data!

Contact Info

If you have any problems with the service, submit an Issue or contact me directly.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

