Time series data collector

Generic time series data collector / exporter (TSDC).
Currently, only Prometheus is available as a data source.

It offers the following features:

Get a time series between a start timestamp and an end timestamp.
Get the last X seconds of a time series.
Search for a time series of finite size anywhere in the past.
Get a time series from a start timestamp only.
Export the time series as JSON or CSV files via a shared volume or web services.

One of the main advantages of these features is that they don't have any size limit for the time series. If the time series database has its own limit, TSDC will split the queries automatically and reconstruct the whole time series.
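The splitting and reconstruction described above can be pictured with a minimal sketch. It is not TSDC's actual implementation: the function names and the `max_span` limit are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def split_range(start: int, end: int, max_span: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end].

    max_span is a hypothetical per-query limit imposed by the database.
    """
    if max_span <= 0:
        raise ValueError("max_span must be positive")
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + max_span, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def reconstruct(chunks: Iterable[List[Sample]]) -> List[Sample]:
    """Merge per-chunk samples into one series, dropping duplicates at the seams."""
    merged = {}
    for samples in chunks:
        for timestamp, value in samples:
            merged[timestamp] = value
    return sorted(merged.items())
```

Each window would be sent as one range query, and the partial results merged back into a single series ordered by timestamp.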

The solution is containerized.

Getting started

Configure config.json

First, set your data source in resources/config.json:

datasource.type: The datasource you want to use. Currently, only the prometheus value is available.
datasource.srvaddress: The IP:port of the datasource.
historytime: How far back (in seconds) the collector gets data when no start is given.
maxduration: How far forward (in seconds) the collector gets data from the start when no end is given.
output.json: The output location for the generated JSON files. If empty, JSON file generation is disabled.
output.csv: The output location for the generated CSV files. If empty, CSV file generation is disabled.
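As a rough illustration of these options, the sketch below renders a possible config.json. The nesting is only inferred from the dotted option names, and every value except "prometheus" and the 14400-second history (used in the example further down) is a placeholder. Check the config.json shipped in resources/ for the real layout.

```python
import json

# Hypothetical values. The nested layout is an assumption derived from the
# dotted option names (datasource.type, output.json, ...).
config = {
    "datasource": {
        "type": "prometheus",            # only supported value
        "srvaddress": "localhost:9090",  # placeholder IP:port (9090 is Prometheus' default port)
    },
    "historytime": 14400,                # seconds to look backward
    "maxduration": 3600,                 # placeholder: seconds to look forward
    "output": {
        "json": "/opt/tsdc-data/json",   # placeholder path, "" would disable JSON files
        "csv": "/opt/tsdc-data/csv",     # placeholder path, "" would disable CSV files
    },
}


def render_config(cfg: dict) -> str:
    """Serialize the configuration as indented JSON text."""
    return json.dumps(cfg, indent=2)


print(render_config(config))
```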

Build the Docker image

docker build --tag=time-series-data-collector .

Run the Docker image

Data file generation is enabled by default. To disable it, empty the output.json and output.csv fields in the config.json file.

If you enabled the file generation (json/csv)

docker run -v /host/tsdc-data:/opt/tsdc-data --net=host time-series-data-collector

The tsdc-data directory will be created under /host on the host machine, with 2 subdirectories: json and csv.
One JSON file is generated for each time series, but only one CSV file is generated for a group of time series coming from the same query.
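To make the difference between the two outputs concrete, here is a small sketch of "one JSON document per series, one CSV per query". The file names, the JSON shape, and the CSV column layout are all assumptions for illustration. The actual files produced by TSDC may look different.

```python
import csv
import io
import json
from typing import Dict, List, Tuple

Series = Dict[str, List[Tuple[int, float]]]  # series name -> (timestamp, value) samples


def export_json(series: Series) -> Dict[str, str]:
    """Return one JSON document per time series, keyed by a hypothetical file name."""
    return {
        f"{name}.json": json.dumps({"name": name, "values": points})
        for name, points in series.items()
    }


def export_csv(series: Series) -> str:
    """Return a single CSV text: one timestamp column plus one column per series."""
    names = sorted(series)
    timestamps = sorted({ts for points in series.values() for ts, _ in points})
    lookup = {name: dict(series[name]) for name in names}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", *names])
    for ts in timestamps:
        # Leave the cell empty when a series has no sample at this timestamp.
        writer.writerow([ts, *(lookup[name].get(ts, "") for name in names)])
    return buffer.getvalue()
```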

If you disabled the file generation

docker run --net=host time-series-data-collector

API

Method	Path	Description
GET	/collector/service/get_ts	Get one or many time series via a query (e.g. a Prometheus query)

Parameters

Parameter	Required	Description
query	YES	The data source query (e.g. a Prometheus query). URL-encode it before using it in a URL.
id	NO	A custom ID used to name the generated files. If not set, the ID is auto-generated.
start	NO	The start timestamp (in seconds) of the time series. If not set, the collector gets the last historytime seconds of data.
end	NO	Works only if the start parameter is set. Gets the time series from start to end. If not set, the collector gets the data from start to start+maxduration.
historytime	NO	Overrides the historytime parameter of config.json.
reducehttprequests	NO	Whether it helps depends on the use case. For continuous time series, this parameter makes no difference. For an isolated time series in your data source (e.g. a build in a CI context), it limits the HTTP requests to the minimum necessary. Enabled by default.
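The way start, end, historytime and maxduration combine can be summarized with a short sketch. It only restates the table above and ignores the effect of reducehttprequests. The function name and the default values are illustrative assumptions, not part of TSDC.

```python
import time
from typing import Optional, Tuple


def resolve_range(
    start: Optional[int] = None,
    end: Optional[int] = None,
    historytime: int = 14400,   # placeholder default, normally read from config.json
    maxduration: int = 3600,    # placeholder default, normally read from config.json
    now: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (from, to) timestamps in seconds implied by the request parameters."""
    now = int(time.time()) if now is None else now
    if start is None:
        # No start: the last `historytime` seconds. `end` is ignored without `start`.
        return now - historytime, now
    if end is None:
        # Start only: go forward up to `maxduration` seconds.
        return start, start + maxduration
    return start, end
```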

Example

Here's a Prometheus data source with 5 time series: cpu1, cpu2, memory, bandwidth and score.
In this example the Prometheus query is very simple: it just selects the 5 time series.

{__name__=~"cpu1|cpu2|memory|bandwidth|score"}

Same query, encoded for URLs:

%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D
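If you prefer not to encode the query by hand, a standard library call does it. The sketch below uses Python's urllib.parse.quote. The helper name encode_query is just an illustration.

```python
from urllib.parse import quote

PROMQL = '{__name__=~"cpu1|cpu2|memory|bandwidth|score"}'


def encode_query(promql: str) -> str:
    """Percent-encode a query so it can be used as a URL parameter value."""
    # safe="" also encodes "/", which quote() leaves untouched by default.
    return quote(promql, safe="")


print(encode_query(PROMQL))
# prints %7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D
```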

To get all time series:
http://35.180.145.79/tsdc/api/get-ts?query=%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D
Here, the historytime parameter is set to 14400 in config.json, so the request returns the last 4 hours of data for each time series.

Same query, but to get only the last 200 seconds of data:
http://35.180.145.79/tsdc/api/get-ts?query=%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D&historytime=200
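Request URLs like the ones above can also be assembled programmatically. This is a minimal client-side sketch. The helper name build_get_ts_url is hypothetical, the base URL is the one from the examples, and the optional keyword arguments are the parameters listed in the table.

```python
from urllib.parse import urlencode

BASE_URL = "http://35.180.145.79/tsdc/api/get-ts"  # base URL taken from the examples


def build_get_ts_url(base_url: str, query: str, **optional) -> str:
    """Build a request URL from the mandatory query and any optional parameters.

    Optional parameters (id, start, end, historytime, reducehttprequests)
    are passed as keyword arguments. None values are skipped.
    """
    params = {"query": query}
    params.update({key: value for key, value in optional.items() if value is not None})
    # urlencode percent-encodes each value and joins the pairs with "&".
    return f"{base_url}?{urlencode(params)}"


url = build_get_ts_url(
    BASE_URL,
    '{__name__=~"cpu1|cpu2|memory|bandwidth|score"}',
    historytime=200,
)
print(url)
```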

Set up SSL

This solution can handle SSL connections.
To enable them:

Generate your own keystore.jks
Set the parameters in src/main/java/com/nokia/as/main/jetty/JettyConfig.java
Add the port to expose in the Dockerfile (at the EXPOSE line)

How it works

TSDC is based on range queries.
If the range is too big, the queries are split automatically.
If one or both edges of the range are missing, the following cases apply:

Only the start time is set
- The http request optimizer is enabled:
  It gets the data until the maximum duration set in the configuration file (maxduration) is reached. If the time series seems to be finished (e.g. a build time series), the requests stop.
- The http request optimizer is disabled:
  It gets the data until the maximum duration set in the configuration file is reached, no matter what the time series looks like.
No edge is set
- The http request optimizer is enabled:
  It tries to get the data backward from the current timestamp. If there's no current data point, it searches for one until the history time set in the configuration file (historytime) is reached. If it finds one, it gets the data backward from there. If the time series seems to be finished (e.g. a build time series), the requests stop.
- The http request optimizer is disabled:
  Same thing, but once it finds a time series, it keeps getting the data backward, no matter what the time series looks like.
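The "no edge is set" case can be sketched as a backward walk over fixed-size windows. This is only an interpretation of the behaviour described above, not TSDC's code: the fetch callback, the window size, and the rule "an empty window after data was found means the series is finished" are assumptions.

```python
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)
Fetch = Callable[[int, int], List[Sample]]  # hypothetical range query: samples in [start, end)


def collect_backward(
    fetch: Fetch,
    now: int,
    historytime: int,
    window: int,
    reduce_http_requests: bool = True,
) -> Tuple[List[Sample], int]:
    """Walk backward from `now`, one window per request, for at most `historytime` seconds.

    Returns the collected samples in ascending order and the number of requests made.
    """
    collected: List[Sample] = []
    found = False
    requests = 0
    limit = now - historytime
    window_end = now
    while window_end > limit:
        window_start = max(window_end - window, limit)
        samples = fetch(window_start, window_end)
        requests += 1
        if samples:
            found = True
            collected = samples + collected  # older samples go in front
        elif found and reduce_http_requests:
            # Assumption: an empty window after data means the series is finished.
            break
        window_end = window_start
    return collected, requests
```

With the optimizer enabled, the walk stops at the first empty window that follows data. With it disabled, every window down to the history limit is requested.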

Architecture schema

Icons made by Smashicons, DinosoftLabs and Pixel Buddha from www.flaticon.com, licensed under CC 3.0 BY.
