
nokia / time-series-data-collector

License: BSD-3-Clause
Time series data collector / exporter

Programming Languages

java
Dockerfile


Time series data collector

Generic time series data collector / exporter.
Currently, only Prometheus is available as a data source.

It offers the following features:

  • Get a time series between a start timestamp and an end timestamp.
  • Get the last X seconds of a time series.
  • Search for a time series of a finite size anywhere in the past.
  • Get a time series given only a start point.
  • Export the time series in JSON or CSV files via a shared volume or web services.

A main advantage of these features is that none of them limits the size of the time series. If the time series database imposes its own limit, the collector (TSDC) splits the queries automatically and reassembles the whole time series.
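
The automatic splitting can be pictured with a small sketch. This is not TSDC's actual code: the class name `RangeSplitter`, the `Window` record and the `maxSpanSeconds` limit are hypothetical, and timestamps are assumed to be epoch seconds. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: cuts one large time range into sub-ranges a backend can accept. */
class RangeSplitter {

    /** A window [start, end] expressed in epoch seconds (hypothetical type). */
    record Window(long start, long end) {}

    /**
     * Splits [start, end] into consecutive windows of at most maxSpanSeconds each.
     * maxSpanSeconds stands in for whatever per-query limit the data source imposes.
     */
    static List<Window> split(long start, long end, long maxSpanSeconds) {
        if (maxSpanSeconds <= 0) {
            throw new IllegalArgumentException("maxSpanSeconds must be positive");
        }
        if (end < start) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<Window> windows = new ArrayList<>();
        for (long from = start; from < end; from += maxSpanSeconds) {
            // The last window is shortened so it never runs past the requested end.
            windows.add(new Window(from, Math.min(from + maxSpanSeconds, end)));
        }
        return windows;
    }

    public static void main(String[] args) {
        // 4 hours of data with an assumed backend limit of 1 hour per query.
        split(0, 14_400, 3_600).forEach(System.out::println);
    }
}
```

Adjacent windows share a boundary timestamp, so if the data source treats both bounds as inclusive, the results would need de-duplicating at the seams before being concatenated.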

The solution is containerized.

Getting started

Configure config.json

First, set your data source in resources/config.json:

  • datasource.type: The data source to use. Currently, prometheus is the only supported value.
  • datasource.srvaddress: The IP:port of the data source.
  • historytime: How far back (in seconds) the collector may fetch data.
  • maxduration: The maximum duration (in seconds) of data to fetch forward from a start timestamp.
  • output.json: The output location for the generated JSON files. If empty, JSON file generation is disabled.
  • output.csv: The output location for the generated CSV files. If empty, CSV file generation is disabled.

Build the Docker image

docker build --tag=time-series-data-collector .

Run the Docker image

Data file generation is enabled by default. To disable it, leave the output.json and output.csv fields empty in config.json.

If you enabled the file generation (json/csv)

docker run -v /host/tsdc-data:/opt/tsdc-data --net=host time-series-data-collector

The tsdc-data directory will be created under /host on the host machine, with 2 subdirectories: json and csv.
One JSON file is generated per time series, whereas a single CSV file is generated for all the time series returned by the same query.

If you disabled the file generation

docker run --net=host time-series-data-collector

API

| Method | Path | Description |
|--------|------|-------------|
| GET | /collector/service/get_ts | Get one or more time series via a query (e.g. a Prometheus query) |

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| query | YES | The data source query (e.g. a Prometheus query). URL-encode it before using it in a URL. |
| id | NO | A custom ID used to name the generated files. If omitted, the ID is auto-generated. |
| start | NO | The start timestamp (in seconds) of the time series. If omitted, the collector gets the last historytime seconds of data. |
| end | NO | Only takes effect when start is also set. Gets the time series from start to end. If omitted, the collector gets the data from start to start+maxduration. |
| historytime | NO | Overrides the historytime parameter in config.json. |
| reducehttprequests | NO | Whether this helps depends on the use case. For continuous time series it has no effect. For an isolated time series in your data source (e.g. a build in a CI context), it keeps the number of HTTP requests to the minimum necessary. Enabled by default. |
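
How start, end, historytime and maxduration combine into a time window can be summarised in code. The sketch below only restates the rules from the table. `QueryWindow`, `Window` and `resolve` are hypothetical names rather than TSDC's classes, and `now` is passed in explicitly to keep the example deterministic. It needs Java 16 or later.

```java
import java.util.OptionalLong;

/** Illustrative only: derives the queried time window from the request parameters. */
class QueryWindow {

    /** A window [start, end] in epoch seconds (hypothetical type). */
    record Window(long start, long end) {}

    static Window resolve(OptionalLong start, OptionalLong end,
                          long now, long historytime, long maxduration) {
        if (start.isEmpty()) {
            // No start: return the last `historytime` seconds.
            // `end` is ignored here because it only works together with `start`.
            return new Window(now - historytime, now);
        }
        long from = start.getAsLong();
        // With start but no end, read forward for at most `maxduration` seconds.
        long to = end.isPresent() ? end.getAsLong() : from + maxduration;
        return new Window(from, to);
    }

    public static void main(String[] args) {
        long now = 100_000; // fixed "current time" for the demo
        System.out.println(resolve(OptionalLong.empty(), OptionalLong.empty(), now, 14_400, 3_600));
        System.out.println(resolve(OptionalLong.of(1_000), OptionalLong.empty(), now, 14_400, 3_600));
        System.out.println(resolve(OptionalLong.of(1_000), OptionalLong.of(2_000), now, 14_400, 3_600));
    }
}
```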

Example

Here's a Prometheus data source with 5 time series (figure: Prometheus data source).
In this example the Prometheus query is very simple: it just selects the 5 time series.

{__name__=~"cpu1|cpu2|memory|bandwidth|score"}

Same query, encoded for URLs:

%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D
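
If you build the request from Java, the standard library can do the encoding. This is a minimal sketch: the `QueryEncoder` class is a hypothetical helper, not part of TSDC. `URLEncoder` applies form encoding, so it also escapes `~` as `%7E` and would turn spaces into `+`. The result therefore differs slightly from the hand-encoded string above, although it decodes to the same query. `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: URL-encodes a data source query for use as the `query` parameter. */
class QueryEncoder {

    static String encode(String query) {
        // Form-style percent-encoding using UTF-8.
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "{__name__=~\"cpu1|cpu2|memory|bandwidth|score\"}";
        System.out.println(encode(query));
    }
}
```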

To get all time series:
http://35.180.145.79/tsdc/api/get-ts?query=%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D&
Here, the historytime parameter is set to 14400 in config.json, so the last 4 hours of data are returned for each time series.

Same query, but limited to the last 200 seconds of data (historytime=200):
http://35.180.145.79/tsdc/api/get-ts?query=%7B__name__%3D~%22cpu1%7Ccpu2%7Cmemory%7Cbandwidth%7Cscore%22%7D&historytime=200

Set up SSL

This solution can handle SSL connections.
To enable them:

  • Generate your own keystore.jks
  • Set the parameters in src/main/java/com/nokia/as/main/jetty/JettyConfig.java
  • Add the port to expose in the Dockerfile (at the EXPOSE line)

How it works

TSDC is based on range queries.
If the range is too big, the queries are split automatically.
If one or both bounds of the range are missing, the behaviour is as follows:

  • Only the start time is set
    • The HTTP request optimizer is enabled:
      The collector gets the data until the maximum duration set in the configuration file (maxduration) is reached. If the time series seems to be finished (e.g. a build time series), it stops sending requests.
    • The HTTP request optimizer is disabled:
      The collector gets the data until the maximum duration set in the configuration file is reached, no matter what the time series looks like.
  • No bound is set
    • The HTTP request optimizer is enabled:
      The collector tries to get the data backward from the current timestamp. If there is no current data point, it keeps searching until the history time set in the configuration file (historytime) is reached. If it finds one, it gets the data backward from there. If the time series seems to be finished (e.g. a build time series), it stops sending requests.
    • The HTTP request optimizer is disabled:
      Same thing, but once it finds a time series, it keeps getting the data backward, no matter what the time series looks like.
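
The "no bound is set" case can be sketched as a backward scan in fixed-size chunks. Everything in this sketch is an assumption made for illustration: the `RangeFetcher` interface stands in for the real data source call, `chunk` is a hypothetical per-request span, and "the time series seems to be finished" is modelled as "an empty chunk after data has already been found". TSDC's real heuristic may differ. It needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: scans backward from `now`, optionally stopping early. */
class BackwardCollector {

    /** Hypothetical data source hook: timestamps of the points found in [start, end). */
    interface RangeFetcher {
        List<Long> fetch(long start, long end);
    }

    static List<Long> collectBackward(RangeFetcher fetcher, long now, long historytime,
                                      long chunk, boolean reduceHttpRequests) {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        List<Long> points = new ArrayList<>();
        long limit = now - historytime; // never look further back than this
        boolean found = false;
        for (long end = now; end > limit; end -= chunk) {
            long start = Math.max(end - chunk, limit);
            List<Long> part = fetcher.fetch(start, end);
            if (!part.isEmpty()) {
                found = true;
                points.addAll(0, part); // older chunks go in front
            } else if (found && reduceHttpRequests) {
                break; // assumed heuristic: the series has ended, stop requesting
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // An isolated series (e.g. one CI build) with points from t=500 to t=690.
        List<Long> series = new ArrayList<>();
        for (long t = 500; t < 700; t += 10) {
            series.add(t);
        }
        int[] requests = {0};
        RangeFetcher fake = (s, e) -> {
            requests[0]++;
            return series.stream().filter(t -> t >= s && t < e).toList();
        };
        List<Long> got = collectBackward(fake, 1_000, 1_000, 100, true);
        System.out.println(got.size() + " points, " + requests[0] + " requests");
    }
}
```

With the optimizer flag set to false, the same call keeps requesting chunks until the history limit is reached. It returns the same points but sends more requests.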

Architecture schema

(figure: tsdc-schema)

Icons made by Smashicons, DinosoftLabs, Pixel Buddha from www.flaticon.com are licensed under CC 3.0 BY
