All Projects → ciconia → pmts

ciconia / pmts

Licence: MIT license
Poor man's time series functionality for PostgreSQL

Programming Languages

PLpgSQL
1095 projects

Projects that are alternatives of or similar to pmts

mf-nav-data
Historical NAV/price/time-series data of mutual funds and popular benchmark indices in India
Stars: ✭ 29 (-6.45%)
Mutual labels:  time-series
go-metrics-wavefront
Wavefront plugin for go-metrics
Stars: ✭ 12 (-61.29%)
Mutual labels:  time-series
DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
Stars: ✭ 44 (+41.94%)
Mutual labels:  time-series
walker
Bayesian Generalized Linear Models with Time-Varying Coefficients
Stars: ✭ 38 (+22.58%)
Mutual labels:  time-series
DTW
Dynamic Time Warping in Python / C (using ctypes)
Stars: ✭ 26 (-16.13%)
Mutual labels:  time-series
ww tvol study
Process global-scale satellite and airborne elevation data into time series of glacier mass change: Hugonnet et al. (2021).
Stars: ✭ 26 (-16.13%)
Mutual labels:  time-series
SCINet
Forecast time series and stock prices with SCINet
Stars: ✭ 28 (-9.68%)
Mutual labels:  time-series
phpPgAdmin6
PHP7+ Based administration tool for PostgreSQL 9.3+
Stars: ✭ 45 (+45.16%)
Mutual labels:  plpgsql
mlforecast
Scalable machine 🤖 learning for time series forecasting.
Stars: ✭ 96 (+209.68%)
Mutual labels:  time-series
forecasting models
An overview of univariate time series forecasting models with sample code.
Stars: ✭ 39 (+25.81%)
Mutual labels:  time-series
microprediction
If you can measure it, consider it predicted
Stars: ✭ 158 (+409.68%)
Mutual labels:  time-series
cnn-rnn-bitcoin
Reusable CNN and RNN model doing time series binary classification
Stars: ✭ 28 (-9.68%)
Mutual labels:  time-series
gold-price-analysis
Creating a model to analyze and predict the trend of the prices of gold.
Stars: ✭ 31 (+0%)
Mutual labels:  time-series
Shapley regressions
Statistical inference on machine learning or general non-parametric models
Stars: ✭ 37 (+19.35%)
Mutual labels:  time-series
lightweight-temporal-attention-pytorch
A PyTorch implementation of the Light Temporal Attention Encoder (L-TAE) for satellite image time series. classification
Stars: ✭ 43 (+38.71%)
Mutual labels:  time-series
ewstools
Python package for early warning signals (EWS) of bifurcations in time series data.
Stars: ✭ 29 (-6.45%)
Mutual labels:  time-series
LSTM-Time-Series-Analysis
Using LSTM network for time series forecasting
Stars: ✭ 41 (+32.26%)
Mutual labels:  time-series
IMCtermite
Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats
Stars: ✭ 20 (-35.48%)
Mutual labels:  time-series
tsfuse
Python package for automatically constructing features from multiple time series
Stars: ✭ 33 (+6.45%)
Mutual labels:  time-series
price-optimization-shiny
Shiny app for Price Optimization using prophet and lme4 libraries for R.
Stars: ✭ 28 (-9.68%)
Mutual labels:  time-series

PMTS - Poor man's time series functionality for PostgreSQL

PMTS is a collection of tools for working with time-series data in PostgreSQL written in SQL and PL/pgSQL, without needing to install extensions or work with outside tools. Its features include:

  • Automatic partitioning (sharding) of time-series tables by time range.
  • Automatic dropping of old partitions according to data retention settings.
  • Aggregation and summarizing utilities (WIP).

PMTS delivers some of the principal benefits of using tools such as TimeScaleDB or CitusDB on a stock PostgreSQL setup. PMTS has been employed successfuly in a ~2TB production database with hundreds of tables and billions of records.

Note: A new version of PMTS, using declarative table partitioning (supported on PostgreSQL 10.x+) is currently under development. The following documentation relates to PMTS 1.0, which works with PostgreSQL 9.x+. The source code for PMTS 1.0 is in pmts.pg9.sql.

Installing PMTS

$ git clone https://github.com/ciconia/pmts.git
$ cd pmts
$ psql -d mydb -f pmts.pg9.sql

Getting Started

After installing PMTS, create a time series table and start working with it:

-- create our time-series table
create table measurements (sensor text, stamp timestamptz, value numeric);

-- setup partitioning with per-week partitioning, retaining data for one year.
-- we also provide any index fields in order for PMTS to setup an index on
-- relevant columns. in this case, PMTS will create an index on (sensor, stamp).
select pmts_setup_partitions('measurements', 86400 * 7, 86400 * 365, '{sensor}');

-- continue to work normally with our table
insert into measurements values ('temp', now(), 12.345);

select stamp, value
from measurements
where sensor = 'temp' and stamp >= now() - interval '1 month';

Once partitioning is setup for your table, any inserts will automatically be redirected to the right partition, and partitions will be automatically created.

Removing old data

PMTS makes it easy to drop old data by defining a retention period for each partitioned table. Instead of deleting old records, PMTS drops entire partitions, making the process much less error-prone, and also much faster. To drop old data, just call the pmts_drop_old_partitions() function periodically (say, once a day):

select pmts_drop_old_partitions();

Tuning partition sizes

PMTS also includes an API for automatically tuning partition sizes. The pmts_tune_partition_size looks at existing partitions and tunes the partition time span so as to produce a certain byte size, within the given limits. You need to provide three arguments: the desired byte size for partitions, the minimum and maximum time spans for partitions (in days), e.g.:

-- aim for 1GB partitions, with a minimum of 30 days and maximum of 365 days
-- per partition.
select pmts_tune_partition_size(1000000000, 30, 365);

API

The PMTS API is made of PL/pgSQL functions. Those functions can be called just like any built-in function using select:

[dbuser]=> select pmts_setup_partitions(...);

You can also invoke these functions from your shell:

$ psql -c "select pmts_setup_partitions(...);"

pmts_setup_partitions(tbl_name, partition_size, retention_period, index_fields)

Sets up partitioning for the specified table. PMTS will use the supplied arguments to control partition creation and deletion. Partitions are created on the fly by using an insert trigger. The trigger will create partitions as needed and insert records into the correct partition.

Arguments

Name Type Description
tbl_name TEXT Identifier of table to be partitioned
partition_size INTEGER The partition size in seconds
retention_period INTEGER The retention period in seconds
index_fields TEXT[] An array of fields to use for indexing

PMTS partitions tables by time ranges. The partition_size argument controls the size of each partitions in terms of time range. The retention_period argument is used to control how the amount of time to retain old data.

The index_fields argument is used to control the index created automatically by PMTS. By default, if no index fields are specified, PMTS will create an index on the stamp field for each created partition. In most cases, though, A compound index will be more useful, as usually data is filtered not only by time range but also by one or more dimensions in other columns.

For example, consider the following scenario:

create table measurements (unit text, metric text, stamp timestamptz, value numeric);

In such a case, we'll usually query by unit, metric and stamp. We therefore pass the relevant columns to PMTS:

select pmts_setup_partitions ('measurements', 86400 * 7, 86400 * 365, '{unit, metric}');

PMTS will then create an index on (unit, metric, stamp) for each partition.

pmts_drop_table(tbl_name)

Drops a table that was previously partitioned with pmts_setup_partitions, removing information about its partitions from the pmts tables. This will remove all partitions.

Arguments

Name Type Description
tbl_name TEXT Table identifier

pmts_drop_old_partitions()

Drops old partitions according to retention period specified for each table. This function should be called periodically to remove old partitions. Use your favorite tool to setup a recurring job that invokes the function.

pmts_total_size(tbl_name)

Returns the sum of total relation size for all partitions of the specified table using pg_total_relation_size.

Arguments

Name Type Description
tbl_name TEXT Table identifier

pmts_info

A view returning total size and number of partitions for each table managed by PMTS, with the following columns:

Name Type Description
tbl_name TEXT Table identifier
total_size NUMERIC Total size of all table partitions in bytes
partition_count BIGINT Number of existing partitions
avg_size BIGINT Average partition size in bytes
current_partition_size INTEGER Current partition size setting in seconds

Sample usage:

select * from pmts_info where tbl_name = 'measurements';

pmts_ideal_partition_size(desired_byte_size, current_byte_size, current_partition_size, min_days, max_days)

Returns the ideal partition size in seconds for the given arguments.

Arguments

Name Type Description
desired_byte_size BIGINT The desired partition size in bytes
current_byte_size BIGINT The current average partition size in bytes
current_partition_size BIGINT The current partition size in seconds
min_days INT The minimum partition size in days (by default 7)
max_days INT The maximum partition size in days (by default 56)

pmts_tune_partition_size(desired_byte_size, min_days, max_days)

Adjusts the partition size for all PMTS-managed tables according to the given arguments. Note: this function should be invoked only seldomly in order for the current average partition size (which is used to calculate the ideal partition size in seconds) to faithfully reflect the current partition size settings.

Arguments

Name Type Description
desired_byte_size BIGINT The desired partition size in bytes
min_days INT The minimum partition size in days (by default 7)
max_days INT The maximum partition size in days (by default 56)

FAQ

Q: What is a time series table?

A: For PMTS, any table that includes a stamp column, of type timestamp or timestamptz.

Q: Why should I partition (or shard) my time series tables?

A: Partitioning a table allows you to keep your tables to a manageable size and maintain a good insertion rate. It also makes it easier to get rid of old data by simply dropping partitions instead of deleting recores and then vacuuming.

Q: How does partitioning work?

A: For each partitioned table, PMTS maintains a list of partitions and installs an insert trigger that will insert each record into the correct partition according to the time stamp. Partition tables are created using PostgreSQL's table inheritance mechanism. That way, the query planner will automatically include and exclude partition tables according to the time range specified for the query.

Q: What versions of PostgreSQL can be used with PMTS?

A: PMTS has been developed and used on PostgreSQL 9.6. It will most probably not work with anything before 9.0.

Q: How do I install it?

A: Simply download pmts.pg9.sql and load into your database using psql:

$ psql -f pmts.pg9.sql

Q: Does PMTS work in AWS RDS?

A: Yes.

Q: Do I need to install anything besides PostgreSQL for PMTS to work?

A: No.

Contributing

PMTS can be improved in lots of ways - additional features, more customization, more introspection tools... Please feel free to contribute ideas and code!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].