Timely beliefs

Model data as beliefs (at a certain time) about events (at a certain time).

The timely-beliefs package provides a convenient data model for numerical time series, that is both simple enough for humans to understand and sufficiently rich for forecasting and machine learning. The data model is an extended pandas DataFrame that assigns properties and index levels to describe:

Getting started (or try one of the other ways to create a BeliefsDataFrame):

>>> import timely_beliefs as tb
>>> df = tb.BeliefsDataFrame([tb.TimedBelief(tb.Sensor("Indoor temperature", "°C"), tb.BeliefSource("Thermometer"), 21, event_time="2000-03-05 11:00Z", belief_horizon="0H")])
>>> print(df)
                                                                                        event_value
event_start               belief_time               source      cumulative_probability             
2000-03-05 11:00:00+00:00 2000-03-05 11:00:00+00:00 Thermometer 0.5                              21

The package contains the following functionality:

A model for time series data, suitable for a notebook or a database-backed program (using sqlalchemy)
Selecting/querying beliefs, e.g. those held at a certain moment in time
Computing accuracy, e.g. against after-the-fact knowledge, also works with probabilistic forecasts
Resampling time series with uncertainty (experimental)
Visualising time series and accuracy metrics (experimental)

Some use cases of the package:

Clearly distinguish forecasts from rolling forecasts.
Analyse your predictive power by showing forecast accuracy as you approach an event.
Learn when someone is a bad predictor.
Evaluate the risk of being wrong about an event.

Check out our interactive demonstration comparing forecasting models for renewable energy production. These visuals are created simply by calling the plot method on our BeliefsDataFrame, using the visualisation library Altair.

The data model
Database storage
Accuracy
Visualisation
Development

The data model

The BeliefsDataFrame is the basic data model that represents data as probabilistic beliefs about events. It is an extended pandas DataFrame with the following index levels:

event_start; keeping track of the time of whatever it is that the data point describes (an event)
belief_time; keeping track of the time at which the data point was created (a belief)
source; keeping track of who or what created the data point (a source)
cumulative_probability; keeping track of the confidence in the data point (a probability)

Together these index levels describe data points as probabilistic beliefs. Because of the sparse representation of index levels (a clever default setting in pandas) we get clean-looking data, as we show here in a printout of the example BeliefsDataFrame in our examples module:

>>> import timely_beliefs
>>> df = timely_beliefs.examples.get_example_df()
>>> df.head(8)
                                                                                     event_value
event_start               belief_time               source   cumulative_probability
2000-01-03 09:00:00+00:00 2000-01-01 00:00:00+00:00 Source A 0.1587                           90
                                                             0.5000                          100
                                                             0.8413                          110
                                                    Source B 0.5000                            0
                                                             1.0000                          100
                          2000-01-01 01:00:00+00:00 Source A 0.1587                           99
                                                             0.5000                          100
                                                             0.8413                          101

The first 8 entries of this BeliefsDataFrame show beliefs about a single event. Beliefs were formed by two distinct sources (A and B), with the first updating its beliefs at a later time. Source A first thought the value of this event would be 100 ± 10 (the probabilities suggest a normal distribution), and then increased its accuracy by lowering the standard deviation to 1. Source B thought the value would be equally likely to be 0 or 100.

More information about what actually constitutes an event is stored as metadata in the BeliefsDataFrame. The sensor property keeps track of invariable information such as the unit of the data and the resolution of events.

>>> df.sensor
<Sensor: Sensor 1>

Currently a BeliefsDataFrame contains data about a single sensor only. For a future release we are considering adding the sensor as another index level, to offer out-of-the-box support for aggregating over multiple sensors.

Database storage

All of the above can be done with TimedBelief objects in a BeliefsDataFrame. However, if you are dealing with a lot of data and need performance, you'll want to persist your belief data in a database.

Accuracy

The accuracy of a belief is defined with respect to some reference. The default reference is the most recent belief held by the same source, but it is possible to set beliefs held by a specific source at a specific time to serve as the reference instead.

There are two common use cases for wanting to know the accuracy of beliefs, each with a different viewpoint. With a rolling viewpoint, you get the accuracy of beliefs at a certain belief_horizon before (or after) knowledge_time, for example, some days before each event ends.

>>> df.rolling_viewpoint_accuracy(timedelta(days=2, hours=9), reference_source=df.lineage.sources[0])
                 mae      mape      wape
source
Source A    1.482075  0.014821  0.005928
Source B  125.853250  0.503413  0.503413

With a fixed viewpoint, you get the accuracy of beliefs held at a certain belief_time.

>>> df.fixed_viewpoint_accuracy(datetime(2000, 1, 2, tzinfo=utc), reference_source=df.lineage.sources[0])
                mae      mape      wape
source
Source A    0.00000  0.000000  0.000000
Source B  125.85325  0.503413  0.503413

For an intuitive representation of accuracy that works in many cases, we suggest to use:

>>> `df["accuracy"] = 1 - df["wape"]`

A more detailed discussion of accuracy and error metrics can be found here.

Visualisation

Create interactive charts using Altair and view them in your browser.

>>> chart = df.plot(reference_source=df.lineage.sources[0], show_accuracy=True)
>>> chart.serve()

This will create an interactive Vega-Lite chart like the one in the screenshot at the top of this Readme.

Development

The timely_beliefs package runs on pandas>=1.1.5. Contact us if you need support for older versions. We welcome other contributions to timely_beliefs.

See our developer docs for details.

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

SeitaBV / timely-beliefs

Programming Languages

Labels