Sqooba / scala-timeseries-lib

Licence: Apache-2.0 License
Lightweight, functional and correct time-series library for scala. Easy manipulation, filtering and combination of time-series data.

scala-timeseries-lib

Lightweight, functional and exact time series library for scala

See the microsite for more information and documentation.

TL;DR

// https://mvnrepository.com/artifact/io.sqooba.oss/scala-timeseries-lib
libraryDependencies += "io.sqooba.oss" %% "scala-timeseries-lib" % "1.1.0"

Or, if you want to build your own local HEAD-SNAPSHOT release, run:

make release-local

Alternatively, if you want to set a specific version when installing locally:

make -e VERSION=1.0.0 release-local

Usage

In essence, a TimeSeries is just an ordered map of [Long,T]. In most use cases the key represents the time since the epoch in milliseconds, but the implementation makes no assumption about the time unit of the key.

Defining a TimeSeries

The TimeSeries trait has a default implementation, VectorTimeSeries[T], named after the underlying collection that holds the data. There are other implementations as well.

val ts = TimeSeries(Seq(
  TSEntry(1000L, "One",  1000L),   // String 'One' lives at 1000 on the timeline and is valid for 1000.
  TSEntry(2000L, "Two",  1000L),
  TSEntry(4000L, "Four", 1000L)
))

ts now defines a time series of String that is defined on the interval [1000,5000[, with a hole at [3000,4000[

The TimeSeries.apply constructor is quite expensive because it sorts the entries to ensure a sane series. Usually, the input is already sorted. In that case, the following alternatives are available:

  • TimeSeries.ofOrderedEntriesSafe: This checks whether the entries are in correct order and trims them so as to not overlap. It can optionally compress the entries.

  • TimeSeries.ofOrderedEntriesUnsafe: This doesn't do any checks, trims or compression on the data and just wraps them in a time series.

  • TimeSeries.newBuilder: returns a builder to incrementally build a new time series.
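
To make the difference between the "safe" and "unsafe" variants more concrete, here is a minimal, self-contained sketch of what checking the order and trimming overlaps could look like. It does not use the library: Entry and checkAndTrim are hypothetical names, and requiring strictly increasing timestamps is an assumption, not necessarily what ofOrderedEntriesSafe enforces.

```scala
object OrderedEntriesSketch {
  // Hypothetical stand-in for TSEntry: a value valid on [timestamp, timestamp + validity[.
  final case class Entry[T](timestamp: Long, value: T, validity: Long) {
    def definedUntil: Long = timestamp + validity
  }

  // Assumption: "correct order" means strictly increasing timestamps.
  // Fails on unordered input, then shortens every entry that runs into its successor.
  def checkAndTrim[T](entries: Seq[Entry[T]]): Seq[Entry[T]] = {
    val pairs = entries.zip(entries.drop(1))
    require(
      pairs.forall { case (a, b) => a.timestamp < b.timestamp },
      "entries must be sorted by strictly increasing timestamp"
    )
    if (entries.isEmpty) entries
    else {
      val trimmed = pairs.map { case (current, next) =>
        if (current.definedUntil > next.timestamp)
          current.copy(validity = next.timestamp - current.timestamp)
        else current
      }
      // The last entry has no successor, so it is kept as-is.
      trimmed :+ entries.last
    }
  }
}
```

With this sketch, an entry at 0 valid for 15 followed by an entry at 10 is shortened to a validity of 10. The "unsafe" variant would skip both the check and the trimming, which is why it is only appropriate for input that is already known to be well-formed.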

Querying

The simplest function exposed by a time series is at(t: Long): Option[T]. With ts defined as above, calling at() yields the following results:

    ts.at(999)  // None
    ts.at(1000) // Some("One")
    ts.at(1999) // Some("One")
    ts.at(2000) // Some("Two")
    ts.at(3000) // None
    ts.at(3999) // None
    ts.at(4000) // Some("Four")
    ts.at(4999) // Some("Four")
    ts.at(5000) // None

Basic Operations

TimeSeries of any Numeric type come with basic operators you might expect for such cases:

val tsa = TimeSeries(Seq(
  TSEntry(0L,  1.0, 10L),
  TSEntry(10L, 2.0, 10L)
))
val tsb = TimeSeries(Seq(
  TSEntry(0L,  3.0, 10L),
  TSEntry(10L, 4.0, 10L)
))

tsa + tsb // (TSEntry(0, 4.0, 10L), TSEntry(10, 6.0, 10L))
tsa * tsb // (TSEntry(0, 3.0, 10L), TSEntry(10, 8.0, 10L))

Note that there are a few quirks to be aware of when a TimeSeries has discontinuities: please refer to function comments in NumericTimeSeries.scala for more details.

Custom Operators: Time Series Merging

For non-numeric time series, or for any particular needs, a TimeSeries can be merged using an arbitrary merge operator: op: (Option[A], Option[B]) => Option[C]. For example (this method is already defined for you in the interface, no need to rewrite it):

def plus(aO: Option[Double], bO: Option[Double]) =
  (aO, bO) match {
    // Wherever both time series share a defined domain, return the sum of the values
    case (Some(a), Some(b)) => Some(a+b)
    // Wherever only a single time series is defined, return the defined value
    case (Some(a), None) => aO
    case (None, Some(b)) => bO
    // Where none of the time series are defined, the result remains undefined.
    case _ => None
  }
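
To illustrate how such an operator acts on two step functions, here is a self-contained sketch that does not depend on the library. Entry, at and merge are hypothetical names, and this is only one straightforward way to do it, not the library's implementation: it cuts the timeline at every point where either series starts or stops being defined and applies the operator once per resulting interval.

```scala
object MergeSketch {
  // Hypothetical stand-in for TSEntry: a value valid on [timestamp, timestamp + validity[.
  final case class Entry[T](timestamp: Long, value: T, validity: Long) {
    def definedUntil: Long = timestamp + validity
    def definedAt(t: Long): Boolean = timestamp <= t && t < definedUntil
  }

  // Linear lookup, kept simple on purpose.
  def at[T](entries: Seq[Entry[T]], t: Long): Option[T] =
    entries.find(_.definedAt(t)).map(_.value)

  // Both inputs are constant between two consecutive boundaries,
  // so evaluating the operator at the start of each interval is enough.
  def merge[A, B, C](as: Seq[Entry[A]], bs: Seq[Entry[B]])(
      op: (Option[A], Option[B]) => Option[C]
  ): Seq[Entry[C]] = {
    val boundaries =
      (as.flatMap(e => Seq(e.timestamp, e.definedUntil)) ++
        bs.flatMap(e => Seq(e.timestamp, e.definedUntil))).distinct.sorted
    boundaries.zip(boundaries.drop(1)).flatMap { case (from, to) =>
      op(at(as, from), at(bs, from)).map(v => Entry(from, v, to - from))
    }
  }

  // Same operator as above, repeated so the sketch is self-contained.
  def plus(aO: Option[Double], bO: Option[Double]): Option[Double] =
    (aO, bO) match {
      case (Some(a), Some(b)) => Some(a + b)
      case (Some(_), None)    => aO
      case (None, Some(_))    => bO
      case _                  => None
    }
}
```

Merging a series holding 1.0 on [0,10[ with a series holding 2.0 on [5,15[ using plus yields three entries: 1.0 on [0,5[, 3.0 on [5,10[ and 2.0 on [10,15[. The sketch does not compress adjacent entries that carry equal values.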

For a complete view of what you can do with a TimeSeries, have a look at the TimeSeries.scala interface.

Under the hood

While a TimeSeries[T] looks a lot like an ordered Map[Long,T], it is better thought of as an ordered collection of triples of the form (timestamp: Long, value: T, validity: Long) (called a TSEntry[T] internally), each representing a small, constant chunk of the time series.

Essentially, it's a step function.

Notes on Performance

The original goal was to provide abstractions that are easy to use and to understand.

While we still strive to keep the library simple to use, we are also shifting to more intensive applications: performance is thus becoming more of a priority.

Details

As its name suggests, VectorTimeSeries is backed by a Vector and uses binary (dichotomic) search for lookups. The following performance characteristics can thus be expected (using the denomination found here):

  • Log for random lookups, left/right trimming and slicing within the definition bounds
  • eC (effectively constant time) for the rest (appending, prepending, head, last, ...)
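
The logarithmic lookup can be sketched as follows. This is a self-contained illustration and not the library's code: Entry and at are hypothetical stand-ins, and the sketch assumes the entries are sorted by timestamp and do not overlap.

```scala
import scala.annotation.tailrec

object VectorLookupSketch {
  // Hypothetical stand-in for TSEntry: a value valid on [timestamp, timestamp + validity[.
  final case class Entry[T](timestamp: Long, value: T, validity: Long)

  // Binary search for the last entry starting at or before t,
  // then check that t still falls within that entry's validity.
  def at[T](entries: Vector[Entry[T]], t: Long): Option[T] = {
    @tailrec
    def search(lo: Int, hi: Int, candidate: Option[Entry[T]]): Option[Entry[T]] =
      if (lo > hi) candidate
      else {
        val mid = lo + (hi - lo) / 2
        val e = entries(mid)
        if (e.timestamp <= t) search(mid + 1, hi, Some(e))
        else search(lo, mid - 1, candidate)
      }

    search(0, entries.length - 1, None)
      .filter(e => t < e.timestamp + e.validity)
      .map(_.value)
  }
}
```

Applied to the three entries of the "One", "Two", "Four" example, this sketch returns the same values as the at() calls listed under Querying, including None inside the hole between 3000 and 4000.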

Each data point is, however, represented by an object, which hurts memory usage. For this reason there is a second implementation, ColumnTimeSeries, which represents its entries with a column store of three vectors (Vector[Long], Vector[T], Vector[Long]). This should save space for primitive types.

You can create a ColumnTimeSeries with its builder ColumnTimeSeries.newBuilder.
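
The column layout itself can be pictured with a small, self-contained sketch. Entry, Columns and fromEntries are hypothetical names that only illustrate the three-vector idea; they are not the ColumnTimeSeries API, and the sketch makes no claim about actual memory savings.

```scala
object ColumnStoreSketch {
  // Row-oriented view: one object per data point (hypothetical stand-in for TSEntry).
  final case class Entry[T](timestamp: Long, value: T, validity: Long)

  // Column-oriented view: position i across the three vectors describes entry i.
  final case class Columns[T](
      timestamps: Vector[Long],
      values: Vector[T],
      validities: Vector[Long]
  ) {
    require(
      timestamps.length == values.length && values.length == validities.length,
      "all three columns must have the same length"
    )

    def size: Int = timestamps.length

    // Entries are only materialised on demand.
    def entryAt(i: Int): Entry[T] = Entry(timestamps(i), values(i), validities(i))

    def toEntries: Vector[Entry[T]] = Vector.tabulate(size)(entryAt)
  }

  def fromEntries[T](entries: Seq[Entry[T]]): Columns[T] =
    Columns(
      entries.map(_.timestamp).toVector,
      entries.map(_.value).toVector,
      entries.map(_.validity).toVector
    )
}
```

Converting a sequence of entries with fromEntries and back with toEntries returns the original entries in the same order.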

Misc

Why

I've had to handle time-series-like data in Java in the past, which turned out to be really frustrating.

Having some spare time and wanting to see what I could come up with in Scala, I decided to build a small time series library. Additional reasons are:

  • It's fun
  • There seems to be no library doing something like that out there
  • I wanted to write some Scala again.

Since then, we have begun using this for smaller projects at Sqooba, and maintenance was officially taken over in May 2019.

TODOS

  • stream/lazy-collections implementation
  • more tests for non-trivial merge operators
  • benchmarks to actually compare various implementations
  • make it easy to use from Java
  • consider https://scalacheck.org/ for property-based testing?
  • interoperability with something like Apache Arrow?

Publish the microsite

You need write access on the GitHub repo to push the microsite. The site is built entirely with sbt plugins and lives on the gh-pages branch. You can edit it in /docs. Scaladoc is automatically created.

In order to build the site locally you need sbt version 1.3.3+ and jekyll version 3.8.5+ (the installation on macOS is a bit tricky because of Ruby).

Once you have this, you can

sbt makeMicrosite && jekyll serve -s target/site

You can now visit the site under http://localhost:4000/scala-timeseries-lib. To publish, just

sbt publishMicrosite

If you are having trouble with an error like this: "fatal: not a git repository", check that.

Contributions

First and foremost: contributions are more than welcome!

We manage this library on an internal repository, which gets synced to GitHub. However, we are able to support the classic GitHub PR workflow, so you should normally be able to ignore the particularities of our setup.

Changelog

Please refer to the CHANGELOG