
Summary

This article builds upon Benito and Birks 2020 to describe in detail how distantia computes the dissimilarity metric it uses to compare pairs of time series. It mixes theoretical details with practical examples to explain how dissimilarity is computed for different types of time series and purposes.

The Psi Dissimilarity Metric

The Psi Dissimilarity Metric ($\psi$ hereafter) measures the distance between two time series in the range $[0, \infty)$, where 0 represents identical time series.

In essence, $\psi$ is two times the sum of distances between all relevant pairs of samples of two time series, normalized by the sum of distances between consecutive samples in both time series.

The general expression to compute $\psi$ can be simplified to:

$$\psi = \frac{2 \times D(X, Y)}{d(X) + d(Y)} - 1$$

where:

  • $X$ and $Y$: time series matrices with the same number of columns.
  • $D$: function to sum distances between pairs of rows from each matrix.
  • $d$: function to sum distances between consecutive rows in a matrix.
  • $2 \times D(X, Y)$: distance between time series (named $AB_{\text{between}}$ in Benito and Birks 2020).
  • $d(X) + d(Y)$: normalization factor (named $AB_{\text{within}}$ in the original paper).
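
The terms above can be sketched in a few lines of Python. This is only an illustration of the simplified expression, not the distantia implementation: the function names `sum_consecutive`, `sum_pairs`, and `psi` are hypothetical, Euclidean distance is assumed, and the list of row pairs is supplied by the caller because the sketch does not decide which pairs of samples are relevant.

```python
import math


def sum_consecutive(series):
    """d(X): sum of distances between consecutive rows of one series."""
    return sum(math.dist(a, b) for a, b in zip(series, series[1:]))


def sum_pairs(x, y, pairs):
    """D(X, Y): sum of distances over the given (row of x, row of y) index pairs."""
    return sum(math.dist(x[i], y[j]) for i, j in pairs)


def psi(x, y, pairs):
    """Simplified expression: 2 * D(X, Y) / (d(X) + d(Y)) - 1."""
    between = 2 * sum_pairs(x, y, pairs)
    within = sum_consecutive(x) + sum_consecutive(y)
    return between / within - 1


# Two toy series with two columns each (rows are samples); made-up values.
x = [[0, 0], [3, 4], [6, 8]]
y = [[0, 0], [3, 0], [3, 4]]

# Assumed pairing for this example: row i of x with row i of y.
pairs = [(0, 0), (1, 1), (2, 2)]

result = psi(x, y, pairs)
print(round(result, 4))  # 0.0588
```

Here $D(X, Y) = 0 + 4 + 5 = 9$ and $d(X) + d(Y) = 10 + 7 = 17$, so the result is $18 / 17 - 1 \approx 0.0588$.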

Going deeper into the math notation, $AB_{\text{between}}$ is expressed as:

$$AB_{\text{between}} = 2 \times \sum_{i=1}^{m=n} D(A_i, B_i)$$

where $m$ and $n$ are the numbers of samples in $A$ and $B$ (equal in this expression, so $n$ is also the total number of sample pairs), $i$ is the index of the relevant sample pair, and $D$ is a distance function (e.g., Euclidean, Manhattan, or any other distance metric relevant to the values in $A$ and $B$).

On the other hand, $AB_{\text{within}}$ is represented as:

$$AB_{\text{within}} = \sum_{i=2}^{m} D(A_{i-1}, A_i) + \sum_{i=2}^{n} D(B_{i-1}, B_i)$$

where $D(A_{i-1}, A_i)$ and $D(B_{i-1}, B_i)$ are the distances between consecutive samples within $A$ and $B$, respectively, so each sum is the total distance between consecutive samples of one time series.
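
As a worked illustration with made-up numbers, take two univariate time series $A = (1, 2, 3)$ and $B = (4, 6, 5)$, paired sample by sample as in the expression for the between-series term, and assume the absolute difference as the distance function $D$ (a choice made for this example only):

$$AB_{\text{between}} = 2 \times (|1 - 4| + |2 - 6| + |3 - 5|) = 2 \times 9 = 18$$

$$AB_{\text{within}} = (|1 - 2| + |2 - 3|) + (|4 - 6| + |6 - 5|) = 2 + 3 = 5$$

$$\psi = \frac{AB_{\text{between}}}{AB_{\text{within}}} - 1 = \frac{18}{5} - 1 = 2.6$$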