The Psi Dissimilarity Metric

WORK IN PROGRESS

Summary

This article builds upon Benito and Birks 2020 to describe in detail the computation of the dissimilarity metric used in distantia to compare pairs of time series. This content aims to provide a deep understanding on how dissimilarity is computed for different types of time series and purposes by mixing theoretical details with practical examples.

The Psi Dissimilarity Metric ( $\psi$ hereafter) measures the distance between two time series in the range $[0, \infty)$ , where 0 represents identical time series.

In essence, $\psi$ is two times the sum of distances between all relevant pairs of samples of two time series, normalized by the sum of distances between consecutive samples in both time series.

The general expression to compute $\psi$ can be simplified to:

$\psi = \frac{2 \times D(X, Y)}{d(X) + d(Y)} - 1$

where:

$X$ and $Y$ : time series matrices with the same number of columns.
$D$ : function to sum distances between pairs of rows from each matrix.
$d$ : function to sum distances between consecutive rows in a matrix.
$2 \times D(X, Y)$ : distance between time series (named $AB_{between}$ in Benito and Birks 2020).
$d(X) + d(Y)$ : normalization factor (named $AB_{within}$ in the original paper).

Going deeper into the math notation, $AB_{between}$ is expressed as:

$AB_{\text{between}} = 2 \times \sum_{i=1}^{m=n} D(A_i, B_i)$ where $m$ and $n$ are the number of samples in $A$ and $B$ , $i$ is the relevant sample index, $n$ is the total number of sample pairs, and $D$ is a distance function (e.g., Euclidean, Manhattan, or any other relevant distance metric for the values in $A$ and $B$ ).

On the other hand, $AB_{within}$ is represented as:

$AB_{\text{within}} = \sum_{i=2}^{m} D(A_{i-1}, A_i) + \sum_{i=2}^{n} D(B_{i-1}, B_i)$ where $D(A_{i-1}, A_i)$ and $D(B_{i-1}, B_i)$ represent the sum of distances between consecutive samples within $A$ and $B$ , respectively.