The Psi Dissimilarity Metric
Summary
This article builds upon Benito and Birks 2020 to describe in detail the computation of the dissimilarity metric used in `distantia` to compare pairs of time series. It aims to provide a deep understanding of how dissimilarity is computed for different types of time series and purposes by mixing theoretical details with practical examples.
The Psi Dissimilarity Metric
The Psi Dissimilarity Metric ($\psi$ hereafter) measures the distance between two time series in the range $[0, \infty)$, where 0 represents identical time series.
In essence, $\psi$ is two times the sum of distances between all relevant pairs of samples of two time series, normalized by the sum of distances between consecutive samples in both time series.
The general expression to compute $\psi$ can be simplified to:

$$\psi = \frac{2 \times D(A, B)}{S(A, B)} - 1$$

where:
- $A$ and $B$: time series matrices with the same number of columns.
- $D()$: function to sum distances between pairs of rows from each matrix.
- $S()$: function to sum distances between consecutive rows in a matrix.
- $D(A, B)$: distance between time series (named $AB_{between}$ in Benito and Birks 2020).
- $S(A, B)$: normalization factor (named $AB_{within}$ in the original paper).
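As a minimal sketch of how the two sums combine, the base R snippet below assumes the form $\psi = 2 \times D(A, B) / S(A, B) - 1$, with both sums already computed. The function `psi_from_sums()` and its arguments `d_ab` and `s_ab` are hypothetical names used for illustration only; they are not part of `distantia`.

```r
# Hypothetical helper (not a distantia function): psi from precomputed sums.
# d_ab: sum of distances between the relevant sample pairs of A and B.
# s_ab: sum of distances between consecutive samples in A plus those in B.
psi_from_sums <- function(d_ab, s_ab) {
  stopifnot(is.numeric(d_ab), is.numeric(s_ab), s_ab > 0)
  (2 * d_ab) / s_ab - 1
}

# Illustrative values only: 2 * 6 / 8 - 1
psi_from_sums(d_ab = 6, s_ab = 8)
#> [1] 0.5
```

Under this assumed form, the result is 0 exactly when twice the between-series sum equals the normalization factor.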
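Going deeper into the math notation, $D(A, B)$ is expressed as:

$$D(A, B) = \sum_{i=1}^{k} d(A_i, B_i)$$

where $m$ and $n$ are the number of samples in $A$ and $B$, $i$ is the index of a relevant sample pair, $k$ is the total number of sample pairs, $A_i$ and $B_i$ are the samples of $A$ and $B$ forming the $i$-th pair, and $d$ is a distance function (e.g., Euclidean, Manhattan, or any other relevant distance metric for the values in $A$ and $B$).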
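To make the sum over sample pairs concrete, here is a small base R sketch. It assumes the simplest possible pairing, in which row $i$ of one matrix is compared with row $i$ of the other (so both matrices need the same number of rows), and it uses Euclidean distance. Which pairs count as "relevant" depends on the comparison method, so this is an illustration under those assumptions, not the package's implementation. The name `sum_pair_distances()` is hypothetical.

```r
# Hypothetical helper: sum of Euclidean distances between paired rows.
# Assumes row i of `a` is paired with row i of `b`.
sum_pair_distances <- function(a, b) {
  stopifnot(is.matrix(a), is.matrix(b), identical(dim(a), dim(b)))
  sum(sqrt(rowSums((a - b)^2)))
}

# Two toy time series with three samples (rows) and two variables (columns).
a <- matrix(c(0, 1, 2,
              0, 1, 2), ncol = 2)
b <- matrix(c(3, 4, 5,
              4, 5, 6), ncol = 2)

# Each pair of rows differs by (3, 4), a Euclidean distance of 5.
sum_pair_distances(a, b)
#> [1] 15
```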
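On the other hand, $S(A, B)$ is represented as:

$$S(A, B) = S_A + S_B = \sum_{j=1}^{m-1} d(A_j, A_{j+1}) + \sum_{j=1}^{n-1} d(B_j, B_{j+1})$$

where $S_A$ and $S_B$ represent the sum of distances between consecutive samples within $A$ and $B$, respectively, computed with the distance function $d$ over their $m$ and $n$ samples.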
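The normalization factor can be sketched in base R as follows, again assuming Euclidean distance. The helpers `auto_sum()` and `normalization_factor()` are hypothetical names chosen for this example and are not `distantia` functions. Note that the two matrices only need the same number of columns; their numbers of rows may differ.

```r
# Hypothetical helper: sum of Euclidean distances between consecutive rows.
auto_sum <- function(x) {
  stopifnot(is.matrix(x), nrow(x) >= 2)
  steps <- diff(x)  # row j + 1 minus row j, computed column by column
  sum(sqrt(rowSums(steps^2)))
}

# Hypothetical helper: normalization factor as the sum of both auto-sums.
normalization_factor <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  auto_sum(a) + auto_sum(b)
}

# Rows of a: (0, 0), (3, 4), (3, 4), so the consecutive distances are 5 and 0.
a <- matrix(c(0, 3, 3,
              0, 4, 4), ncol = 2)
# Rows of b: (1, 1), (1, 3), so the single consecutive distance is 2.
b <- matrix(c(1, 1,
              1, 3), ncol = 2)

normalization_factor(a, b)
#> [1] 7
```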