Computes the distance between pairs of aligned samples in two or more multivariate time-series.

Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.

distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
  )

Arguments

sequences: dataframe with multiple sequences identified by a grouping column. Generally the output of prepareSequences.
grouping.column: character string, name of the column in sequences used to identify separate sequences within the file. This argument is ignored if sequence.A and sequence.B are provided.
time.column: character string, name of the column with time/depth/rank data. The data in this column is not modified.
exclude.columns: character string or character vector with column names in sequences, or sequence.A and sequence.B, to be excluded from the analysis.
same.time: boolean. If TRUE, samples in the sequences to compare will be tested to check if they have the same time/age/depth according to time.column. This argument is only useful when the user needs to compare two sequences taken at different sites but same time frames.
method: character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
sum.distances: boolean (FALSE in the usage above). If TRUE, the distances between samples are summed, and the output of the function (then a list with a single number in each slot) can be used directly as input for the argument least.cost in the function psi.
parallel.execution: boolean, if TRUE (default), execution is parallelized, and serialized if FALSE.

Value

A list with named slots (names of the sequences separated by a vertical line, as in "A|B") containing numeric vectors with the distance between paired samples of every possible combination of sequences according to grouping.column.
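
To make the shape of the returned list concrete, here is a minimal base R sketch that mimics the documented structure using Manhattan distances. The helper paired_manhattan, the toy data, and its column names (id, time, v1, v2) are illustrative assumptions; this is not the package's implementation, and it assumes every sequence has the same number of samples in the same order.

```r
# Toy data: two sequences ("A" and "B") with three aligned samples each.
# Column names are illustrative assumptions.
sequences <- data.frame(
  id   = rep(c("A", "B"), each = 3),
  time = rep(1:3, times = 2),
  v1   = c(1, 2, 3, 2, 2, 5),
  v2   = c(4, 4, 4, 1, 4, 6)
)

# Hypothetical helper that reproduces the documented output structure
# with Manhattan distances only.
paired_manhattan <- function(sequences, grouping.column, time.column,
                             sum.distances = FALSE) {
  groups <- split(sequences, sequences[[grouping.column]])
  value.columns <- setdiff(names(sequences), c(grouping.column, time.column))
  # Every possible combination of two sequences.
  pairs <- combn(names(groups), 2, simplify = FALSE)

  out <- lapply(pairs, function(p) {
    a <- as.matrix(groups[[p[1]]][, value.columns, drop = FALSE])
    b <- as.matrix(groups[[p[2]]][, value.columns, drop = FALSE])
    stopifnot(nrow(a) == nrow(b))      # samples must be aligned
    d <- unname(rowSums(abs(a - b)))   # one distance per pair of samples
    if (sum.distances) sum(d) else d
  })
  # Slot names join the two sequence names with a vertical line.
  names(out) <- vapply(pairs, paste, character(1), collapse = "|")
  out
}

paired_manhattan(sequences, "id", "time")
# $`A|B`
# [1] 4 0 4
```

With sum.distances = TRUE the same call returns a single number per slot (8 for the toy data above).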

Details

Distances are computed as:

manhattan: d <- sum(abs(x - y))
euclidean: d <- sqrt(sum((x - y)^2))
chi: xy <- x + y; y. <- y / sum(y); x. <- x / sum(x); d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)

Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
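
As a rough illustration, the manhattan, euclidean, and chi expressions can be wrapped in a small base R function operating on two numeric vectors of equal length. The name paired_distance is a hypothetical helper introduced here, not a function exported by the package, and the zero replacement is applied to both vectors as an assumption about how the note above is meant.

```r
# Hypothetical helper: distance between two samples x and y
# (numeric vectors of equal length) for three of the listed methods.
paired_distance <- function(x, y, method = "manhattan") {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  # Invalid method names raise an error.
  method <- match.arg(method, c("manhattan", "euclidean", "chi"))

  if (method == "chi") {
    # Assumed zero replacement, following the note above.
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
    xy <- x + y
    y. <- y / sum(y)
    x. <- x / sum(x)
    return(sqrt(sum(((x. - y.)^2) / (xy / sum(xy)))))
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2))
  )
}

x <- c(1, 2, 3)
y <- c(2, 4, 3)
paired_distance(x, y, "manhattan")  # 3
paired_distance(x, y, "euclidean")  # sqrt(5)
paired_distance(x, x, "chi")        # 0
```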

Author

Blas Benito <blasbenito@gmail.com>

Examples