distancePairedSamples.Rd
Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.
distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
)
data frame with multiple sequences identified by a grouping column. Generally the output of prepareSequences
.
character string, name of the column in sequences
to be used to identify separate sequences within the file. This argument is ignored if sequence.A
and sequence.B
are provided.
character string, name of the column with time/depth/rank data. The data in this column is not modified.
character string or character vector with column names in sequences
, or sequence.A
and sequence.B
to be excluded from the analysis.
boolean. If TRUE
, samples in the sequences to compare will be tested to check if they have the same time/age/depth according to time.column
. This argument is only useful when comparing two sequences taken at different sites but over the same time frames.
character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
boolean. If TRUE
(the default is FALSE), the distances between samples are summed, and the output of the function (now a list with a single number in each slot) can be used directly as input for the argument least.cost
in the function psi
.
boolean, if TRUE
(default), execution is parallelized, and serialized if FALSE
.
A list with named slots (names of the sequences separated by a vertical line, as in "A|B") containing numeric vectors with the distance between paired samples of every possible combination of sequences according to grouping.column
.
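The shape of the returned list can be sketched in base R. This is a minimal illustration on invented toy data, not the package's implementation: the data frame, its column names (id, time, v1, v2), and the object names are all assumptions made for the example. It computes manhattan distances between samples in the same row of every pair of sequences, names each slot "A|B" style, and then sums each slot as sum.distances = TRUE is described to do.

```r
# Toy data, invented for illustration: three sequences, three aligned samples each
sequences <- data.frame(
  id = rep(c("A", "B", "C"), each = 3),
  time = rep(1:3, times = 3),
  v1 = c(1, 2, 3, 2, 2, 2, 0, 1, 5),
  v2 = c(4, 5, 6, 4, 4, 4, 1, 1, 1)
)

# Split the numeric columns by sequence (grouping and time columns are left out)
by.sequence <- split(sequences[, c("v1", "v2")], sequences$id)

# Every possible pair of sequences
pairs <- combn(names(by.sequence), 2, simplify = FALSE)

# Manhattan distance between the samples sharing a row in each pair
distances <- lapply(pairs, function(p) {
  a <- as.matrix(by.sequence[[p[1]]])
  b <- as.matrix(by.sequence[[p[2]]])
  rowSums(abs(a - b))
})

# Slot names: sequence names separated by a vertical line
names(distances) <- vapply(pairs, paste, character(1), collapse = "|")

# Analogue of sum.distances = TRUE: a single number in each slot
summed <- lapply(distances, sum)
```

Here distances holds one numeric vector per pair of sequences (slots "A|B", "A|C", and "B|C"), and summed holds one number per slot.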
Distances are computed as:
manhattan
: d <- sum(abs(x - y))
euclidean
: d <- sqrt(sum((x - y)^2))
chi
:
xy <- x + y
y. <- y / sum(y)
x. <- x / sum(x)
d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger
: d <- sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
Note that zeroes are replaced by 0.00001 when method
equals "chi" or "hellinger".
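The four formulas can be collected into one small base R function for experimentation. This is a sketch under stated assumptions, not the package's internal code: the name paired_distance is hypothetical, and the Hellinger branch squares each difference before summing (the standard form of that distance), which is an assumption about the intended formula. The zero replacement follows the note above.

```r
# Hypothetical helper for illustration only; not a function of the package
paired_distance <- function(x, y, method = "manhattan") {
  # Invalid method names raise an error
  method <- match.arg(method, c("manhattan", "euclidean", "chi", "hellinger"))

  # Zeroes are replaced by 0.00001 for "chi" and "hellinger"
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      x. <- x / sum(x)
      y. <- y / sum(y)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # Assumption: squared differences are summed (standard Hellinger distance)
    hellinger = sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
  )
}

# Two paired samples with three variables each
x <- c(1, 0, 3)
y <- c(2, 2, 0)

d.manhattan <- paired_distance(x, y, method = "manhattan")  # 1 + 2 + 3
d.euclidean <- paired_distance(x, y, method = "euclidean")  # sqrt(1 + 4 + 9)
```

Applying such a function row by row to two aligned sequences would give one distance per paired sample.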
# \donttest{
# loading data
data(climate)

# preparing sequences
# notice the argument paired.samples
climate.prepared <- prepareSequences(
  sequences = climate,
  grouping.column = "sequenceId",
  time.column = "time",
  paired.samples = TRUE
)

# compute pairwise distances between paired samples
climate.prepared.distances <- distancePairedSamples(
  sequences = climate.prepared,
  grouping.column = "sequenceId",
  time.column = "time",
  exclude.columns = NULL,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = FALSE
)
# }