Computes distance matrices among the samples of two or more multivariate time-series provided in a single dataframe (generally produced by prepareSequences), identified by a grouping column (argument grouping.column). Distances can be computed with the methods "manhattan", "euclidean", "chi", and "hellinger", and are implemented in the function distance. The function uses the doParallel package to compute the distance matrices of different sequences in parallel. It is configured to use all available processors minus one.
distanceMatrix( sequences = NULL, grouping.column = NULL, time.column = NULL, exclude.columns = NULL, method = "manhattan", parallel.execution = TRUE )
sequences: dataframe with multiple sequences identified by a grouping column. Generally the output of prepareSequences.
grouping.column: character string, name of the column in sequences that identifies each sequence.
time.column: character string, name of the column with time/depth/rank data. The data in this column is not modified.
exclude.columns: character string or character vector with the names of the columns in sequences to be excluded from the analysis.
method: character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
A list with named slots containing the distance matrices of every possible combination of sequences according to grouping.column.
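The shape of the returned list can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper pairwise_distance_matrices, the toy dataframe, the "A|B" slot naming, and the use of exclude.columns to leave out the time column are all assumptions made for illustration, and only the Manhattan distance is computed.

```r
# Toy stand-in for the output of prepareSequences(): two sequences, "A" and "B"
sequences <- data.frame(
  id   = c("A", "A", "A", "B", "B"),
  time = c(1, 2, 3, 1, 2),
  sp1  = c(1, 2, 3, 2, 2),
  sp2  = c(0, 1, 1, 1, 3)
)

# Hypothetical helper (not part of the package): returns one matrix per pair
# of groups, with rows = samples of the first sequence and columns = samples
# of the second sequence
pairwise_distance_matrices <- function(sequences, grouping.column, exclude.columns = NULL) {
  # keep only the numeric columns that take part in the distance computation
  drop <- c(grouping.column, exclude.columns)
  groups <- split(
    sequences[, !(colnames(sequences) %in% drop), drop = FALSE],
    sequences[[grouping.column]]
  )

  # every possible combination of two sequences
  pairs <- utils::combn(names(groups), 2, simplify = FALSE)

  out <- lapply(pairs, function(p) {
    a <- as.matrix(groups[[p[1]]])
    b <- as.matrix(groups[[p[2]]])
    # Manhattan distance between every row of a and every row of b
    outer(
      seq_len(nrow(a)), seq_len(nrow(b)),
      Vectorize(function(i, j) sum(abs(a[i, ] - b[j, ])))
    )
  })

  # slot names are an illustrative choice, e.g. "A|B"
  names(out) <- vapply(pairs, paste, character(1), collapse = "|")
  out
}

m <- pairwise_distance_matrices(sequences, grouping.column = "id", exclude.columns = "time")
names(m)     # "A|B"
dim(m[[1]])  # 3 2
```

With three or more sequences the list would hold one matrix per pair, so three groups would yield three slots.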
Distances are computed as:
manhattan:
d <- sum(abs(x - y))
euclidean:
d <- sqrt(sum((x - y)^2))
chi:
xy <- x + y
y. <- y / sum(y)
x. <- x / sum(x)
d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger:
d <- sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
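The four metrics and the zero replacement can be tried on a pair of vectors with a small sketch like the one below. The helper pair_distance is hypothetical and is not the package's distance function. It assumes that x and y are numeric vectors of equal length, and it uses the standard Hellinger form, in which each difference of square roots is squared before summing.

```r
# Hypothetical helper, not the package's own code: distance between two
# numeric vectors of equal length
pair_distance <- function(x, y, method = c("manhattan", "euclidean", "chi", "hellinger")) {
  # match.arg() stops with an error when method is not one of the choices
  method <- match.arg(method)
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))

  # "chi" and "hellinger" replace zeroes with a small pseudo-value
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      x. <- x / sum(x)
      y. <- y / sum(y)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # standard Hellinger form: square each difference, then sum
    hellinger = sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
  )
}

x <- c(1, 2, 3)
y <- c(2, 4, 3)
pair_distance(x, y, "manhattan")  # 3
pair_distance(x, y, "euclidean")  # sqrt(5), about 2.236
```

Identical vectors give a distance of zero under all four methods, which is a quick sanity check when adapting the sketch.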
#loading data
data(sequenceA)
data(sequenceB)

#preparing datasets
AB.sequences <- prepareSequences(
  sequence.A = sequenceA,
  sequence.A.name = "A",
  sequence.B = sequenceB,
  sequence.B.name = "B",
  merge.mode = "complete",
  if.empty.cases = "zero",
  transformation = "hellinger"
)

#computing distance matrix
AB.distance.matrix <- distanceMatrix(
  sequences = AB.sequences,
  grouping.column = "id",
  method = "manhattan",
  parallel.execution = FALSE
)

#plot
plotMatrix(distance.matrix = AB.distance.matrix)