If the sequences are not aligned (paired.samples = FALSE), the function executes these steps:

  • Computes the autosum of the sequences with autoSum.

  • Computes the distance matrix with distanceMatrix.

  • Uses the distance matrix to compute the least cost matrix with leastCostMatrix.

  • Extracts the cost of the least cost path with leastCost.

  • Computes the dissimilarity measure psi with the function psi.

  • Delivers an output of type "list", "data.frame" or "matrix", depending on the value of the format argument, through formatPsi.
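The steps above can be sketched in base R for a single pair of sequences. This is only an illustration, not the package's code: the toy matrices A and B and the helper names (manhattan, autosum_one) are hypothetical, and the final psi formula, (2 * least cost - autosum) / autosum, is an assumption about how psi combines the two quantities when diagonals are excluded.

```r
# Manhattan distance between two samples (numeric vectors)
manhattan <- function(x, y) sum(abs(x - y))

# Hypothetical toy sequences: rows are samples, columns are variables
A <- matrix(c(1, 2,
              2, 3,
              4, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(1, 1,
              3, 3), ncol = 2, byrow = TRUE)

# Autosum: distances between consecutive samples, summed over both sequences
autosum_one <- function(m) {
  sum(vapply(seq_len(nrow(m) - 1),
             function(i) manhattan(m[i, ], m[i + 1, ]),
             numeric(1)))
}
autosum <- autosum_one(A) + autosum_one(B)

# Distance matrix: every sample of A against every sample of B
D <- outer(seq_len(nrow(A)), seq_len(nrow(B)),
           Vectorize(function(i, j) manhattan(A[i, ], B[j, ])))

# Least cost matrix using orthogonal moves only (no diagonals)
C <- matrix(0, nrow(D), ncol(D))
C[1, 1] <- D[1, 1]
for (i in seq_len(nrow(D))[-1]) C[i, 1] <- C[i - 1, 1] + D[i, 1]
for (j in seq_len(ncol(D))[-1]) C[1, j] <- C[1, j - 1] + D[1, j]
for (i in seq_len(nrow(D))[-1]) {
  for (j in seq_len(ncol(D))[-1]) {
    C[i, j] <- min(C[i - 1, j], C[i, j - 1]) + D[i, j]
  }
}

# Cost of the least cost path: value in the last cell
least_cost <- C[nrow(D), ncol(D)]

# Psi under the assumed formula (see the note above)
psi_value <- (2 * least_cost - autosum) / autosum
```

With these toy values the autosum is 9 and the least cost is 7, so the assumed formula gives 5/9.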

If the sequences are aligned (paired.samples = TRUE), these steps are executed:

  • Computes the autosum of the sequences with autoSum.

  • Sums the distances between paired samples with distancePairedSamples.

  • Computes the dissimilarity measure psi with the function psi.

  • Delivers an output of type "list", "data.frame" or "matrix", depending on the value of the format argument, through formatPsi.
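For aligned sequences, the distance step reduces to a row-by-row comparison. A minimal base R sketch of that step alone, with hypothetical toy data and Manhattan distance (it does not call distancePairedSamples and does not reproduce the psi computation):

```r
# Hypothetical aligned sequences: same number of samples, same variables
A <- matrix(c(1, 2,
              2, 3,
              4, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(1, 1,
              3, 3,
              4, 6), ncol = 2, byrow = TRUE)

# Paired comparison only makes sense if both sequences have the same shape
stopifnot(identical(dim(A), dim(B)))

# Manhattan distance between sample i of A and sample i of B
paired_distances <- rowSums(abs(A - B))

# Sum of the paired distances; no distance matrix is needed
sum_paired <- sum(paired_distances)
```

Only as many distances as there are samples are computed, instead of the full rows-by-rows matrix required for unaligned sequences.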

workflowPsi(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  diagonal = FALSE,
  format = "dataframe",
  paired.samples = FALSE,
  same.time = FALSE,
  ignore.blocks = FALSE,
  parallel.execution = TRUE
  )

Arguments

sequences

dataframe generated by prepareSequences, containing multiple sequences identified by a grouping column.

grouping.column

character string, name of the column in sequences used to identify separate sequences within the file.

time.column

character string, name of the column with time/depth/rank data.

exclude.columns

character string or character vector with column names in sequences to be excluded from the analysis.

method

character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
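As an illustration of what the method argument selects, the sketch below implements three of the metrics with their textbook definitions and rejects unknown names. The function toy_distance is hypothetical, the package's own implementations may differ in detail, and "chi" is left out because its exact definition is not given here.

```r
# Hypothetical helper: distance between two samples under a named metric
toy_distance <- function(x, y,
                         method = c("manhattan", "euclidean", "hellinger")) {
  # match.arg() stops with an error if method is not one of the choices
  method <- match.arg(method)
  switch(method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    # Textbook Hellinger distance; assumes non-negative values
    hellinger = sqrt(0.5 * sum((sqrt(x) - sqrt(y))^2))
  )
}

# Two hypothetical samples expressed as proportions
x <- c(0.2, 0.3, 0.5)
y <- c(0.1, 0.4, 0.5)

d_manhattan <- toy_distance(x, y, "manhattan")
d_euclidean <- toy_distance(x, y, "euclidean")
d_hellinger <- toy_distance(x, y, "hellinger")

# An invalid entry raises an error, captured here instead of stopping
bad <- try(toy_distance(x, y, "not-a-metric"), silent = TRUE)
```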

diagonal

boolean, if TRUE, diagonals are included in the computation of the least cost path. Defaults to FALSE, as the original algorithm did not include diagonals in the computation of the least cost path. If paired.samples is TRUE, then diagonal is irrelevant.
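The effect of diagonal can be seen on a small distance matrix. The function below is a hypothetical sketch of a least cost computation, not the code of leastCostMatrix: each cell accumulates its own distance plus the cheapest neighbouring cell already visited, and the diagonal neighbour is considered only when diagonal = TRUE.

```r
# Hypothetical helper: cost of the least cost path across a distance matrix
least_cost_sketch <- function(D, diagonal = FALSE) {
  C <- matrix(NA_real_, nrow(D), ncol(D))
  for (i in seq_len(nrow(D))) {
    for (j in seq_len(ncol(D))) {
      # Costs of the cells from which cell (i, j) can be reached
      candidates <- c(
        if (i > 1) C[i - 1, j],
        if (j > 1) C[i, j - 1],
        if (diagonal && i > 1 && j > 1) C[i - 1, j - 1]
      )
      # The first cell has no predecessor, so it only carries its own distance
      C[i, j] <- D[i, j] + if (length(candidates) > 0) min(candidates) else 0
    }
  }
  C[nrow(D), ncol(D)]
}

# Hypothetical 3 x 2 distance matrix
D <- matrix(c(1, 3,
              3, 1,
              6, 2), ncol = 2, byrow = TRUE)

cost_orthogonal <- least_cost_sketch(D, diagonal = FALSE)
cost_diagonal   <- least_cost_sketch(D, diagonal = TRUE)
```

For this matrix the orthogonal path costs 7 and the path with diagonals costs 4, because a diagonal step skips one intermediate cell.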

format

character string, type of output. One of: "dataframe" (the default shown in the usage above) or "matrix". If NULL or empty, a list is returned.
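To show how the two tabular shapes relate, the sketch below turns a long table of pairwise psi values into a symmetric matrix with base R. It does not call formatPsi; the three rows are taken from the example output on this page, and leaving the diagonal as NA is an assumption of the sketch.

```r
# Long format: one row per pair of sequences
psi_df <- data.frame(
  A   = c("MIS-1", "MIS-1", "MIS-2"),
  B   = c("MIS-2", "MIS-3", "MIS-3"),
  psi = c(2.6492377, 2.3280158, 0.5613996),
  stringsAsFactors = FALSE
)

# Square matrix with one row and one column per sequence
ids <- unique(c(psi_df$A, psi_df$B))
psi_mat <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
                  dimnames = list(ids, ids))

# Fill both triangles; a two-column character matrix indexes by dimnames
psi_mat[cbind(psi_df$A, psi_df$B)] <- psi_df$psi
psi_mat[cbind(psi_df$B, psi_df$A)] <- psi_df$psi
```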

paired.samples

boolean, if TRUE, the sequences are assumed to be aligned, and distances are computed for paired-samples only (no distance matrix required). Default value is FALSE.

same.time

boolean. If TRUE, samples in the sequences to compare will be tested to check if they have the same time/age/depth according to time.column. This argument is only useful when the user needs to compare two sequences taken at different sites but same time frames.
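The kind of check that same.time implies can be sketched as follows. The data frame, its column names, and the variable same_time are hypothetical, and how the package itself reacts to a mismatch is not covered here.

```r
# Hypothetical sequences from two sites, sampled at the same ages
seqs <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  age  = c(10, 20, 30, 10, 20, 30),
  stringsAsFactors = FALSE
)

# Time values of each sequence, in sample order
times <- split(seqs$age, seqs$site)

# TRUE only if both sequences have the same times in the same order
same_time <- isTRUE(all.equal(times[["A"]], times[["B"]]))

# A sequence with a shifted sample fails the check
shifted <- c(10, 25, 30)
same_time_shifted <- isTRUE(all.equal(times[["A"]], shifted))
```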

ignore.blocks

boolean. If TRUE, the function leastCostPathNoBlocks analyzes the least-cost path of the best solution, and removes blocks (straight-orthogonal sections of the least-cost path), which happen in highly dissimilar sections of the sequences, and inflate output psi values.

parallel.execution

boolean, if TRUE (default), execution is parallelized; if FALSE, it runs serially.

Value

A list, matrix, or dataframe, with sequence names and psi values.

Examples

data("sequencesMIS")

#prepare sequences
MIS.sequences <- prepareSequences(
  sequences = sequencesMIS,
  grouping.column = "MIS",
  if.empty.cases = "zero",
  transformation = "hellinger"
  )

#execute workflow to compute psi
MIS.psi <- workflowPsi(
  sequences = MIS.sequences,
  grouping.column = "MIS",
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  diagonal = FALSE,
  parallel.execution = FALSE
  )

MIS.psi
#>         A      B       psi
#> 1   MIS-1  MIS-2 2.6492377
#> 2   MIS-1  MIS-3 2.3280158
#> 3   MIS-1  MIS-4 2.6104038
#> 4   MIS-1  MIS-5 2.0253240
#> 5   MIS-1  MIS-6 1.9691034
#> 6   MIS-1  MIS-7 1.6285033
#> 7   MIS-1  MIS-8 1.9617028
#> 8   MIS-1  MIS-9 1.8356375
#> 9   MIS-1 MIS-10 2.6031041
#> 10  MIS-1 MIS-11 1.7753995
#> 11  MIS-1 MIS-12 2.0907673
#> 12  MIS-2  MIS-3 0.5613996
#> 13  MIS-2  MIS-4 0.4893352
#> 14  MIS-2  MIS-5 2.1466342
#> 15  MIS-2  MIS-6 0.7398286
#> 16  MIS-2  MIS-7 1.8617663
#> 17  MIS-2  MIS-8 1.1775915
#> 18  MIS-2  MIS-9 1.4053206
#> 19  MIS-2 MIS-10 0.7293816
#> 20  MIS-2 MIS-11 1.4124383
#> 21  MIS-2 MIS-12 0.7288371
#> 22  MIS-3  MIS-4 0.6257469
#> 23  MIS-3  MIS-5 1.0617244
#> 24  MIS-3  MIS-6 0.4998365
#> 25  MIS-3  MIS-7 1.0310238
#> 26  MIS-3  MIS-8 0.6901092
#> 27  MIS-3  MIS-9 0.7556862
#> 28  MIS-3 MIS-10 0.6800549
#> 29  MIS-3 MIS-11 0.7622108
#> 30  MIS-3 MIS-12 0.5492251
#> 31  MIS-4  MIS-5 2.6972018
#> 32  MIS-4  MIS-6 0.6515961
#> 33  MIS-4  MIS-7 1.9334300
#> 34  MIS-4  MIS-8 1.0620704
#> 35  MIS-4  MIS-9 1.7816734
#> 36  MIS-4 MIS-10 1.4130480
#> 37  MIS-4 MIS-11 1.9332674
#> 38  MIS-4 MIS-12 0.9327703
#> 39  MIS-5  MIS-6 1.4256233
#> 40  MIS-5  MIS-7 0.7710992
#> 41  MIS-5  MIS-8 1.2352032
#> 42  MIS-5  MIS-9 0.8695324
#> 43  MIS-5 MIS-10 0.8504797
#> 44  MIS-5 MIS-11 0.9038762
#> 45  MIS-5 MIS-12 0.7249898
#> 46  MIS-6  MIS-7 1.1532327
#> 47  MIS-6  MIS-8 0.7069980
#> 48  MIS-6  MIS-9 0.9136826
#> 49  MIS-6 MIS-10 0.7631732
#> 50  MIS-6 MIS-11 1.1294040
#> 51  MIS-6 MIS-12 0.6062822
#> 52  MIS-7  MIS-8 0.9481572
#> 53  MIS-7  MIS-9 0.6515808
#> 54  MIS-7 MIS-10 0.7753695
#> 55  MIS-7 MIS-11 0.8991964
#> 56  MIS-7 MIS-12 0.7177108
#> 57  MIS-8  MIS-9 0.7375359
#> 58  MIS-8 MIS-10 0.7666966
#> 59  MIS-8 MIS-11 0.5502679
#> 60  MIS-8 MIS-12 0.5361810
#> 61  MIS-9 MIS-10 0.5283006
#> 62  MIS-9 MIS-11 0.5366605
#> 63  MIS-9 MIS-12 0.5879649
#> 64 MIS-10 MIS-11 0.6595176
#> 65 MIS-10 MIS-12 0.4226933
#> 66 MIS-11 MIS-12 0.5406600