seqMD {TraMineR} | R Documentation |

## Multidomain sequences

### Description

Build multidomain (MD) sequences of combined individual domain states (expanded alphabet), derive multidomain indel and substitution costs from domain costs by means of an additive trick (CAT), and compute OM pairwise distances using CAT costs.

### Usage

```
seqMD(channels,
method=NULL,
norm="none",
indel="auto",
sm=NULL,
with.missing=NULL,
full.matrix=TRUE,
link="sum",
cval=2,
miss.cost=2,
cweight=NULL,
what="MDseq",
ch.sep="+",
fill.with.miss=TRUE
)
seqdistmc(channels, what="diss", ch.sep="@@@@TraMineRSep@@@@", ...)
```

### Arguments

`channels` |
A list of domain state sequence |

`method` |
String. Default: |

`norm` |
String.
Default: |

`indel` |
Double, vector of doubles, or list with an insertion/deletion cost or a vector of state dependent indel costs for each domain. Can also be |

`sm` |
A list with a substitution-cost matrix for each domain
or a list of method names for generating the domain substitution costs
(see |

`with.missing` |
Logical, vector of logical, or |

`full.matrix` |
Logical. If |

`link` |
Character string. One of |

`cval` |
Double. Domain substitution cost for |

`miss.cost` |
Double. Cost to substitute missing values at domain level, see |

`cweight` |
A vector of domain weights. Default is 1 (same weight for each domain). |

`what` |
Character string. What output should be returned? One of |

`ch.sep` |
Character string. Separator used for building state names of the expanded alphabet. |

`fill.with.miss` |
Logical. Should shorter domain sequences be filled with missings to match sequence lengths across domains? Applies only to domains that already have missings. |

`...` |
arguments passed to |

### Details

The `seqMD` function builds MD sequences by combining the domain states. When `what="cost"`, it derives multidomain indel and substitution costs from the indel and substitution costs of each domain by means of the cost additive trick (CAT) (Ritschard et al., 2023; Pollock, 2007). When `what="diss"`, it computes multidomain distances using the CAT multidomain costs. The available metrics (see the `method` argument) are optimal matching (`"OM"`), Hamming distance (`"HAM"`), and dynamic Hamming distance (`"DHD"`). If `method="LCS"`, distances are obtained with OM using CAT costs derived from domain indel and substitution costs of 1 and 2, respectively (i.e., any supplied `indel` and `sm` are ignored). For other edit distances, extract the combined state sequence object (by setting `what="MDseq"`) and the CAT multidomain substitution and indel costs (by setting `what="cost"`), then pass these as input to `seqdist`. See `seqdist` for more information about available distance measures.
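As a rough sketch of the additive trick, assuming the default `link="sum"` and using notation introduced here for illustration only ($D$ domains, domain weights $w_d$ as given by `cweight`, domain substitution costs $s_d$, and domain indel costs $c_d$), the costs for two combined states $a=(a_1,\dots,a_D)$ and $b=(b_1,\dots,b_D)$ can be written as:

$$
s_{MD}(a,b) \;=\; \sum_{d=1}^{D} w_d \, s_d(a_d, b_d),
\qquad
c_{MD} \;=\; \sum_{d=1}^{D} w_d \, c_d .
$$

Under this reading, the `"LCS"` setting (domain indel cost 1, domain substitution cost 2) with unit weights would give an MD indel cost of $D$ and a substitution cost of $2k$ between two combined states that differ in $k$ domains.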

Normalization may be useful when dealing with sequences that are not all of the same length. For details on the applied normalization, see `seqdist`.

Sequence lengths are expected to match across domains. If `fill.with.miss` is `TRUE` and the i-th sequence is shorter in one domain than the longest i-th sequence across domains, it is padded with missing values when constructing the i-th MD sequence so that its length matches that of the longest sequence. However, this applies only to domains that already have missings, i.e., domains whose corresponding `with.missing` value is `TRUE`.
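The padding idea can be illustrated with a small base R sketch. It does not call `seqMD` and is not its implementation: the two toy sequences, the `"*"` symbol for a missing state, and the resulting labels are assumptions made for illustration, and the shorter domain is assumed to be one for which `with.missing` is `TRUE`.

```r
## Hypothetical i-th sequences in two domains, of unequal length
work   <- c("E", "E", "U", "E")
family <- c("S", "M")

## Pad the shorter sequence with NA up to the longest length
n <- max(length(work), length(family))
length(work)   <- n
length(family) <- n

## Combine domain states position by position; "*" stands for a missing
## state here (an assumption, not necessarily the label seqMD produces)
md <- paste(ifelse(is.na(work), "*", work),
            ifelse(is.na(family), "*", family),
            sep = "+")
md
```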

### Value

When `what="MDseq"`, the MD sequences of combined states as a `stslist` sequence object.

When `what="cost"`, the matrix of CAT substitution costs with three attributes: `indel`, the CAT indel cost(s); `alphabet`, the alphabet of the combined state sequences; and `cweight`, the channel weights used.
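To make the shape of such a cost object concrete, the following base R sketch builds one by hand for two toy binary domains. It is only an illustration under stated assumptions: the domain names, cost values, and weights are hypothetical, the weighted-sum rule assumes `link="sum"`, and the expanded alphabet here enumerates every combination of domain states, which may differ from the alphabet `seqMD` actually returns.

```r
## Hypothetical substitution costs for two binary domains (not TraMineR output)
states  <- c("0", "1")
sm_work <- matrix(c(0, 2, 2, 0), nrow = 2, dimnames = list(states, states))
sm_fam  <- matrix(c(0, 1.5, 1.5, 0), nrow = 2, dimnames = list(states, states))
indel   <- c(work = 1, fam = 1)   # one indel cost per domain
cweight <- c(work = 1, fam = 2)   # domain weights

## Expanded alphabet: all combinations of domain states, joined with "+"
combos      <- expand.grid(work = states, fam = states, stringsAsFactors = FALSE)
md_alphabet <- paste(combos$work, combos$fam, sep = "+")

## Additive trick: weighted sum of the domain substitution costs
idx <- seq_len(nrow(combos))
cat_sm <- outer(idx, idx, function(i, j) {
  cweight[["work"]] * sm_work[cbind(combos$work[i], combos$work[j])] +
    cweight[["fam"]] * sm_fam[cbind(combos$fam[i], combos$fam[j])]
})
dimnames(cat_sm) <- list(md_alphabet, md_alphabet)

## Attach the three attributes described above
attr(cat_sm, "indel")    <- sum(cweight * indel)
attr(cat_sm, "alphabet") <- md_alphabet
attr(cat_sm, "cweight")  <- cweight
cat_sm
```

Doubling the weight of the second domain makes a change in that domain cost more than a change in the first, even though its domain-level substitution cost is lower.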

When `what="diss"`, a matrix of pairwise distances between MD sequences.
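The three kinds of return value can be compared side by side. The sketch below reuses the `biofam` construction from the Examples section on a smaller subset; the choice of 50 cases, two domains, and these particular cost methods is arbitrary and made only for illustration.

```r
library(TraMineR)
data(biofam)

## Two binary domains built as in the Examples section, on 50 cases
bf <- as.matrix(biofam[1:50, 10:25])
marr.seq  <- seqdef(bf == 2 | bf == 3 | bf == 6)
child.seq <- seqdef(bf == 4 | bf == 5 | bf == 6)
channels  <- list(Marr = marr.seq, Child = child.seq)

## Domain substitution-cost matrices for the "cost" call
smatrix <- list(seqsubm(marr.seq, method = "CONSTANT"),
                seqsubm(child.seq, method = "CONSTANT"))

MDseq   <- seqMD(channels, what = "MDseq")
CATcost <- seqMD(channels, sm = smatrix, what = "cost")
MDdist  <- seqMD(channels, method = "OM",
                 sm = list("INDELSLOG", "TRATE"), what = "diss")

class(MDseq)               # state sequence object of combined states
attr(CATcost, "indel")     # CAT indel cost(s)
attr(CATcost, "alphabet")  # alphabet of the combined states
attr(CATcost, "cweight")   # domain weights used
dim(MDdist)                # pairwise distance matrix
```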

### Author(s)

Gilbert Ritschard and Matthias Studer

### References

Ritschard, G., T.F. Liao, and E. Struffolino (2023). Strategies for
multidomain sequence analysis in social research.
*Sociological Methodology*, 53(2), 288-322. doi:10.1177/00811750231163833.

Pollock, G. (2007) Holistic trajectories: a study of combined employment, housing and family careers by using multiple-sequence analysis. *Journal of the Royal Statistical Society: Series A* **170**, Part 1, 167–183.

### See Also

`seqcost`, `seqdef`, `seqdist`, `seqplotMD`.

### Examples

```
data(biofam)
## Building one channel per type of event: left home, married, and child
cases <- 200
bf <- as.matrix(biofam[1:cases, 10:25])
left <- bf==1 | bf==3 | bf==5 | bf==6
married <- bf == 2 | bf== 3 | bf==6
children <- bf==4 | bf==5 | bf==6
## Building sequence objects
left.seq <- seqdef(left)
marr.seq <- seqdef(married)
child.seq <- seqdef(children)
channels <- list(LeftHome=left.seq, Marr=marr.seq, Child=child.seq)
## CAT multidomain distances based on channel specific cost methods
MDdist <- seqMD(channels, method="OM",
                sm=list("INDELSLOG", "INDELSLOG", "TRATE"), what="diss")
## Providing channel specific substitution costs
smatrix <- list()
smatrix[[1]] <- seqsubm(left.seq, method="TRATE")
smatrix[[2]] <- seqsubm(marr.seq, method="CONSTANT")
smatrix[[3]] <- seqsubm(child.seq, method="CONSTANT")
## Retrieving the MD sequences
MDseq <- seqMD(channels)
alphabet(MDseq)
## Retrieving the CAT multidomain substitution costs
## Using double weight for domain "child"
CATcost <- seqMD(channels,
                 sm=smatrix, cweight=c(1,1,2), what="cost")
## OMspell distances between MD sequences
MDdist2 <- seqdist(MDseq, method="OMspell",
                   sm=CATcost, indel=attr(CATcost, "indel"))
```

*TraMineR* version 2.2-10