dissmergegroups {TraMineR} | R Documentation |
Merging groups by minimizing loss of partition quality.
Description
Merging groups by minimizing loss of partition quality.
Usage
dissmergegroups(
diss,
group,
weights = NULL,
measure = "ASW",
crit = 0.2,
ref = "max",
min.group = 4,
small = 0.05,
silent = FALSE
)
Arguments
diss |
A dissimilarity matrix or a distance object. |
group |
Group membership. Typically, the outcome of a clustering function. |
weights |
Vector of non-negative case weights. |
measure |
Character. Name of quality index. One of those returned by wcClusterQuality (WeightedCluster package). |
crit |
Real in the range [0,1]. Maximal allowed proportion of quality loss. |
ref |
Character. Reference for the proportion crit. One of "initial", "previous", or "max" (default). |
min.group |
Integer. Minimal number of end groups. |
small |
Real. Proportion of the sample size below which a group is considered small. |
silent |
Logical. Should merge steps be displayed during computation? |
Details
The procedure is greedy. The function iteratively searches for the pair of groups whose merge minimizes quality loss. As long as the smallest group is smaller than small, the search is restricted to the pairs formed by that group with one of the other groups. Once all groups are larger than small, the search extends to all possible pairs of groups. There are two stopping criteria: the minimum number of groups (min.group) and the maximum allowed quality deterioration (crit). The proportion specified with crit applies either to the quality of the initial partition (ref="initial"), the quality after the previous iteration (ref="previous"), or the maximal quality achieved so far (ref="max"), the latter being the default. The process stops as soon as either criterion is reached.
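The greedy search described above can be sketched in base R. This is only an illustration of the idea, not the code used by dissmergegroups: the helpers asw() and greedy_merge() are hypothetical names, case weights are ignored, quality is a plain unweighted average silhouette width, and the sketch assumes that a merge is rejected when it would bring quality below (1 - crit) times the reference quality (with positive quality values and min.group of at least 2).

```r
## Unweighted average silhouette width of partition g (illustrative helper).
asw <- function(d, g) {
  d <- as.matrix(d)
  g <- as.character(g)
  lev <- unique(g)
  s <- vapply(seq_along(g), function(i) {
    own <- g == g[i]
    own[i] <- FALSE
    if (!any(own)) return(0)                 # singleton group: silhouette set to 0
    a <- mean(d[i, own])                     # mean distance to own group
    b <- min(vapply(setdiff(lev, g[i]),      # mean distance to nearest other group
                    function(l) mean(d[i, g == l]), numeric(1)))
    if (max(a, b) == 0) 0 else (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

## Greedy merging sketch (hypothetical function, assumptions stated above).
greedy_merge <- function(d, group, crit = 0.2,
                         ref = c("max", "initial", "previous"),
                         min.group = 4, small = 0.05) {
  ref <- match.arg(ref)
  stopifnot(min.group >= 2)
  g <- as.character(group)
  q.init <- q.prev <- q.max <- asw(d, g)
  while (length(unique(g)) > min.group) {
    size <- table(g)
    lev <- names(size)
    pairs <- combn(lev, 2)                   # all candidate pairs of groups
    ## While the smallest group is "small", only consider pairs involving it
    if (min(size) < small * length(g)) {
      smallest <- lev[which.min(size)]
      pairs <- pairs[, colSums(pairs == smallest) > 0, drop = FALSE]
    }
    ## Quality of the partition resulting from each candidate merge
    q.new <- apply(pairs, 2, function(p) asw(d, ifelse(g == p[2], p[1], g)))
    best <- which.max(q.new)
    q.ref <- switch(ref, initial = q.init, previous = q.prev, max = q.max)
    if (q.new[best] < (1 - crit) * q.ref) break   # loss would exceed crit: stop
    g[g == pairs[2, best]] <- pairs[1, best]      # apply the best merge
    q.prev <- q.new[best]
    q.max <- max(q.max, q.prev)
  }
  g
}

## Toy data: three well-separated clusters, each arbitrarily split in two
x <- c(seq(0, 0.9, by = 0.1), seq(5, 5.9, by = 0.1), seq(10, 10.9, by = 0.1))
start <- rep(1:6, each = 5)
merged <- greedy_merge(dist(x), start, crit = 0.2, min.group = 2)
table(merged)
```

On this toy data the halves of each cluster are merged back together, and the search then stops before reaching min.group because merging two of the three clusters would cost more than the allowed proportion of quality.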
Value
Vector of merged group memberships.
Author(s)
Gilbert Ritschard
References
Ritschard, G., T.F. Liao, and E. Struffolino (2023). Strategies for multidomain sequence analysis in social research. Sociological Methodology, 53(2), 288-322. doi:10.1177/00811750231163833
See Also
Examples
data(biofam)
## Building one channel per type of event (children, married, left home)
cases <- 1:40
bf <- as.matrix(biofam[cases, 10:25])
children <- bf == 4 | bf == 5 | bf == 6
married <- bf == 2 | bf == 3 | bf == 6
left <- bf == 1 | bf == 3 | bf == 5 | bf == 6
## Creating sequence objects
child.seq <- seqdef(children, weights = biofam[cases,'wp00tbgs'])
marr.seq <- seqdef(married, weights = biofam[cases,'wp00tbgs'])
left.seq <- seqdef(left, weights = biofam[cases,'wp00tbgs'])
## distances by domain
dchild <- seqdist(child.seq, method="OM", sm="INDELSLOG")
dmarr <- seqdist(marr.seq, method="OM", sm="INDELSLOG")
dleft <- seqdist(left.seq, method="OM", sm="INDELSLOG")
dnames <- c("child","marr","left")
## clustering each domain into 2 groups
child.cl2 <- cutree(hclust(as.dist(dchild)),k=2)
marr.cl2 <- cutree(hclust(as.dist(dmarr)),k=2)
left.cl2 <- cutree(hclust(as.dist(dleft)),k=2)
## Multidomain sequences
MD.seq <- seqMD(list(child.seq,marr.seq,left.seq))
d.expand <- seqdist(MD.seq, method="LCS")
clust.comb <- interaction(child.cl2,marr.cl2,left.cl2)
merged.grp <- dissmergegroups(d.expand, clust.comb,
                              weights = biofam[cases, 'wp00tbgs'])
## weighted size of merged groups
xtabs(biofam[cases,'wp00tbgs'] ~ merged.grp)