dissrep {TraMineR} | R Documentation

## Extracting sets of representative objects using a dissimilarity matrix

### Description

The function extracts a set of representative objects that exhibits the key features of the whole data set, the goal being to make the data easy to interpret soundly. The user can set either the desired coverage level (the proportion of objects having a representative in their neighborhood) or the desired number of representatives.

### Usage

```
dissrep(diss, criterion = "density", score = NULL, decreasing = TRUE,
coverage = 0.25, nrep = NULL, pradius = 0.10, dmax = NULL,
weights = NULL, trep, tsim)
```

### Arguments

`diss` |
A dissimilarity matrix or a |

`criterion` |
the representativeness criterion for sorting the
candidate list. One of `"freq"` (frequency), `"density"` (neighborhood density) or `"dist"` (centrality); see Details. |

`score` |
an optional vector containing the representativeness scores used for sorting the objects in the candidate list. The length of the vector must be equal to the number of rows/columns in the dissimilarity matrix, i.e., the number of objects. |

`decreasing` |
logical. If a score vector is provided, should
the objects in the candidate list be sorted in decreasing order of the score? If `FALSE`, they are sorted in ascending order. |

`coverage` |
controls the size of the representative set by setting
the desired coverage level, i.e., the proportion of objects having a
representative in their neighborhood. The neighborhood radius is defined
by `pradius`. |

`nrep` |
number of representatives. If `NULL` (default), the size of the representative set is controlled by `coverage`. |

`pradius` |
neighborhood
radius as a proportion of the maximum (theoretical)
distance `dmax`. Defaults to 0.10 (10%). |

`dmax` |
maximum theoretical distance. The neighborhood radius is `pradius * dmax`. |

`weights` |
vector of weights of length equal to the number of rows of the dissimilarity matrix. If |

`trep` |
Deprecated. Use |

`tsim` |
Deprecated. Use |

### Details

The representative set is obtained by a heuristic. Representatives are selected by successively extracting, from the objects sorted by their representativeness score, those which are not redundant with already retained representatives. The selection stops when either the desired coverage or the wanted number of representatives is reached. Objects are sorted either by the values provided as `score` argument, or by specifying one of the following as `criterion` argument: `"freq"` (*frequency*), `"density"` (*neighborhood density*), `"dist"` (*centrality*).
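The selection loop described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not TraMineR's implementation: the function name `greedy_rep` is hypothetical, `radius` is the absolute neighborhood radius (i.e. `pradius * dmax`), a candidate is assumed to be redundant when it lies within `radius` of an already retained representative, and `weights` are ignored. The toy scores are arbitrary values standing in for a user-supplied `score` vector.

```r
# Hypothetical helper: greedy extraction of a representative set.
# Assumptions (not taken from TraMineR): `radius` is absolute (pradius * dmax),
# a candidate is redundant if within `radius` of a retained representative,
# and object weights are ignored.
greedy_rep <- function(diss, score, decreasing = TRUE,
                       coverage = 0.25, nrep = NULL, radius) {
  diss <- as.matrix(diss)
  # Candidate list: most representative objects first
  cand <- order(score, decreasing = decreasing)
  reps <- integer(0)
  for (i in cand) {
    # Skip candidates already in the neighborhood of a retained representative
    if (length(reps) > 0 && any(diss[i, reps] <= radius)) next
    reps <- c(reps, i)
    # Coverage: share of objects with at least one representative within radius
    covered <- mean(apply(diss[, reps, drop = FALSE] <= radius, 1, any))
    # Stop at the requested number of representatives, or else at the coverage
    if (!is.null(nrep)) {
      if (length(reps) >= nrep) break
    } else if (covered >= coverage) {
      break
    }
  }
  reps
}

# Toy data: six points on a line, with arbitrary illustrative scores
d <- as.matrix(dist(c(0, 0.1, 0.2, 5, 5.1, 10)))
score <- c(0.9, 0.5, 0.4, 0.8, 0.3, 0.2)   # higher = more representative
greedy_rep(d, score, coverage = 1, radius = 0.5)   # returns c(1, 4, 6)
greedy_rep(d, score, nrep = 2, radius = 0.5)       # returns c(1, 4)
```

Objects 2 and 3 are never retained because they fall within the radius of object 1, which illustrates how redundancy keeps the set small; with the default `coverage = 0.25`, object 1 alone already covers half of the objects.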

The *frequency* criterion uses the frequencies as
representativeness score. The frequency of an object in the data is
computed as the number of other objects with which its dissimilarity
is equal to 0. The more frequent an object, the more representative it
is supposed to be. Hence, objects are sorted in decreasing frequency
order. This criterion is equivalent to the neighborhood density criterion
(see below) with a neighborhood radius equal to 0.
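As a sketch, the frequency score can be computed directly from a dissimilarity matrix in base R. The helper name `freq_score` is hypothetical; it subtracts 1 so that an object is not counted as its own duplicate, which does not change the resulting order. The last check illustrates the equivalence with a neighborhood count using a radius of 0.

```r
# Hypothetical helper: frequency score = number of *other* objects
# at dissimilarity exactly 0 (the zero diagonal self-match is subtracted).
freq_score <- function(diss) {
  diss <- as.matrix(diss)
  rowSums(diss == 0) - 1
}

# Toy data: objects 1, 2 and 4 are identical, as are objects 3 and 5
d <- as.matrix(dist(c(1, 1, 2, 1, 2, 3)))
fs <- freq_score(d)        # 2 2 1 2 1 0
which.max(fs)              # object 1 (ties with 2 and 4; which.max returns the first)

# Same scores as counting neighbors within a radius of 0
stopifnot(all(fs == rowSums(d <= 0) - 1))
```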

The *neighborhood density* is the number of objects in the
neighborhood of the object, hence the name density. This requires setting
the neighborhood radius `pradius`. Objects are sorted in decreasing density order.
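A minimal base-R sketch of the neighborhood density score, assuming object *y* is in the neighborhood of object *x* when their dissimilarity is at most `pradius * dmax`. The helper `density_score` is hypothetical, and falling back to the observed maximum when `dmax` is `NULL` is an assumption of this sketch, not necessarily what `dissrep` does. The object itself is counted, which does not affect the ordering.

```r
# Hypothetical helper: neighborhood density = number of objects within
# radius pradius * dmax (the object itself included).
# Assumption of this sketch: dmax falls back to the observed maximum.
density_score <- function(diss, pradius = 0.10, dmax = NULL) {
  diss <- as.matrix(diss)
  if (is.null(dmax)) dmax <- max(diss)
  rowSums(diss <= pradius * dmax)
}

d <- as.matrix(dist(c(0, 0.8, 1.6, 10)))   # max distance 10, radius 0.1 * 10 = 1
ds <- density_score(d)                     # 2 3 2 1
which.max(ds)                              # object 2, the densest object
```

The densest object is the first entry of the candidate list, which is what `criterion="density"` with `nrep=1` is described as returning.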

The *centrality* criterion is the sum of distances to all other objects. The
smaller the sum, the more representative the object.
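The centrality score and the resulting medoid can be sketched in base R; `centrality_score` is a hypothetical helper. Since the diagonal of a dissimilarity matrix is 0, the row sums equal the sums of distances to all other objects, and the candidate list is sorted in increasing order of this score.

```r
# Hypothetical helper: centrality = sum of distances to all other objects
# (the zero diagonal does not contribute).
centrality_score <- function(diss) rowSums(as.matrix(diss))

d <- as.matrix(dist(c(0, 1, 2, 3, 10)))
cs <- centrality_score(d)   # 16 13 11 12 34
order(cs)                   # candidate list, most central first: 3 4 2 1 5
which.min(cs)               # object 3, the medoid
```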

Use `criterion="dist"` (centrality) and `nrep=1` to get the medoid, and `criterion="density"` and `nrep=1` to get the densest object pattern.

For more details, see Gabadinho and Ritschard, 2013.

### Value

An object of class `diss.rep`. This is a vector containing
the indexes of the representative objects with the following additional attributes:

`Scores` |
vector with the representative score of each object given the chosen criterion. |

`Distances` |
matrix with the distance of each object to its nearest representative. |

`Rep.group` |
vector with, for each object, the representative that represents it. |

`idx.rep` |
list with the indexes of the occurrences of each representative in the original data. |

`Statistics` |
a data frame with quality measures for each representative: number of objects assigned to the representative, number of objects in the representative's neighborhood, mean distance to the representative. |

`Quality` |
overall quality measure. |

Print and summary methods are available.
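To make the `Distances`, `Rep.group` and `Statistics` attributes more concrete, here is a base-R sketch that assigns each object to its nearest representative and computes the three per-representative measures listed above. The helper `assign_to_reps` and its output names are hypothetical; ties between equally close representatives go to the first one (via `which.min`), and the overall `Quality` measure is not reproduced since its formula is not given here.

```r
# Hypothetical helper: assign each object to its nearest representative and
# summarise per representative (output names are illustrative only).
assign_to_reps <- function(diss, reps, radius) {
  diss <- as.matrix(diss)
  sub <- diss[, reps, drop = FALSE]        # n x k distances to representatives
  nearest <- apply(sub, 1, which.min)      # column of the nearest one (first on ties)
  dist_nearest <- sub[cbind(seq_len(nrow(sub)), nearest)]
  stats <- data.frame(
    rep = reps,
    n_assigned = tabulate(nearest, nbins = length(reps)),
    n_in_neighborhood = unname(colSums(sub <= radius)),
    mean_dist = vapply(seq_along(reps),
                       function(j) mean(dist_nearest[nearest == j]), numeric(1))
  )
  list(distances = dist_nearest, rep_group = reps[nearest], stats = stats)
}

d <- as.matrix(dist(c(0, 0.1, 0.2, 5, 5.1, 10)))
res <- assign_to_reps(d, reps = c(1L, 4L, 6L), radius = 0.5)
res$rep_group   # 1 1 1 4 4 6
res$stats
```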

### Author(s)

Alexis Gabadinho and Gilbert Ritschard

### References

Gabadinho A, Ritschard G (2013). "Searching for typical life trajectories applied to child birth histories", In R Lévy, E Widmer (eds.), *Gendered Life Courses*, pp. 287-312. Vienna: LIT.

Gabadinho A, Ritschard G, Studer M, Müller NS (2011). "Extracting and Rendering Representative Sequences", In A Fred, JLG Dietz, K Liu, J Filipe (eds.), *Knowledge Discovery, Knowledge Engineering and Knowledge Management*, volume 128 of *Communications in Computer and Information Science (CCIS)*, pp. 94-106. Springer-Verlag.


### Examples

```
library(TraMineR)
## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam[, 10:25], labels=biofam.lab)
## Computing the distance matrix
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", sm=costs)
## Representative set using the neighborhood density criterion
biofam.rep <- dissrep(biofam.om)
biofam.rep
summary(biofam.rep)
## index of the first occurrence of the second representative in the original data
attr(biofam.rep,"idx.rep")[[2]][1]
```

*TraMineR* version 2.2-10