seqlogp {TraMineR} | R Documentation |

## Logarithm of the probabilities of state sequences

### Description

Logarithm of the probabilities of state sequences. The probability of a sequence is defined as the product of the probabilities of the successive states in the sequence. State probabilities can either be provided or be computed with one of a few basic models.

### Usage

```
seqlogp(seqdata, prob="trate", time.varying=TRUE,
begin="freq", weighted=TRUE, with.missing=FALSE)
```

### Arguments

`seqdata` |
A state sequence object as produced by |

`prob` |
String or numeric array. If a string, either |

`time.varying` |
Logical. If |

`begin` |
String or numeric vector. Distribution used to determine the probability of the first state. If a vector, the probabilities to use. If a string, either |

`weighted` |
Logical. Should we account for the weights when present in |

`with.missing` |
Logical. Should non-void missing states be treated as regular values? Default is |

### Details

The sequence likelihood `P(s)` is defined as the product of the probabilities with which each of its successive observed states is supposed to occur at its position. Let `s=s_{1}s_{2} \cdots s_{\ell}` be a sequence of length `\ell`. Then

```
P(s)=P(s_{1},1) \cdot P(s_{2},2) \cdots P(s_{\ell},\ell)
```

with `P(s_{t},t)` the probability to observe state `s_t` at position `t`.
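As a minimal illustration of the product above, the following base R sketch evaluates `P(s)` and `-\log P(s)` for one short sequence. The probability matrix `p` and the sequence `s` are made-up values chosen for the example; nothing here calls TraMineR.

```r
# Hypothetical position-specific state probabilities
# (rows: states A and B, columns: positions 1 to 3).
p <- matrix(c(0.7, 0.5, 0.2,
              0.3, 0.5, 0.8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), NULL))

s <- c("A", "A", "B")  # sequence s = s_1 s_2 s_3

# P(s_t, t) for each position t, selected by (state, position) index pairs
p_t <- p[cbind(match(s, rownames(p)), seq_along(s))]

P_s <- prod(p_t)             # 0.7 * 0.5 * 0.8
neg_log_P <- -sum(log(p_t))  # the same likelihood on the -log scale
```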

There are different ways to determine the state probabilities `P(s_t,t)`. The method is chosen by means of the `prob` argument.

With `prob = "freq"`, the probability `P(s_{t},t)` is set as the observed relative frequency at position `t`. In that case, the probability does not depend on the probabilities of transition. By default (`time.varying=TRUE`), the relative frequencies are computed separately for each position `t`. With `time.varying=FALSE`, the relative frequencies are computed over the entire covered period, i.e. the same frequencies are used at each `t`.
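The frequency-based model can be sketched by hand in base R. The toy matrix `x`, the helper names `state_freq` and `neg_log_p`, and the absence of weights and missing states are all assumptions of this illustration; it mirrors the description above and is not meant to reproduce the internals of `seqlogp`.

```r
# Toy data: 4 sequences (rows) over 3 positions (columns), alphabet {A, B}.
alphabet <- c("A", "B")
x <- matrix(c("A", "A", "B",
              "A", "B", "B",
              "B", "B", "B",
              "A", "A", "A"),
            ncol = 3, byrow = TRUE)

# Relative frequency of each state of the alphabet in a vector of states
state_freq <- function(v) {
  as.vector(table(factor(v, levels = alphabet))) / length(v)
}

# Time-varying case: one distribution per position (states x positions matrix)
freq_t <- sapply(seq_len(ncol(x)), function(t) state_freq(x[, t]))

# Time-constant case: one distribution over the whole period, reused at each t
freq_all <- matrix(state_freq(as.vector(x)), nrow = length(alphabet), ncol = ncol(x))

# -log P(s) for a sequence s, given a states x positions probability matrix p
neg_log_p <- function(s, p) {
  -sum(log(p[cbind(match(s, alphabet), seq_along(s))]))
}

nlp_varying  <- apply(x, 1, neg_log_p, p = freq_t)
nlp_constant <- apply(x, 1, neg_log_p, p = freq_all)
```

In this toy data the overall distribution is uniform, so `nlp_constant` is the same for every sequence, whereas `nlp_varying` distinguishes sequences according to how common their states are at each position.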

Option `prob = "trate"` sets each `P(s_t,t)`, `t>1`, to the transition probability `p(s_t|s_{t-1})`. The state distribution used to determine the probability of the first state `s_1` is set by means of the `begin` argument (see below). With the default (`time.varying=TRUE`), the transition probabilities are estimated separately at each position, yielding an array of transition matrices. With `time.varying=FALSE`, the transition probabilities are assumed to be constant over the successive positions and are estimated over the entire sequence duration, i.e. from all observed transitions.
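The transition-based model with constant rates can be illustrated in the same way. This base R sketch assumes unweighted toy data, no missing states, time-constant transition rates, and a first-state distribution taken from the observed frequencies at position 1. The object names (`x`, `trate`, `first_dist`, `neg_log_p`) are hypothetical and the code is only a hand computation, not the package's implementation.

```r
# Toy data: 4 sequences (rows) over 3 positions (columns), alphabet {A, B}.
alphabet <- c("A", "B")
x <- matrix(c("A", "A", "B",
              "A", "B", "B",
              "B", "B", "B",
              "A", "A", "A"),
            ncol = 3, byrow = TRUE)

# Count all observed transitions s_{t-1} -> s_t, pooled over positions
from <- factor(as.vector(x[, -ncol(x)]), levels = alphabet)
to   <- factor(as.vector(x[, -1]), levels = alphabet)
counts <- unclass(table(from, to))

# Row-normalise the counts to get p(s_t | s_{t-1})
trate <- counts / rowSums(counts)

# Distribution of the first state: observed frequencies at position 1
first_dist <- as.vector(table(factor(x[, 1], levels = alphabet))) / nrow(x)
names(first_dist) <- alphabet

# -log P(s) = -[ log P(s_1) + sum over t > 1 of log p(s_t | s_{t-1}) ]
neg_log_p <- function(s) {
  first <- log(first_dist[[s[1]]])
  steps <- log(trate[cbind(s[-length(s)], s[-1])])
  -(first + sum(steps))
}

nlp <- apply(x, 1, neg_log_p)
```

A transition that is never observed gets probability 0 under this estimate, so any sequence containing it would receive an infinite `-\log P(s)`.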

Custom transition probabilities can be provided by passing a matrix or a 3-dimensional array as the `prob` argument.

The distribution used at the first position is set by means of the `begin` argument. You can either pass the distribution (probabilities of the states in the alphabet, including the missing value when `with.missing=TRUE`), or specify `"freq"` for the observed distribution at the first position, or `"global.freq"` for the overall state distribution.

The likelihood `P(s)` being generally very small, `seqlogp` returns `-\log P(s)`. The latter quantity is minimal, namely 0, when `P(s)` is equal to `1`.
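A short base R sketch shows why working on the log scale helps with long sequences. The vector `p_t` holds hypothetical state probabilities invented for this illustration.

```r
# Hypothetical state probabilities along a sequence of length 200
p_t <- rep(0.01, 200)

# The raw product (1e-400) is too small for double precision and underflows to 0
raw_product <- prod(p_t)

# The negative log-likelihood stays finite: 200 * log(100)
neg_log_P <- -sum(log(p_t))

# A sequence whose states all have probability 1 reaches the minimum, 0
neg_log_certain <- -sum(log(rep(1, 200)))
```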

### Value

Vector of the negative logarithm `-\log P(s)` of the sequence probabilities.

### Author(s)

Matthias Studer, Alexis Gabadinho, and Gilbert Ritschard

### Examples

```
## Creating the sequence objects using weights
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights=biofam$wp00tbgs)
## Computing sequence probabilities
biofam.prob <- seqlogp(biofam.seq)
## Comparing the probability of each cohort
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)
```

Package *TraMineR* version 2.2-10