seqlogp {TraMineR} | R Documentation |
Logarithm of the probabilities of state sequences
Description
Logarithm of the probabilities of state sequences. The probability of a sequence is defined as the product of the probabilities of the successive states in the sequence. State probabilities can either be provided or be computed with one of a few basic models.
Usage
seqlogp(seqdata, prob="trate", time.varying=TRUE,
begin="freq", weighted=TRUE, with.missing=FALSE)
Arguments
seqdata |
A state sequence object as produced by seqdef. |
prob |
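String or numeric array. If a string, either "freq" or "trate" (see Details). If a numeric matrix or 3-dimensional array, the custom transition probabilities to use. |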
time.varying |
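Logical. If TRUE (default), the state frequencies or transition probabilities are computed separately for each position; if FALSE, they are computed over the entire covered period (see Details). |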
begin |
String or numeric vector. Distribution used to determine the probability of the first state. If a vector, the probabilities to use. If a string, either "freq" or "global.freq" (see Details). |
weighted |
Logical. Should we account for the weights when present in seqdata? |
with.missing |
Logical. Should non-void missing states be treated as regular values? Default is FALSE. |
Details
The sequence likelihood P(s) is defined as the product of the probabilities with which each of its successive observed states is supposed to occur at its position. Let s=s_{1}s_{2} \cdots s_{\ell} be a sequence of length \ell. Then
P(s)=P(s_{1},1) \cdot P(s_{2},2) \cdots P(s_{\ell},\ell)
with P(s_{t},t) the probability of observing state s_t at position t.
There are different ways to determine the state probabilities P(s_t,t). The method is chosen by means of the prob argument.
With prob = "freq", the probability P(s_{t},t) is set to the observed relative frequency at position t. In that case, the probability does not depend on the transition probabilities. By default (time.varying=TRUE), the relative frequencies are computed separately for each position t. With time.varying=FALSE, the relative frequencies are computed over the entire covered period, i.e. the same frequencies are used at each t.
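The two frequency-based variants can be illustrated with a minimal base-R sketch. It does not call seqlogp and is not its implementation: the toy sequences, the alphabet, and the helper neg_logp are hypothetical, and weights and missing states are ignored.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               ncol = 3, byrow = TRUE)
alphabet <- c("A", "B")

# Time-varying case: relative frequency of each state at each position
# (rows = states, columns = positions)
pos_freq <- sapply(seq_len(ncol(seqs)), function(t) {
  as.numeric(prop.table(table(factor(seqs[, t], levels = alphabet))))
})
rownames(pos_freq) <- alphabet

# Time-constant case: frequencies pooled over all positions,
# repeated identically at every position
glob <- as.numeric(prop.table(table(factor(seqs, levels = alphabet))))
const_freq <- matrix(glob, nrow = length(alphabet), ncol = ncol(seqs),
                     dimnames = list(alphabet, NULL))

# Hypothetical helper: -log P(s) as the sum of -log P(s_t, t)
neg_logp <- function(s, freq) {
  p <- freq[cbind(match(s, rownames(freq)), seq_along(s))]
  -sum(log(p))
}

nlp_varying  <- apply(seqs, 1, neg_logp, freq = pos_freq)
nlp_constant <- apply(seqs, 1, neg_logp, freq = const_freq)
```

In this toy data the pooled frequencies are 0.5 for both states, so every sequence gets the same value under the time-constant variant, whereas the position-wise variant distinguishes sequences that follow the dominant state at each position from those that do not.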
With prob = "trate", each P(s_t,t), t>1, is set to the transition probability p(s_t|s_{t-1}). The state distribution used to determine the probability of the first state s_1 is set by means of the begin argument (see below). With the default (time.varying=TRUE), the transition probabilities are estimated separately at each position, yielding an array of transition matrices. With time.varying=FALSE, the transition probabilities are assumed to be constant over the successive positions and are estimated over the entire sequence duration, i.e. from all observed transitions.
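As a rough illustration of the time-constant transition model, the following base-R sketch pools all observed transitions into one transition matrix and uses the observed first-position distribution for s_1. It is a sketch under stated assumptions, not the code of seqlogp: the toy data and the helper neg_logp_trate are hypothetical, and weights and missing states are ignored.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               ncol = 3, byrow = TRUE)
alphabet <- c("A", "B")
n_pos <- ncol(seqs)

# Pool all observed transitions s_{t-1} -> s_t over positions
from <- factor(as.vector(seqs[, -n_pos]), levels = alphabet)
to   <- factor(as.vector(seqs[, -1]), levels = alphabet)

# Row-normalised transition matrix: trans[i, j] = p(j | i)
trans <- unclass(prop.table(table(from, to), margin = 1))

# Distribution of the first state: observed frequencies at position 1
first <- as.numeric(prop.table(table(factor(seqs[, 1], levels = alphabet))))

# Hypothetical helper: -log P(s) = -log P(s_1) - sum_t log p(s_t | s_{t-1})
neg_logp_trate <- function(s) {
  i <- match(s, alphabet)
  p_trans <- trans[cbind(i[-length(i)], i[-1])]
  -(log(first[i[1]]) + sum(log(p_trans)))
}

nlp_trate <- apply(seqs, 1, neg_logp_trate)
```

A time-varying variant would estimate one such matrix per position from the transitions observed between positions t-1 and t only.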
Custom transition probabilities can be provided by passing a matrix or a 3-dimensional array as the prob argument.
The distribution used at the first position is set by means of the begin argument. You can either pass the distribution (probabilities of the states in the alphabet, including the missing value when with.missing=TRUE), or specify "freq" for the observed distribution at the first position, or "global.freq" for the overall state distribution.
The likelihood P(s) being generally very small, seqlogp returns -\log P(s). The latter quantity is minimal, and equal to 0, when P(s) is equal to 1.
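Because the logarithm of a product is the sum of the logarithms, the returned quantity can be written as a sum over positions. The worked numbers below are purely illustrative and assume the natural logarithm:

```latex
-\log P(s) = -\sum_{t=1}^{\ell} \log P(s_{t},t)

% Illustrative values: \ell = 3 with P(s_1,1)=0.5, P(s_2,2)=0.4, P(s_3,3)=0.25
P(s) = 0.5 \cdot 0.4 \cdot 0.25 = 0.05,
\qquad
-\log P(s) = -\log 0.05 \approx 3.00
```

Larger values therefore indicate less probable sequences under the chosen model.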
Value
Vector of the negative logarithm -\log P(s) of the sequence probabilities.
Author(s)
Matthias Studer, Alexis Gabadinho, and Gilbert Ritschard
Examples
## Creating the sequence object using weights
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights=biofam$wp00tbgs)
## Computing sequence probabilities
biofam.prob <- seqlogp(biofam.seq)
## Comparing the sequence probabilities between the two cohorts
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)