TraMineR: Sequence Analysis

Online documentation

You can see here a short preview of what TraMineR can do for you. Just have a look to get the flavour of TraMineR's main features and of how easy it is to put them to work.
Reference manual: [html], [pdf]. See also the TraMineR page on CRAN.

TraMineR User's Guide

The User's guide of TraMineR (pdf, ~3.6MB) describes the features and usage of TraMineR by means of many examples from the social sciences. It may also serve as an introduction to discrete sequential data analysis.

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller, Mining sequence data in R with the TraMineR package: A user's guide, University of Geneva, 2009. (http://traminer.unige.ch)

Citing TraMineR

Please cite the article below when presenting analyses carried out with the help of TraMineR.

Gabadinho, A., Ritschard, G., Müller, N.S. & Studer, M. (2011), Analyzing and visualizing state sequences in R with TraMineR, Journal of Statistical Software. Vol. 40(4), pp. 1-37.

Help

The help page provides details on forums where you can find help and post questions.

Publications

2023

Ritschard, G. (2023), "Measuring the nature of individual sequences", Sociological Methods and Research. Vol. 52(4), pp. 2016-2049. (first published online Sep 2021).

[Abstract] [BibTeX] [DOI]

Abstract: This study reviews and compares indicators that can serve to characterize numerically the nature of state sequences. It also introduces several new indicators. Alongside basic measures such as the length, number of visited distinct states, and number of state changes, we shall consider composite measures such as turbulence and the complexity index, and measures that take account of the nature (e.g., positive vs negative or ranking) of the states. The discussion points out the strange behavior of some of the measures---Elzinga's turbulence and the precarity index of Ritschard, Bussi, and O'Reilly in particular---and propositions are made to avoid these flaws. The usage of the indicators is illustrated with two applications using data from the Swiss Household Panel. The first application tests the U-shape hypothesis about the evolution of life satisfaction along the life course and the second one examines the scarring effect of earlier employment sequences.

BibTeX:

@article{Ritschard2023SMaR,
  author = {Ritschard, Gilbert},
  title = {Measuring the nature of individual sequences},
  journal = {Sociological Methods and Research},
  year = {2023},
  volume = {52},
  number = {4},
  pages = {2016-2049},
  note = {(first published online Sep 2021)},
  doi = {10.1177/00491241211036156}
}

Ritschard, G., Liao, T.F. & Struffolino, E. (2023), "Strategies for multidomain sequence analysis in social research", Sociological Methodology. Vol. 53(2), pp. 288-322.

[Abstract] [BibTeX] [DOI]

Abstract: Multidomain/multichannel sequence analysis has become widely used in social science research to uncover the underlying relationships between two or more observed trajectories in parallel. For example, life-course researchers use multidomain sequence analysis to study the parallel unfolding of multiple life-course domains. In this article, the authors conduct a critical review of the approaches most used in multidomain sequence analysis. The parallel unfolding of trajectories in multiple domains is typically analyzed by building a joint multidomain typology and by examining how domain-specific sequence patterns combine with one another within the multidomain groups. The authors identify four strategies to construct the joint multidomain typology: proceeding independently of domain costs and distances between domain sequences, deriving multidomain costs from domain costs, deriving distances between multidomain sequences from within-domain distances, and combining typologies constructed for each domain. The second and third strategies are prevalent in the literature and typically proceed additively. The authors show that these additive procedures assume between-domain independence, and they make explicit the constraints these procedures impose on between-multidomain costs and distances. Regarding the fourth strategy, the authors propose a merging algorithm to avoid scarce combined types. As regards the first strategy, the authors demonstrate, with a real example based on data from the Swiss Household Panel, that using edit distances with data-driven costs at the multidomain level (i.e., independent of domain costs) remains easily manageable with more than 200 different multidomain combined states. In addition, the authors introduce strategies to enhance visualization by types and domains.

BibTeX:

@article{RitschardLiaoStruffolino2023SociologicalMethodology,
  author = {Ritschard, Gilbert and Tim F. Liao and Emanuela Struffolino},
  title = {Strategies for multidomain sequence analysis in social research},
  journal = {Sociological Methodology},
  year = {2023},
  volume = {53},
  number = {2},
  pages = {288-322},
  doi = {10.1177/00811750231163833}
}

2021

Studer, M. (2021), "Validating Sequence Analysis Typologies Using Parametric Bootstrap", Sociological Methodology. Vol. 51(2), pp. 290-318.

[Abstract] [BibTeX] [DOI]

Abstract: In this article, the author proposes a methodology for the validation of sequence analysis typologies on
the basis of parametric bootstraps following the framework proposed by Hennig and Lin (2015). The
method works by comparing the cluster quality of an observed typology with the quality obtained by clustering
similar but nonclustered data. The author proposes several models to test the different structuring
aspects of the sequences important in life-course research, namely, sequencing, timing, and duration. This
strategy allows identifying the key structural aspects captured by the observed typology. The usefulness
of the proposed methodology is illustrated through an analysis of professional and coresidence trajectories
in Switzerland. The proposed methodology is available in the WeightedCluster R library.

BibTeX:

@article{Studer2021SM,
  author = {Matthias Studer},
  title = {Validating Sequence Analysis Typologies Using Parametric Bootstrap},
  journal = {Sociological Methodology},
  year = {2021},
  volume = {51},
  number = {2},
  pages = {290-318},
  doi = {10.1177/00811750211014232}
}

2018

Ritschard, G., Bussi, M. & O'Reilly, J. (2018), "An Index of Precarity for Measuring Early Employment Insecurity", In Ritschard, G. & Studer, M. (eds) Sequence Analysis and Related Approaches: Innovative Methods and Applications. Series: Life course Research and Social Policies. Volume 10, pp. 279-295. Cham: Springer.

[BibTeX] [DOI]

BibTeX:

@incollection{RitschardBussiOReilly2018sara,
  author = {Gilbert Ritschard and Margherita Bussi and Jacqueline O'Reilly},
  title = {An Index of Precarity for Measuring Early Employment Insecurity},
  booktitle = {Sequence Analysis and Related Approaches: Innovative Methods and Applications},
  editor = {Gilbert Ritschard and Matthias Studer},
  publisher = {Springer},
  year = {2018},
  series = {Life course Research and Social Policies},
  volume = {10},
  pages = {279-295},
  address = {Cham},
  doi = {10.1007/978-3-319-95420-2_16}
}

Ritschard, G. & Studer, M. (2018), "Sequence Analysis: Where Are We, Where Are We Going?", In Ritschard, G. & Studer, M. (eds) Sequence Analysis and Related Approaches: Innovative Methods and Applications. Series: Life course Research and Social Policies. Volume 10, pp. 1-11. Cham: Springer.

[BibTeX] [DOI]

BibTeX:

@incollection{RitschardStuder2018,
  author = {Gilbert Ritschard and Matthias Studer},
  title = {Sequence Analysis: Where Are We, Where Are We Going?},
  booktitle = {Sequence Analysis and Related Approaches: Innovative Methods and Applications},
  editor = {Gilbert Ritschard and Matthias Studer},
  publisher = {Springer},
  year = {2018},
  series = {Life course Research and Social Policies},
  volume = {10},
  pages = {1-11},
  address = {Cham},
  doi = {10.1007/978-3-319-95420-2_1}
}

Ritschard, G. & Studer, M. (eds) (2018), "Sequence Analysis and Related Approaches: Innovative Methods and Applications". Vol. 10. Cham: Springer.

[BibTeX] [DOI]

BibTeX:

@book{RitschardStuder2018SARA,
  editor = {Gilbert Ritschard and Matthias Studer},
  title = {Sequence Analysis and Related Approaches: Innovative Methods and Applications},
  publisher = {Springer},
  year = {2018},
  volume = {10},
  doi = {10.1007/978-3-319-95420-2}
}

Rossignon, F., Studer, M., Gauthier, J.-A. & Le Goff, J.-M. (2018), "Sequence History Analysis (SHA): Estimating the Effect of Past Trajectories on an Upcoming Event", In Ritschard, G. & Studer, M. (eds) Sequence Analysis and Related Approaches: Innovative Methods and Applications. Series: Life course Research and Social Policies. Volume 10, pp. 279-295. Cham: Springer.

[BibTeX] [DOI]

BibTeX:

@incollection{RossignonStuderGauthierLeGoff2018,
  author = {Florence Rossignon and Matthias Studer and Jacques-Antoine Gauthier and Jean-Marie Le Goff},
  title = {Sequence History Analysis (SHA): Estimating the Effect of Past Trajectories on an Upcoming Event},
  booktitle = {Sequence Analysis and Related Approaches: Innovative Methods and Applications},
  editor = {Gilbert Ritschard and Matthias Studer},
  publisher = {Springer},
  year = {2018},
  series = {Life course Research and Social Policies},
  volume = {10},
  pages = {279-295},
  address = {Cham},
  doi = {10.1007/978-3-319-95420-2_6}
}

Studer, M. (2018), "Divisive Property-Based and Fuzzy Clustering for Sequence Analysis", In Ritschard, G. & Studer, M. (eds) Sequence Analysis and Related Approaches: Innovative Methods and Applications. Series: Life course Research and Social Policies. Volume 10, pp. 223-239. Cham: Springer.

[BibTeX] [DOI]

BibTeX:

@incollection{Studer2018,
  author = {Matthias Studer},
  title = {Divisive Property-Based and Fuzzy Clustering for Sequence Analysis},
  booktitle = {Sequence Analysis and Related Approaches: Innovative Methods and Applications},
  editor = {Gilbert Ritschard and Matthias Studer},
  publisher = {Springer},
  year = {2018},
  series = {Life course Research and Social Policies},
  volume = {10},
  pages = {223-239},
  address = {Cham},
  doi = {10.1007/978-3-319-95420-2_13}
}

2017

Bürgin, R. & Ritschard, G. (2017), "Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart", Journal of Statistical Software. Vol. 80(6), pp. 1-33.

[Abstract] [BibTeX] [DOI]

Abstract: The tree-based TVCM algorithm and its implementation in the R package vcrpart are introduced for generalized linear models. The purpose of TVCM is to learn whether and how the coefficients of a regression model vary by moderating variables. A separate partition is built for each potentially varying coefficient, allowing the user to specify coefficient-specific sets of potential moderators, and allowing the algorithm to select moderators individually by coefficient. In addition to describing the algorithm, the TVCM is evaluated using a benchmark comparison and a simulation study and the R commands are demonstrated by means of empirical applications.

BibTeX:

@article{BuerginRitschard2017JoSS,
  author = {Bürgin, Reto and Gilbert Ritschard},
  title = {Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart},
  journal = {Journal of Statistical Software},
  year = {2017},
  volume = {80},
  number = {6},
  pages = {1-33},
  doi = {10.18637/jss.v080.i06}
}

Bürgin, R., Schumacher, R. & Ritschard, G. (2017), "Changes in the Order of Family Life Events in 20th-Century Europe: A Cross-Regional Perspective", Historical Life Course Studies. Vol. 4, pp. 41-58.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: This article analyzes the evolution of the sequencing of family life events in Europe during the second
half of the 20th century using individual data from the European Social Survey and from the Generation
and Gender Program. Considering the four events "leaving the parental home", "first cohabiting
union", "first marriage", and "first parenthood", we hypothesize a transition from a traditional standard
event order characterized by a high degree of synchronization between the first three events towards a
new standard whose features are a high degree of de-synchronization between first cohabitation and
first marriage and a reversal of the traditional order between first marriage and first parenthood. We
also hypothesize cross-regional differences in the timing and in the shape of the transition from one
standard to another. Applying specifically developed tools to visualize and analyze event sequences,
we show important regional variation in the evolution of the sequencing of family life events. Hardly
any change can be observed in Southern Europe, where the sequencing behavior of family events has
remained highly standardized and rooted in the traditional standard. In Eastern Europe where family
event sequences have become less standardized and where a particular sequence characterized by
the reversal of the traditional order between leaving home and family formation has been observed,
the hypothesized transition is still in its very beginning. In Western Europe the transition is clearly on
its way, but no re-standardization towards a new standard can be observed as for now. As expected,
the transition is most advanced in Northern Europe, where evidence for a certain re-standardization
process in the sequencing of family life events has been found.

BibTeX:

@article{BuerginSchumacherRitschard2017HLCS,
  author = {Bürgin, Reto and Reto Schumacher and Gilbert Ritschard},
  title = {Changes in the Order of Family Life Events in 20th-Century Europe: A Cross-Regional Perspective},
  journal = {Historical Life Course Studies},
  year = {2017},
  volume = {4},
  pages = {41-58},
  doi = {10.51964/hlcs9338}
}

2016

Gabadinho, A. & Ritschard, G. (2016), "Analysing state sequences with probabilistic suffix trees: the PST R package", Journal of Statistical Software. Vol. 72(3), pp. 1-39.

[Abstract] [BibTeX] [DOI]

Abstract: This article presents the PST R package for categorical sequence analysis with probabilistic
suffix trees (PSTs), i.e., structures that store variable-length Markov chains
(VLMCs). VLMCs allow to model high-order dependencies in categorical sequences with
parsimonious models based on simple estimation procedures. The package is specifically
adapted to the field of social sciences, as it allows for VLMC models to be learned from
sets of individual sequences possibly containing missing values; in addition, the package
is extended to account for case weights. This article describes how a VLMC model is
learned from one or more categorical sequences and stored in a PST. The PST can then
be used for sequence prediction, i.e., to assign a probability to whole observed or artificial
sequences. This feature supports data mining applications such as the extraction of typical
patterns and outliers. This article also introduces original visualization tools for both
the model and the outcomes of sequence prediction. Other features such as functions for
pattern mining and artificial sequence generation are described as well. The PST package
also allows for the computation of probabilistic divergence between two models and the
fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning
sample are stored in a single PST.

BibTeX:

@article{GabadinhoRitschard2016JoSS,
  author = {Gabadinho, Alexis and Gilbert Ritschard},
  title = {Analysing state sequences with probabilistic suffix trees: the PST R package},
  journal = {Journal of Statistical Software},
  year = {2016},
  volume = {72},
  number = {3},
  pages = {1-39},
  doi = {10.18637/jss.v072.i03}
}

Ritschard, G. & Studer, M. (eds) (2016), "Proceedings of the International Conference on Sequence Analysis and Related Methods (LaCOSA II), Lausanne, June 8-10, 2016". Lausanne: NCCR LIVES.

[BibTeX] [Preprint (pdf)]

BibTeX:

@book{RitschardStuder2016LaCOSA,
  editor = {Ritschard, G. and M. Studer},
  title = {Proceedings of the International Conference on Sequence Analysis and Related Methods (LaCOSA II), Lausanne, June 8-10, 2016},
  publisher = {NCCR LIVES},
  year = {2016}
}

Studer, M. (2016), "Position wise group-typical states"

[BibTeX] [Preprint (pdf)]

BibTeX:

@unpublished{Studer,
  author = {Studer, Matthias},
  title = {Position wise group-typical states},
  year = {2016}
}

Studer, M. & Ritschard, G. (2016), "What Matters in Differences between Life Trajectories: A Comparative Review of Sequence Dissimilarity Measures", Journal of the Royal Statistical Society, Series A. Vol. 179(2), pp. 481-511.

[Abstract] [BibTeX] [DOI]

Abstract: This is a comparative study of the multiple ways of measuring dissimilarities
between state sequences. The originality of the study is the focus put on the differences
between sequences that are sociologically important when studying life courses
such as family life trajectories or professional careers. These differences essentially
concern the sequencing (the order in which successive states appear), the timing,
and the duration of the spells in successive states. The study examines the sensitivity
of the measures to these three aspects analytically and empirically by means
of simulations. Even if some distance measures underperform, the study shows that
there is no universally optimal distance index, and that the choice of a measure depends
on which aspect we want to focus on. From the review and simulation results,
the article derives guidelines to help the end user to choose the right dissimilarity
measure for her/his research objectives. This study also introduces novel ways of
measuring dissimilarities that overcome some flaws in existing measures.

BibTeX:

@article{StuderRitschard2016JotRSSSA,
  author = {Studer, Matthias and Gilbert Ritschard},
  title = {What Matters in Differences between Life Trajectories: A Comparative Review of Sequence Dissimilarity Measures},
  journal = {Journal of the Royal Statistical Society, Series A},
  year = {2016},
  volume = {179},
  number = {2},
  pages = {481-511},
  doi = {10.1111/rssa.12125}
}

2015

Bürgin, R. & Ritschard, G. (2015), "Tree-based varying coefficient regression for longitudinal ordinal responses", Computational Statistics & Data Analysis. Vol. 86, pp. 65-80.

[Abstract] [BibTeX] [DOI]

Abstract: A tree-based algorithm for longitudinal regression analysis that aims to learn whether and how the effects of predictor variables depend on moderating variables is presented. The algorithm is based on multivariate generalized linear mixed models and it builds piecewise constant coefficient functions. Moreover, it is scalable for many moderators of possibly mixed scales, integrates interactions between moderators and can handle nonlinearities. Although the scope of the algorithm is quite general, the focus is on its usage in an ordinal longitudinal regression setting. The potential of the algorithm is illustrated by using data derived from the British Household Panel Study, to show how the effect of unemployment on self-reported happiness varies across individual life circumstances.

BibTeX:

@article{BuerginRitschard2015CSDA,
  author = {Bürgin, Reto and Gilbert Ritschard},
  title = {Tree-based varying coefficient regression for longitudinal ordinal responses},
  journal = {Computational Statistics & Data Analysis},
  year = {2015},
  volume = {86},
  pages = {65-80},
  doi = {10.1016/j.csda.2015.01.003}
}

Elzinga, C.H. & Studer, M. (2015), "Spell Sequences, State Proximities and Distance Metrics", Sociological Methods and Research. Vol. 44(1), pp. 3-47.

[Abstract] [BibTeX] [DOI]

Abstract: Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states. The results show that the behavior of the metric is as intended. Furthermore, we use family formation data from the Swiss Household Panel to compare a few variants of the new metric to OM. The new metrics have been implemented in the freely available TraMineR-package.

BibTeX:

@article{ElzingaStuder2014SMaR,
  author = {Cees H. Elzinga and Matthias Studer},
  title = {Spell Sequences, State Proximities and Distance Metrics},
  journal = {Sociological Methods and Research},
  year = {2015},
  volume = {44},
  number = {1},
  pages = {3-47},
  doi = {10.1177/0049124114540707}
}

2014

Bürgin, R. & Ritschard, G. (2014), "A decorated parallel coordinate plot for categorical longitudinal data", The American Statistician. Vol. 68(2), pp. 98-103.

[Abstract] [BibTeX] [DOI]

Abstract: This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the temporal ordering of events, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with two real applications.

BibTeX:

@article{BuerginRitschard2014TAS,
  author = {Bürgin, Reto and Ritschard, Gilbert},
  title = {A decorated parallel coordinate plot for categorical longitudinal data},
  journal = {The American Statistician},
  year = {2014},
  volume = {68},
  number = {2},
  pages = {98-103},
  doi = {10.1080/00031305.2014.887591}
}

Studer, M. & Ritschard, G. (2014), "A Comparative Review of Sequence Dissimilarity Measures". LIVES Working Papers, 33. NCCR LIVES, Switzerland, 2014.

[BibTeX] [DOI]

BibTeX:

@techreport{StuderRitschard2014LIVES,
  author = {Matthias Studer and Gilbert Ritschard},
  title = {A Comparative Review of Sequence Dissimilarity Measures},
  year = {2014},
  number = {33},
  type = {LIVES Working Papers},
  institution = {NCCR LIVES},
  address = {Switzerland},
  doi = {10.12682/lives.2296-1658.2014.33}
}

2013

Bürgin, R. & Ritschard, G. (2013), "Rendering the order of life events". LIVES Working Papers, 29. NCCR LIVES, Switzerland, 2013.

[BibTeX] [DOI]

BibTeX:

@techreport{BuerginRitschard2013,
  author = {Reto Bürgin and Gilbert Ritschard},
  title = {Rendering the order of life events},
  year = {2013},
  number = {29},
  type = {LIVES Working Papers},
  institution = {NCCR LIVES},
  address = {Switzerland},
  doi = {10.12682/lives.2296-1658.2013.29}
}

Gabadinho, A. & Ritschard, G. (2013), "Searching for typical life trajectories applied to childbirth histories", In Levy, R. & Widmer, E. (eds) Gendered life courses - Between individualization and standardization. A European approach applied to Switzerland, pp. 287-312. Vienna: LIT.

[BibTeX] [Preprint (pdf)]

BibTeX:

@incollection{GabadinhoRitschard2013typical,
  author = {Gabadinho, Alexis and Gilbert Ritschard},
  title = {Searching for typical life trajectories applied to childbirth histories},
  booktitle = {Gendered life courses - Between individualization and standardization. A European approach applied to Switzerland},
  editor = {Levy, René and Eric Widmer},
  publisher = {LIT},
  year = {2013},
  pages = {287-312},
  address = {Vienna}
}

Ritschard, G., Bürgin, R. & Studer, M. (2013), "Exploratory Mining of Life Event Histories", In McArdle, J.J. & Ritschard, G. (eds) Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences. Series: Quantitative Methodology, pp. 221-253. New York: Routledge.

[BibTeX] [Preprint (pdf)]

BibTeX:

@incollection{RitschardBurginStuder2013CIiEDMitBS,
  author = {Ritschard, Gilbert and Reto Bürgin and Matthias Studer},
  title = {Exploratory Mining of Life Event Histories},
  booktitle = {Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences},
  editor = {McArdle, J. J. and G. Ritschard},
  publisher = {Routledge},
  year = {2013},
  series = {Quantitative Methodology},
  pages = {221-253},
  address = {New York}
}

2012

Bürgin, R. & Ritschard, G. (2012), "Categorical parallel coordinate plot", In LaCOSA Lausanne Conference On Sequence Analysis, University of Lausanne, June 6th-8th 2012. Lausanne. Poster.

[BibTeX] [Preprint (pdf)]

BibTeX:

@conference{BuerginRitschard2012,
  author = {Bürgin, Reto and Gilbert Ritschard},
  title = {Categorical parallel coordinate plot},
  booktitle = {LaCOSA Lausanne Conference On Sequence Analysis, University of Lausanne, June 6th-8th 2012},
  year = {2012},
  note = {Poster}
}

Bürgin, R., Ritschard, G. & Rousseaux, E. (2012), "Visualisation de séquences d'événements", In Extraction et gestion des connaissances (EGC 2012), Revue des Nouvelles Technologies de l'Information (RNTI). Vol. E-23, pp. 559-560.

[BibTeX]

BibTeX:

@article{BurginRitschardRousseaux2012EGCpost,
  author = {Bürgin, Reto and Gilbert Ritschard and Emmanuel Rousseaux},
  title = {Visualisation de séquences d'événements},
  booktitle = {Extraction et gestion des connaissances (EGC 2012)},
  journal = {Revue des Nouvelles Technologies de l'Information (RNTI)},
  year = {2012},
  volume = {E-23},
  pages = {559-560}
}

Bürgin, R., Ritschard, G. & Rousseaux, E. (2012), "Exploration graphique de données séquentielles", In Atelier Fouille Visuelle de Données : méthodologie et évaluation, EGC 2012, Bordeaux, pp. 39-50. Association EGC.

[Abstract] [BibTeX] [URL] [Preprint (pdf)]

Abstract: The article introduces an original graphical display for categorical longitudinal data. The visualisation, inspired from the multiple time-series plot, particularly suits to descriptive and exploratory analyses of individual trajectories defined as event sequences. The article includes a description of the visualisation method and of its founding principles, application examples, and a discussion of the properties of the resulting plots. In addition, we explain fine-tuning specifications for optimally rendering given data.

BibTeX:

@incollection{BurginRitschardRousseaux2012FVdD,
  author = {Bürgin, Reto and Gilbert Ritschard and Emmanuel Rousseaux},
  title = {Exploration graphique de données séquentielles},
  booktitle = {Atelier Fouille Visuelle de Données : méthodologie et évaluation, EGC 2012, Bordeaux},
  publisher = {Association EGC},
  year = {2012},
  pages = {39-50},
  url = {http://www.egc.asso.fr/}
}

Studer, M. (2012), "Étude des inégalités de genre en début de carrière académique à l'aide de méthodes innovatrices d'analyse de données séquentielles". Vol. SES-777 Université de Genève, Faculté des sciences économiques et sociales.

[BibTeX] [DOI]

BibTeX:

@book{Studer2012,
  author = {Studer, Matthias},
  title = {Étude des inégalités de genre en début de carrière académique à l'aide de méthodes innovatrices d'analyse de données séquentielles},
  publisher = {Université de Genève, Faculté des sciences économiques et sociales},
  year = {2012},
  volume = {SES-777},
  doi = {10.13097/archive-ouverte/unige:22054}
}

2011

Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2011), "Extracting and Rendering Representative Sequences", In Fred, A., Dietz, J.L.G., Liu, K. & Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. Series: Communications in Computer and Information Science (CCIS). Volume 128, pp. 94-106. Springer-Verlag.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that have together a given percentage of sequences in their neighbourhood. The proposed heuristic for extracting the representative subset requires as main arguments a pairwise distance matrix, a representativeness criterion and a distance threshold under which two sequences are considered as redundant or, identically, in the neighbourhood of each other. It first builds a list of candidates using a representativeness score and then eliminates redundancy. We propose also a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in our TraMineR R package for mining and visualizing sequence data and we demonstrate their efficiency on a real world example from social sciences. The methods are nonetheless by no way limited to social science data and should prove useful in many other domains.
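The heuristic described in this abstract can be illustrated with a small standalone sketch. It is not the TraMineR implementation: the function name `representatives`, its parameters, and the choice of neighbourhood density as the representativeness score are assumptions made for illustration, and case weights are ignored. The input is a plain pairwise distance matrix given as a list of lists.

```python
def representatives(dist, radius, coverage=0.25):
    """Greedy sketch: rank candidates by neighbourhood density, drop
    redundant ones, and stop once the requested coverage is reached.

    dist     -- symmetric pairwise distance matrix (list of lists)
    radius   -- distance threshold under which two items are neighbours
    coverage -- minimum share of items that must lie in the neighbourhood
                of at least one representative
    Returns (indices of representatives, coverage actually reached).
    """
    n = len(dist)
    # Representativeness score (assumed here): number of neighbours.
    density = [sum(1 for j in range(n) if dist[i][j] < radius) for i in range(n)]
    # Candidate list sorted by decreasing score (stable for ties).
    candidates = sorted(range(n), key=lambda i: -density[i])

    chosen, covered = [], set()
    for c in candidates:
        # Redundancy elimination: skip candidates close to a kept one.
        if any(dist[c][r] < radius for r in chosen):
            continue
        chosen.append(c)
        covered.update(j for j in range(n) if dist[c][j] < radius)
        if len(covered) / n >= coverage:
            break
    return chosen, len(covered) / n


# Toy example: six items placed on a line, distance = absolute difference.
positions = [0, 1, 2, 10, 11, 20]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
reps, reached = representatives(toy_dist, radius=1.5, coverage=0.8)
```

In this toy example the item at position 1 has the densest neighbourhood and is picked first; its two neighbours are then skipped as redundant, and the item at position 10 is added to reach the requested coverage.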

BibTeX:

@incollection{Gabadinho_et_al2011CCIS,
  author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Nicolas S. Müller},
  title = {Extracting and Rendering Representative Sequences},
  booktitle = {Knowledge Discovery, Knowledge Engineering and Knowledge Management},
  editor = {Fred, Ana and Jan L. G. Dietz and Kecheng Liu and Joaquim Filipe},
  publisher = {Springer-Verlag},
  year = {2011},
  series = {Communications in Computer and Information Science (CCIS)},
  volume = {128},
  pages = {94-106},
  doi = {10.1007/978-3-642-19032-2}
}

Gabadinho, A., Ritschard, G., Müller, N.S. & Studer, M. (2011), "Analyzing and visualizing state sequences in R with TraMineR", Journal of Statistical Software. Vol. 40(4), pp. 1-37.

[Abstract] [BibTeX] [DOI]

Abstract: This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

BibTeX:

@article{GabadinhoRitschardMullerStuder2011JSS,
  author = {Gabadinho, Alexis and Gilbert Ritschard and Nicolas S. Müller and Matthias Studer},
  title = {Analyzing and visualizing state sequences in R with TraMineR},
  journal = {Journal of Statistical Software},
  year = {2011},
  volume = {40},
  number = {4},
  pages = {1--37},
  doi = {10.18637/jss.v040.i04}
}

Müller, N.S. (2011), "Inégalités sociales et effets cumulés au cours de la vie: concepts et méthodes". Vol. SES-764 Université de Genève, Faculté des sciences économiques et sociales.

[BibTeX] [DOI]

BibTeX:

@book{Muller2011,
  author = {Müller, Nicolas S.},
  title = {Inégalités sociales et effets cumulés au cours de la vie: concepts et méthodes},
  publisher = {Université de Genève, Faculté des sciences économiques et sociales},
  year = {2011},
  volume = {SES-764},
  doi = {10.13097/archive-ouverte/unige:17746}
}

Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2011), "Discrepancy Analysis of State Sequences", Sociological Methods and Research. Vol. 40(3), pp. 471-510.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)] [Supplement]

Abstract: In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, this approach looks at how the covariates explain the discrepancy of the sequences. The authors use the pairwise dissimilarities between sequences to determine the discrepancy, which makes it possible to develop a series of statistical significance-based analysis tools. They introduce generalized simple and multifactor discrepancy-based methods to test for differences between groups, a pseudo-R² for measuring the strength of sequence-covariate associations, a generalized Levene statistic for testing differences in the within-group discrepancies, as well as tools and plots for studying the evolution of the differences along the time frame and a regression tree method for discovering the most significant discriminant covariates and their interactions. In addition, the authors extend all methods to account for case weights. The scope of the proposed methodological framework is illustrated using a real-world sequence data set.
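As a rough illustration of the pseudo-R² mentioned in this abstract, the sketch below computes the share of discrepancy explained by a single grouping variable directly from a pairwise dissimilarity matrix. It assumes the unweighted case and takes the sum of squares of a set of n items to be the sum of their pairwise dissimilarities divided by n. The function names `sum_of_squares` and `pseudo_r2` are hypothetical, and neither significance testing nor case weights are covered.

```python
def sum_of_squares(dist, idx):
    """Discrepancy-based sum of squares of the items in idx:
    (1 / n) * sum of pairwise dissimilarities over all pairs i < j."""
    n = len(idx)
    if n == 0:
        return 0.0
    total = sum(dist[i][j] for a, i in enumerate(idx) for j in idx[a + 1:])
    return total / n


def pseudo_r2(dist, groups):
    """Share of the total discrepancy explained by the grouping:
    1 - SS_within / SS_total."""
    ss_total = sum_of_squares(dist, list(range(len(groups))))
    if ss_total == 0:
        raise ValueError("all items are identical; pseudo-R2 is undefined")
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    ss_within = sum(sum_of_squares(dist, idx) for idx in by_group.values())
    return 1.0 - ss_within / ss_total


# Toy example: two tight clusters at distance 1 from each other.
toy_dist = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
r2_matching = pseudo_r2(toy_dist, ["a", "a", "b", "b"])   # grouping matches the clusters
r2_crossing = pseudo_r2(toy_dist, ["a", "b", "a", "b"])   # grouping cuts across them
```

A grouping that coincides with the clusters explains all of the discrepancy, whereas a grouping that cuts across them explains none.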

BibTeX:

@article{StuderRitschardGabadinhoMuller2011SMR,
  author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
  title = {Discrepancy Analysis of State Sequences},
  journal = {Sociological Methods and Research},
  year = {2011},
  volume = {40},
  number = {3},
  pages = {471-510},
  doi = {10.1177/0049124111415372}
}

2010

Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2010), "Indice de complexité pour le tri et la comparaison de séquences catégorielles", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. Vol. E-19, pp. 61-66.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: This paper introduces a complexity index for categorical state sequences. Though the index is more specifically intended for measuring the complexity of sequences describing biographical trajectories in the social sciences, it applies to all kinds of ordered lists of states. The measure accounts for two distinct aspects of complexity: the complexity of the sequencing of the states, captured by the number of transitions, and the diversity of states in the sequence, measured with Shannon's entropy.

BibTeX:

@article{GabadinhoRitschardStuderMuller2010EGC,
  author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Müller, Nicolas S.},
  title = {Indice de complexité pour le tri et la comparaison de séquences catégorielles},
  booktitle = {Extraction et gestion des connaissances (EGC 2010)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2010},
  volume = {E-19},
  pages = {61-66}
}

Müller, N.S., Studer, M., Gabadinho, A. & Ritschard, G. (2010), "Analyse de séquences d'événements avec TraMineR", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. Vol. E-19, pp. 639-640.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: Short description of TraMineR's abilities in event sequence mining that were demonstrated at EGC 2010.

BibTeX:

@article{MullerStuderGabdinhoRitschard2010EGCDemo,
  author = {Müller, Nicolas S. and Matthias Studer and Alexis Gabadinho and Gilbert Ritschard},
  title = {Analyse de séquences d'événements avec TraMineR},
  booktitle = {Extraction et gestion des connaissances (EGC 2010)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2010},
  volume = {E-19},
  pages = {639-640}
}

Müller, N.S., Studer, M., Ritschard, G. & Gabadinho, A. (2010), "Extraction de règles d'association séquentielle à l'aide de modèles semi-paramétriques à risques proportionnels", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. Vol. E-19, pp. 25-36.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: Association rules mining is a thriving research field in data mining. These methods can also be applied to sequential data. Two problems arise when one wants to apply association rules mining to sequential data. First, the main criterion used to extract sequential patterns is their frequency. However, two events might be strongly associated even if they do not happen frequently. Second, association rules measures do not take into account the temporal aspect of sequential data, like the importance of the duration between two events or the problem of censored observations. In this article, we propose a method to extract significant associations between events using duration models. Association rules are extracted from each sequential pattern observed in a set of sequences. Then, the influence on the risk that the "conclusion" event occurs after the "premise" event(s) is estimated using a proportional hazard semi-parametric duration model. This paper presents the method and a comparison with some classical association measures.

BibTeX:

@article{MullerStuderRitschardGabadinho2010EGC,
  author = {Müller, Nicolas S. and Matthias Studer and Gilbert Ritschard and Alexis Gabadinho},
  title = {Extraction de règles d'association séquentielle à l'aide de modèles semi-paramétriques à risques proportionnels},
  booktitle = {Extraction et gestion des connaissances (EGC 2010)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2010},
  volume = {E-19},
  pages = {25-36}
}

Studer, M., Müller, N.S., Ritschard, G. & Gabadinho, A. (2010), "Classer, discriminer et visualiser des séquences d'événements", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. Vol. E-19, pp. 37-48.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: This article presents a set of tools to analyze event sequences in the social sciences and visualize the results. We begin by formalizing the notion of event sequence before defining a measure of dissimilarity between these sequences to cluster them and test the links between these sequences and other variables of interest. Initially defined by Moen (2000), this measure is based on the notion of edit distance between sequences and identifies the differences in sequencing and timing of events. We propose an extension of it in order to take into account the simultaneity of events and a normalization method that guarantees the respect of the triangle inequality. In a second step, we present a set of tools to interpret the results. We thus propose two methods of viewing a set of sequences and we introduce the concept of discriminant subsequence that identifies differences in sequencing that are the most significant between groups. All the tools presented are available in the TraMineR R library.

BibTeX:

@article{StuderMullerRitschardGabadinho2010EGC,
  author = {Studer, Matthias and Nicolas S. Müller and Gilbert Ritschard and Alexis Gabadinho},
  title = {Classer, discriminer et visualiser des séquences d'événements},
  booktitle = {Extraction et gestion des connaissances (EGC 2010)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2010},
  volume = {E-19},
  pages = {37-48}
}

Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2010), "Discrepancy analysis of complex objects using dissimilarities", In Guillet, F., Ritschard, G., Zighed, D.A. & Briand, H. (eds) Advances in Knowledge Discovery and Management. Series: Studies in Computational Intelligence. Volume 292, pp. 3-19. Berlin: Springer.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: In this article we consider objects for which we have a matrix of dissimilarities and we are interested in their links with covariates. We focus on state sequences for which pairwise dissimilarities are given for instance by edit distances. The methods discussed apply however to any kind of objects and measures of dissimilarities. We start with a generalization of the analysis of variance (ANOVA) to assess the link of complex objects (e.g. sequences) with a given categorical variable. The trick is to show that discrepancy among objects can be derived from the sole pairwise dissimilarities, which permits then to identify factors that most reduce this discrepancy. We present a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case with more than one factor and discuss its advantages and limitations especially regarding interpretation. Finally, we introduce a new tree method for analyzing discrepancy of complex objects that exploits the former test as splitting criterion. We demonstrate the scope of the methods presented through a study of the factors that most discriminate Swiss occupational trajectories. All methods presented are freely accessible in our TraMineR package for the R statistical environment.

BibTeX:

@incollection{StuderRitschardGabadinhoMuller2009akdm,
  author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
  title = {Discrepancy analysis of complex objects using dissimilarities},
  booktitle = {Advances in Knowledge Discovery and Management},
  editor = {Fabrice Guillet and Gilbert Ritschard and Djamel A. Zighed and Henri Briand},
  publisher = {Springer},
  year = {2010},
  series = {Studies in Computational Intelligence},
  volume = {292},
  pages = {3-19},
  address = {Berlin},
  doi = {10.1007/978-3-642-00580-0_1}
}

2009

Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2009), "Mining Sequence Data in R with the TraMineR package: A User's Guide". Department of Econometrics and Laboratory of Demography, University of Geneva, Geneva, 2009.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: Full User Guide for TraMineR v. 1.1

BibTeX:

@techreport{Gabadinho_et_al_2009TraMineR-UGuide,
  author = {Gabadinho, Alexis and Ritschard, Gilbert and Studer, Matthias and Müller, Nicolas S.},
  title = {Mining Sequence Data in R with the TraMineR package: A User's Guide},
  year = {2009},
  institution = {Department of Econometrics and Laboratory of Demography, University of Geneva},
  address = {Geneva},
  note = {(TraMineR is on CRAN the Comprehensive R Archive Network)}
}

Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2009), "Summarizing Sets of Categorical Sequences", In International Conference on Knowledge Discovery and Information Retrieval, Madeira, 6-8 October, 2009, pp. 62-69. INSTICC. (Received the Best Paper Award).

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: This paper is concerned with the summarization of a set of categorical sequence data. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighborhood. The goal is to yield a representative set that exhibits the key features of the whole sequence data set and permits easy, sound interpretation. We propose a heuristic for determining the representative set that first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in TraMineR, our R package for mining and visualizing sequence data, and we demonstrate their efficiency on a real-world example from the social sciences. The methods are nonetheless in no way limited to social science data and should prove useful in many other domains.

BibTeX:

@incollection{Gabadinho2009,
  author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Nicolas S. Müller},
  title = {Summarizing Sets of Categorical Sequences},
  booktitle = {International Conference on Knowledge Discovery and Information Retrieval, Madeira, 6-8 October, 2009},
  publisher = {INSTICC},
  year = {2009},
  pages = {62-69},
  note = {(Received the Best Paper Award)}
}

Gabadinho, A., Müller, N.S., Ritschard, G. & Studer, M. (2009), "TraMineR: une librairie R pour l'analyse de données séquentielles", In Extraction et gestion des connaissances (EGC 2009), Revue des nouvelles technologies de l'information RNTI. Vol. E-15, pp. 483.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: Short presentation of TraMineR

BibTeX:

@article{GabadinhoMullerRitschardStuder2009EGC,
  author = {Gabadinho, Alexis and Nicolas S. Müller and Gilbert Ritschard and Studer, Matthias},
  title = {TraMineR: une librairie R pour l'analyse de données séquentielles},
  booktitle = {Extraction et gestion des connaissances (EGC 2009)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2009},
  volume = {E-15},
  pages = {483}
}

Ritschard, G., Gabadinho, A., Müller, N.S. & Studer, M. (2009), "Tutoriels: Données séquentielles, (I) concepts et (II) pratique dans R avec TraMineR". Brochure des Dias. Conférence EGC, Strasbourg, 2009.

[Abstract] [BibTeX] [URL] [Preprint (pdf)]

Abstract: The aim of the course is to introduce participants to the concepts and questions specific to categorical sequence data and to the principles of sequence analysis and representation. Since sequence data can take very diverse forms, we specify that the course deals essentially with data consisting of a set of individual sequences, each individual sequence being a series of l elements chosen from a finite alphabet of size k. We typically consider cases where, to give an order of magnitude, l < 100 and k < 20. After setting out an ontology of sequence types and of the ways to format them, we first address the aggregated representation of sets of sequences, then introduce a set of indicators summarizing the nature of individual sequences and discuss metrics for assessing the similarity of pairs of sequences. These metrics are used in particular to carry out unsupervised classifications of sequences or to represent them as scatterplots using multidimensional scaling (MDS). We also cover the extraction of frequent subsequences as well as the search for discriminant subsequences. The second part is an introduction to the practice of sequence analysis in R with the TraMineR library.

BibTeX:

@techreport{Ritschard2009,
  author = {Ritschard, Gilbert and Alexis Gabadinho and Nicolas S. Müller and Matthias Studer},
  title = {Tutoriels: Données séquentielles, (I) concepts et (II) pratique dans R avec TraMineR},
  year = {2009},
  type = {Brochure des Dias},
  institution = {Conférence EGC},
  address = {Strasbourg},
  url = {http://mephisto.unige.ch/biomining/EGC_tutoriel_donnees_sequentielles.html}
}

Ritschard, G., Gabadinho, A., Studer, M. & Müller, N.S. (2009), "Converting between various sequence representations", In Ras, Z. & Dardzinska, A. (eds) Advances in Data Management. Series: Studies in Computational Intelligence. Volume 223, pp. 155-175. Berlin: Springer.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: This chapter is concerned with the organization of categorical sequence data. We first build a typology of sequences distinguishing for example between chronological sequences and sequences without time content. This permits to identify the kind of information that the data organization should preserve. Focusing then mainly on chronological sequences, we discuss the advantages and limits of different ways of representing time stamped event and state sequence data and present solutions for automatically converting between various formats, e.g., between horizontal and vertical presentations but also from state sequences into event sequences and reciprocally. Special attention is also drawn to the handling of missing values in these conversion processes.

BibTeX:

@incollection{RitschardGabadinhoStuderMuller2009DManag,
  author = {Ritschard, Gilbert and Alexis Gabadinho and Matthias Studer and Nicolas S. Müller},
  title = {Converting between various sequence representations},
  booktitle = {Advances in Data Management},
  editor = {Ras, Zbigniew and Agnieszka Dardzinska},
  publisher = {Springer},
  year = {2009},
  series = {Studies in Computational Intelligence},
  volume = {223},
  pages = {155-175},
  address = {Berlin},
  doi = {10.1007/978-3-642-02190-9_8}
}

Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2009), "Analyse de dissimilarités par arbre d'induction", In Extraction et gestion des connaissances (EGC 2009), Revue des nouvelles technologies de l'information RNTI. Vol. E-15, pp. 7-18.

[Abstract] [BibTeX] [Preprint (pdf)]

Abstract: In this article we consider objects for which we have a matrix of dissimilarities and we are interested in their links with attributes. We focus on state sequences for which dissimilarities are given for instance by edit distances. The methods discussed apply however to any kind of objects and measures of dissimilarities. We start with a generalization of the analysis of variance (ANOVA) to assess the link of non measurable objects (e.g. sequences) with a given categorical variable. The trick is to show that variability among objects can be derived from the sole dissimilarities, which permits then to identify factors that most reduce this variability. We infer a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case with more than one factor and discuss its benefits and limitations especially regarding interpretation. Finally, we introduce a new tree method for general objects that exploits the former test based on dissimilarity measures as splitting criterion. We demonstrate the scope of the various methods presented through a study of the factors that most discriminate occupational trajectories.

BibTeX:

@article{StuderRitschardGabadinhoMuller2009EGC,
  author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
  title = {Analyse de dissimilarités par arbre d'induction},
  booktitle = {Extraction et gestion des connaissances (EGC 2009)},
  journal = {Revue des nouvelles technologies de l'information RNTI},
  year = {2009},
  volume = {E-15},
  pages = {7-18}
}

Widmer, E. & Ritschard, G. (2009), "The De-Standardization of the Life Course: Are Men and Women Equal?", Advances in Life Course Research. Vol. 14(1-2), pp. 28-39.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: Various studies suggest that rather than being a general trend that concerns all individuals and all life domains uniformly, the de-standardization of the life course has taken distinct shapes and has followed distinct paces in various countries and social groups. In that respect, the gender divide may play a key role in de-standardization processes. The paper empirically tests cohort and sex effects on quantified indexes of de-standardization based on data from the Swiss Household Panel. Optimal matching is used in order to uncover whether these trends and their gendering, if any, may be accounted for by the development of new types of trajectories. A strong impact of cohorts on indices of de-standardization was found for both family and occupational trajectories. Gender effects mainly concern occupational trajectories. The results are discussed in light of the master status hypothesis.

BibTeX:

@article{WidmerRitschard2009ALCR,
  author = {Widmer, Eric and Gilbert Ritschard},
  title = {The De-Standardization of the Life Course: Are Men and Women Equal?},
  journal = {Advances in Life Course Research},
  year = {2009},
  volume = {14},
  number = {1-2},
  pages = {28-39},
  doi = {10.1016/j.alcr.2009.04.001}
}

Widmer, E., Ritschard, G. & Müller, N.S. (2009), "Trajectoires professionnelles et familiales en Suisse: quelle pluralisation?", In Oris, M. & others (eds) Transitions dans le parcours de vie et construction des inégalités, pp. 253-272. Lausanne: Presses Polytechniques Universitaires Romandes.

[BibTeX] [Preprint (pdf)]

BibTeX:

@incollection{WidmerRitschardMuller2009PPUR,
  author = {Widmer, Eric and Ritschard, Gilbert and Nicolas S. Müller},
  title = {Trajectoires professionnelles et familiales en Suisse: quelle pluralisation?},
  booktitle = {Transitions dans le parcours de vie et construction des inégalités},
  editor = {Oris, Michel and others},
  publisher = {Presses Polytechniques Universitaires Romandes},
  year = {2009},
  pages = {253-272},
  address = {Lausanne}
}

2008

Müller, N.S., Gabadinho, A., Ritschard, G. & Studer, M. (2008), "Extracting knowledge from life courses: Clustering and visualization", In Song, I.-Y., Eder, J. & Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery, 10th International Conference, DAWAK 2008, Turin, Italy, September 2-5. Series: Lecture Notes in Computer Science. Volume LNCS 5182, pp. 176-185. Berlin Heidelberg: Springer.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: This article presents some of the facilities offered by our TraMineR R-package for clustering and visualizing sequence data. Firstly, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, we may use it as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. Then we present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses and our presentation is illustrated on such a real-world data set. The material presented should, nevertheless, be of interest for other kinds of sequential data such as DNA analysis or web logs.

BibTeX:

@incollection{MullerGabadinhoRitschardStuder2008DaWaK,
  author = {Müller, Nicolas S. and Alexis Gabadinho and Gilbert Ritschard and Matthias Studer},
  title = {Extracting knowledge from life courses: Clustering and visualization},
  booktitle = {Data Warehousing and Knowledge Discovery, 10th International Conference, DAWAK 2008, Turin, Italy, September 2-5},
  editor = {Song, Il-Yeol and Johann Eder and Tho Manh Nguyen},
  publisher = {Springer},
  year = {2008},
  series = {Lecture Notes in Computer Science},
  volume = {LNCS 5182},
  pages = {176-185},
  address = {Berlin Heidelberg},
  doi = {10.1007/978-3-540-85836-2_17}
}

Ritschard, G., Gabadinho, A., Müller, N.S. & Studer, M. (2008), "Mining event histories: A social science perspective", International Journal of Data Mining, Modelling and Management. Vol. 1(1), pp. 68-90.

[Abstract] [BibTeX] [DOI] [Preprint (pdf)]

Abstract: We explore how recent data mining-based tools developed in domains such as biomedicine or text mining for extracting interesting knowledge from sequence data could be applied to personal life course data. We focus on two types of approaches: survival trees, which attempt to partition the data into homogeneous groups regarding their survival characteristics, i.e., the duration until a given event occurs, and the mining of typical discriminating episodes. We show how these approaches may fruitfully complement the outcome of more classical event history analyses and single out some specific issues raised by their application to socio-demographic data.

BibTeX:

@article{RitschardGabadinhoMullerStuder2008IJDMMM,
  author = {Ritschard, Gilbert and Alexis Gabadinho and Nicolas S. Müller and Matthias Studer},
  title = {Mining event histories: A social science perspective},
  journal = {International Journal of Data Mining, Modelling and Management},
  year = {2008},
  volume = {1},
  number = {1},
  pages = {68-90},
  doi = {10.1504/IJDMMM.2008.022538}
}

Useful links

The Sequence Analysis Association (SAA)
R, The R-Project for Statistical Computing. R is the free open-source statistical environment used by TraMineR.
For information about contributed R-packages look at the CRAN.
Journal of Statistical Software publishes, among others, articles about R-packages.
SHP Swiss Household Panel
Brendan Halpin's page on sequence analysis (where you can download the SADI package for Stata)
