Generation of the exact distribution and simulation of matched nucleotide sequences on a phylogenetic tree

Faisal Ababneh, Lars S. Jermiin, John Robinson

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Nucleotide sequences are often generated by Monte Carlo simulations to address complex evolutionary or analytic questions, but the simulations are rarely described in sufficient detail to allow the research to be replicated. Here we briefly review the Markov processes of substitution in a pair of matching (homologous) nucleotide sequences and then extend them to k matching nucleotide sequences. We describe calculation of the joint distribution of nucleotides of two matching sequences. Based on this distribution, we give a method for simulation of the divergence matrix for n sites using the multinomial distribution. This is then extended to the joint distribution for k nucleotide sequences and the corresponding 4^k divergence array, generalizing Felsenstein (Journal of Molecular Evolution, 17, 368-376, 1981), who considered stationary, homogeneous and reversible processes on trees. We give a second method to generate matched sequences that begins with a random ancestral sequence and applies a continuous Markov process to each nucleotide site as in Rambaut and Grassly (Computer Applications in the Biosciences, 13, 235-238, 1997); further, we relate this to an equivalent approach based on an embedded Markov chain. Finally, we describe an approximate method that was recently implemented in a program developed by Jermiin et al. (Applied Bioinformatics, 2, 159-163, 2003). The three methods presented here cater for different computational and mathematical limitations and are shown in an example to produce results close to those expected on theoretical grounds. All methods are implemented using functions in the S-plus or R languages.
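
The first method in the abstract (joint distribution of two matched sequences, then a multinomial draw of the divergence matrix) can be illustrated with a minimal sketch. This is not the authors' S-plus/R code. It assumes a Jukes-Cantor process on each of the two edges leading from a common ancestor, which keeps the transition matrices in closed form, and an arbitrary root distribution `pi`. The function names, the rates and the edge lengths are hypothetical and chosen only for illustration. Under those assumptions the joint distribution is F[i][j] = sum over a of pi[a] * P1[a][i] * P2[a][j], and the divergence matrix for n sites is one draw from Multinomial(n, F).

```python
import math
import random


def jc_transition(alpha, t):
    """Jukes-Cantor transition matrix P(t), off-diagonal rate alpha (assumed model)."""
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * e   # probability of ending in the starting state
    diff = 0.25 - 0.25 * e   # probability of ending in each other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def joint_distribution(pi, p1, p2):
    """Joint distribution of the two descendant nucleotides at one site.

    F[i][j] = sum_a pi[a] * p1[a][i] * p2[a][j], where a is the ancestral state.
    """
    return [[sum(pi[a] * p1[a][i] * p2[a][j] for a in range(4))
             for j in range(4)] for i in range(4)]


def sample_divergence_matrix(f, n, rng):
    """Draw one 4x4 divergence matrix for n independent sites from Multinomial(n, F)."""
    cells = [(i, j) for i in range(4) for j in range(4)]
    weights = [f[i][j] for i, j in cells]
    counts = [[0] * 4 for _ in range(4)]
    for i, j in rng.choices(cells, weights=weights, k=n):
        counts[i][j] += 1
    return counts


# Hypothetical inputs: root frequencies in the order A, C, G, T and two edges
# with different rates, so the process is not stationary along the tree.
pi = [0.4, 0.3, 0.2, 0.1]
P1 = jc_transition(0.1, 1.0)
P2 = jc_transition(0.3, 0.5)
F = joint_distribution(pi, P1, P2)
N = sample_divergence_matrix(F, 1000, random.Random(1))
```

Because the sites are treated as independent and identically distributed, one multinomial draw replaces the simulation of 1000 individual sites. A general substitution model would need a matrix exponential in place of `jc_transition`.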

Original language: English
Pages (from-to): 291-308
Number of pages: 18
Journal: Journal of Mathematical Modelling and Algorithms
Volume: 5
Issue number: 3
DOI: 10.1007/s10852-005-9017-y
Publication status: Published - Sep 2006

Keywords

  • Markov processes on trees
  • Monte Carlo simulations

ASJC Scopus subject areas

  • Analysis

Cite this

Generation of the exact distribution and simulation of matched nucleotide sequences on a phylogenetic tree. / Ababneh, Faisal; Jermiin, Lars S.; Robinson, John.

In: Journal of Mathematical Modelling and Algorithms, Vol. 5, No. 3, 09.2006, p. 291-308.

@article{1963e97a76814a25adbfc21686900b35,
title = "Generation of the exact distribution and simulation of matched nucleotide sequences on a phylogenetic tree",
abstract = "Nucleotide sequences are often generated by Monte Carlo simulations to address complex evolutionary or analytic questions, but the simulations are rarely described in sufficient detail to allow the research to be replicated. Here we briefly review the Markov processes of substitution in a pair of matching (homologous) nucleotide sequences and then extend them to k matching nucleotide sequences. We describe calculation of the joint distribution of nucleotides of two matching sequences. Based on this distribution, we give a method for simulation of the divergence matrix for n sites using the multinomial distribution. This is then extended to the joint distribution for k nucleotide sequences and the corresponding $4^k$ divergence array, generalizing Felsenstein (Journal of Molecular Evolution, 17, 368-376, 1981), who considered stationary, homogeneous and reversible processes on trees. We give a second method to generate matched sequences that begins with a random ancestral sequence and applies a continuous Markov process to each nucleotide site as in Rambaut and Grassly (Computer Applications in the Biosciences, 13, 235-238, 1997); further, we relate this to an equivalent approach based on an embedded Markov chain. Finally, we describe an approximate method that was recently implemented in a program developed by Jermiin et al. (Applied Bioinformatics, 2, 159-163, 2003). The three methods presented here cater for different computational and mathematical limitations and are shown in an example to produce results close to those expected on theoretical grounds. All methods are implemented using functions in the S-plus or R languages.",
keywords = "Markov processes on trees, Monte Carlo simulations",
author = "Ababneh, Faisal and Jermiin, {Lars S.} and Robinson, John",
year = "2006",
month = "9",
doi = "10.1007/s10852-005-9017-y",
language = "English",
volume = "5",
pages = "291--308",
journal = "Journal of Mathematical Modelling and Algorithms",
issn = "1570-1166",
publisher = "Springer Netherlands",
number = "3",

}

TY - JOUR

T1 - Generation of the exact distribution and simulation of matched nucleotide sequences on a phylogenetic tree

AU - Ababneh, Faisal

AU - Jermiin, Lars S.

AU - Robinson, John

PY - 2006/9

Y1 - 2006/9

AB - Nucleotide sequences are often generated by Monte Carlo simulations to address complex evolutionary or analytic questions, but the simulations are rarely described in sufficient detail to allow the research to be replicated. Here we briefly review the Markov processes of substitution in a pair of matching (homologous) nucleotide sequences and then extend them to k matching nucleotide sequences. We describe calculation of the joint distribution of nucleotides of two matching sequences. Based on this distribution, we give a method for simulation of the divergence matrix for n sites using the multinomial distribution. This is then extended to the joint distribution for k nucleotide sequences and the corresponding 4^k divergence array, generalizing Felsenstein (Journal of Molecular Evolution, 17, 368-376, 1981), who considered stationary, homogeneous and reversible processes on trees. We give a second method to generate matched sequences that begins with a random ancestral sequence and applies a continuous Markov process to each nucleotide site as in Rambaut and Grassly (Computer Applications in the Biosciences, 13, 235-238, 1997); further, we relate this to an equivalent approach based on an embedded Markov chain. Finally, we describe an approximate method that was recently implemented in a program developed by Jermiin et al. (Applied Bioinformatics, 2, 159-163, 2003). The three methods presented here cater for different computational and mathematical limitations and are shown in an example to produce results close to those expected on theoretical grounds. All methods are implemented using functions in the S-plus or R languages.

KW - Markov processes on trees

KW - Monte Carlo simulations

UR - http://www.scopus.com/inward/record.url?scp=33646876774&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646876774&partnerID=8YFLogxK

U2 - 10.1007/s10852-005-9017-y

DO - 10.1007/s10852-005-9017-y

M3 - Article

VL - 5

SP - 291

EP - 308

JO - Journal of Mathematical Modelling and Algorithms

JF - Journal of Mathematical Modelling and Algorithms

SN - 1570-1166

IS - 3

ER -