In recent years, gene order data has been used intensively to study phylogenetic trees, since it provides a whole-genome view of evolution [18, 8, 6]. Because even some of the simplest formulations of the problem, with only 3 genomes, are NP-hard, reconstruction algorithms have to rely on heuristics to recover a most parsimonious scenario [13, 2, 11]. Nonetheless, this has not prevented applications to the whole genomes of various vertebrate species [4, 14].
Gene-order phylogenetic reconstruction algorithms are typically evaluated on three criteria: 1) their ability to recover the correct tree topology, 2) the total number of rearrangements in the recovered scenario [13, 2], and 3) the quality of the ancestral reconstructions [7, 3]. In the current work, we evaluate these reconstructions on a different criterion: the accuracy of the rearrangements in the recovered scenarios. To our knowledge, this has yet to be systematically analyzed. The idea is to shift the focus away from trees and ancestral reconstructions and to study the quality of the inferred scenarios themselves. We will only look for highly reliable (i.e. true) events, as they are likely to lead to new insights into the underlying evolutionary mechanisms. Such an analysis has seldom been performed because multiple optimal rearrangement paths frequently exist, even between a pair of genomes. Initially, of course, this assessment will be performed on simulated data sets, where the true events are known and accuracy can be measured.
Although maximum likelihood-based methods are an appealing way to try to achieve this goal, such probabilistic formulations have so far proven to be computationally prohibitive. Moreover, the few developments in this area [12, 9] did not aim to estimate the accuracy of the individual ancestral events either.
G. Tesler and D. Durand (Eds.): RECOMB-CG 2007, LNBI 4751, pp. 149-161, 2007. © Springer-Verlag Berlin Heidelberg 2007
Our main contribution is a new approach called Efficient Method to Recover Ancestral Events (EMRAE) that allows the inference, on a fixed phylogenetic tree, of a partial rearrangement scenario consisting of only reliable events. As a first step, the rearrangement operations that we consider are reversals and transpositions but the method is readily expandable to other types of events. The approach relies on adjacencies shared by a significant fraction of the genomes in a given subtree. The ability to model transpositions is one of the strengths of EMRAE, since transpositions are typically harder to characterize even when only 2 genomes are considered [1, 23].
We compare EMRAE to two standard reconstruction tools, MGR and GRAPPA, and show that EMRAE achieves comparable sensitivity but significantly higher specificity under both random and fragile models. Then, we apply our approach to two real data sets: the Campanulaceae chloroplast data set and a data set consisting of 4 bacterial genomes in the Burkholderia family. Finally, we present some potential extensions and future directions.
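As a rough illustration of how predicted events could be scored against a simulated scenario, the sketch below computes two ratios over sets of events. The definitions are assumptions, since this section does not define them: sensitivity is taken as the fraction of true events that are recovered, and specificity as the fraction of predicted events that are true. The event encoding (operation type, branch label, affected genes) and the function name score_events are likewise hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical event encoding: (operation type, branch label, set of affected genes).
Event = Tuple[str, str, FrozenSet[int]]


def score_events(predicted: Set[Event], true_events: Set[Event]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) under the assumed definitions.

    sensitivity = |predicted & true| / |true|       (true events recovered)
    specificity = |predicted & true| / |predicted|  (predictions that are true)
    """
    true_positives = len(predicted & true_events)
    sensitivity = true_positives / len(true_events) if true_events else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy simulated scenario with four true events, three of them predicted or mispredicted.
truth = {
    ("reversal", "A-B", frozenset({2, 3})),
    ("reversal", "B-C", frozenset({4, 5, 6})),
    ("transposition", "A-B", frozenset({7})),
    ("reversal", "root-A", frozenset({1, 2})),
}
prediction = {
    ("reversal", "A-B", frozenset({2, 3})),
    ("transposition", "A-B", frozenset({7})),
    ("reversal", "B-C", frozenset({8, 9})),  # not in the simulated truth
}
sensitivity, specificity = score_events(prediction, truth)
```

A method that reports only a partial scenario of reliable events trades some sensitivity for specificity under these definitions, because leaving out an uncertain event can never add a false prediction.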