T-Coffee Explained

T-Coffee
Developer:	Cédric Notredame, Centre de Regulació Genòmica (CRG), Barcelona
Latest Release Version:	13.45.0.4846264
Latest Preview Version:	13.45.33.7d7e789
Operating System:	UNIX, Linux, MS-Windows, Mac OS X
Genre:	Bioinformatics tool
Licence:	GPL

T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a multiple sequence alignment program that uses a progressive approach.^[1] It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine previously computed multiple sequence alignments and, in recent versions, can use structural information from PDB files (3D-Coffee). It has advanced features for evaluating alignment quality and some capacity for identifying occurrences of motifs (Mocca). By default it produces alignments in the aln (Clustal) format, but it can also produce PIR, MSF, and FASTA output. The most common input formats (FASTA, PIR) are supported.

Algorithm

The T-Coffee algorithm has two main features. The first is its use of heterogeneous data sources, which provides a simple and flexible means of generating multiple alignments: T-Coffee can compute a multiple alignment from a library built from a mixture of local and global pairwise alignments.^[1]

The second is the optimization method, which finds the multiple alignment that best fits the pairwise alignments in the input library. It uses a progressive strategy comparable to the one used in ClustalW and has the advantage of being fast and robust. The library information guides the progressive alignment, so that the alignments between all pairs of sequences are taken into account at every step.^[1]

Generating a primary library of alignments

The library contains a set of pairwise alignments between all of the sequences to be aligned; these alignments are not required to be consistent with one another. The library holds information on each of the N(N-1)/2 sequence pairs, where N is the number of sequences. Two alignment sources are used for each pair of sequences, one local and one global.^[1]
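
The quadratic growth in the number of pairs can be illustrated with a minimal Python sketch. The sequence names are hypothetical and serve only as placeholders; the function is not part of T-Coffee.

```python
from itertools import combinations

def sequence_pairs(names):
    """Return every unordered pair of sequence names, N(N-1)/2 in total."""
    return list(combinations(names, 2))

# Hypothetical sequence identifiers, used only for illustration.
names = ["seqA", "seqB", "seqC", "seqD"]
pairs = sequence_pairs(names)
n = len(names)

# The number of pairs matches the N(N-1)/2 formula.
assert len(pairs) == n * (n - 1) // 2
print(pairs)
```

Each of these pairs would contribute one global alignment and a set of local alignments to the primary library.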

Global alignments are constructed by running ClustalW on the sequences two at a time, giving one full-length alignment for each pair of sequences. The local alignments are the ten top-scoring non-intersecting local alignments found by the Lalign program of the FASTA package.^[1]

Each alignment is represented in the library as a list of pairwise residue matches, and each such pair is a constraint. Some constraints matter more than others, because some are more likely to be correct. When the multiple alignment is computed, a weighting scheme gives priority to the most reliable residue pairs.^[1]
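
As an illustration, the following Python sketch turns one gapped pairwise alignment into a list of weighted residue-pair constraints. The text above does not specify the weighting scheme; the weight used here, the percent identity over the aligned columns, is an assumption made for the example, and the function name and data layout are hypothetical.

```python
def alignment_to_constraints(name_a, row_a, name_b, row_b):
    """Convert two gapped rows of a pairwise alignment into weighted residue pairs.

    Returns a dict mapping ((name_a, i), (name_b, j)) -> weight, where i and j
    are 1-based residue positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    i = j = 0
    matched = []      # (i, j) for every column without a gap
    identical = 0     # number of matched columns with the same residue
    for a, b in zip(row_a, row_b):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            matched.append((i, j))
            if a == b:
                identical += 1
    # Assumed weighting: percent identity over the aligned columns.
    weight = round(100 * identical / len(matched)) if matched else 0
    return {((name_a, i), (name_b, j)): weight for i, j in matched}

constraints = alignment_to_constraints("A", "GARFIELD", "B", "GARF-ELD")
for pair, weight in constraints.items():
    print(pair, weight)
```

Every residue pair from the same pairwise alignment receives the same weight in this sketch, so pairs from a high-identity alignment outrank pairs from a low-identity one.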

Combination of the libraries

Efficient combination of local and global alignment information is a key feature of T-Coffee. The ClustalW and Lalign primary libraries are combined by simple addition: a residue pair that appears in both libraries is merged into a single entry whose weight is the sum of the two weights; otherwise, a new entry is created for the pair. Pairs with a weight of zero are not represented.^[1] Each pair of aligned residues in the library can then be assigned a weight that reflects the degree to which those residues align consistently with the rest of the library. This is called library extension.
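
The addition step and the extension step can be sketched in Python as follows. This is a minimal illustration, not T-Coffee's implementation. Residues are represented as hypothetical (sequence name, position) tuples, and the extension rule (adding, for each path through a residue of a third sequence, the smaller of the two weights along that path) is taken as an assumption from the original T-Coffee publication rather than from the description above.

```python
from collections import defaultdict

def key(r1, r2):
    """Order a residue pair canonically so (x, y) and (y, x) share one entry."""
    return (r1, r2) if r1 <= r2 else (r2, r1)

def combine(*libraries):
    """Merge primary libraries by addition, dropping zero-weight pairs."""
    merged = defaultdict(int)
    for lib in libraries:
        for (r1, r2), w in lib.items():
            merged[key(r1, r2)] += w
    return {pair: w for pair, w in merged.items() if w != 0}

def extend(library):
    """Library extension sketch: add support from paths via a third sequence."""
    neighbours = defaultdict(dict)
    for (r1, r2), w in library.items():
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w
    extended = dict(library)
    for links in neighbours.values():
        residues = sorted(links)
        for idx, x in enumerate(residues):
            for y in residues[idx + 1:]:
                if x[0] == y[0]:
                    continue  # residues of the same sequence are never paired
                k = key(x, y)
                # Assumed rule: a path contributes the smaller of its two weights.
                extended[k] = extended.get(k, 0) + min(links[x], links[y])
    return extended

# Hypothetical primary libraries for three sequences A, B and C.
global_lib = {(("A", 1), ("B", 1)): 88, (("A", 1), ("C", 1)): 77, (("B", 1), ("C", 1)): 100}
local_lib = {(("A", 1), ("B", 1)): 12, (("A", 2), ("B", 3)): 0}

combined = combine(global_lib, local_lib)
print(combined)
print(extend(combined))
```

In this toy input, the A1/B1 pair appears in both libraries, so its weights are summed, while the zero-weight pair is dropped. Extension then raises the weight of each pair that is also supported through the third sequence.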

Comparisons with other alignment software

While the default output is a Clustal-like format, it differs enough from the output of ClustalW/X that many programs supporting the Clustal format cannot read it. ClustalX can import T-Coffee output, so the simplest fix is usually to import T-Coffee's output into ClustalX and re-export it. Another possibility is to request the strict ClustalW output format with the option "-output=clustalw_aln".
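
A hedged sketch of how the strict-output option might be passed when driving T-Coffee from a Python script. It assumes the executable is installed on the PATH under the name t_coffee and that the input file is given as the first argument; the file name and both helper functions are hypothetical.

```python
import shutil
import subprocess

def tcoffee_command(fasta_path, output_format="clustalw_aln"):
    """Build the argument list for a T-Coffee run with an explicit output format."""
    return ["t_coffee", fasta_path, f"-output={output_format}"]

def run_tcoffee(fasta_path):
    """Run T-Coffee if it is installed (assumes the executable is named t_coffee)."""
    if shutil.which("t_coffee") is None:
        raise FileNotFoundError("t_coffee executable not found on PATH")
    return subprocess.run(tcoffee_command(fasta_path), check=True)

# Only print the command here; "sequences.fasta" is a placeholder file name.
print(" ".join(tcoffee_command("sequences.fasta")))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.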

A distinctive feature of T-Coffee is its ability to combine different methods and data types. In its latest version, T-Coffee can combine protein sequences and structures, as well as RNA sequences and structures. It can also run and combine the output of the most common sequence and structure alignment packages.

T-Coffee comes with a sophisticated sequence reformatting utility named seq_reformat. Extensive documentation is available online.

Variations

M-Coffee: a special mode of T-Coffee that combines the output of the most common multiple sequence alignment packages (Muscle, ClustalW, Mafft, ProbCons, etc.). The resulting alignments are slightly better than the individual ones, but most importantly the program indicates the alignment regions on which the various packages agree. Regions of high agreement are usually well aligned.^[2]
Expresso and 3D-Coffee: special modes of T-Coffee that combine sequences and structures in an alignment. The structure-based alignments can be carried out with the most common structural aligners, such as TMalign, Mustang, and sap.^[3] ^[4] ^[5] ^[6]
R-Coffee: a special mode of T-Coffee making it possible to align RNA sequences while using secondary structure information.^[7] ^[8]
PSI-Coffee: aligns distantly related proteins using homology extension (slow and accurate).^[9] ^[10]
TM-Coffee: aligns transmembrane proteins using homology extension.^[11]
Pro-Coffee: aligns homologous promoter regions.^[12]
Accurate: automatically combines the most accurate modes for DNA, RNA and proteins (experimental).^[13]
Combine: combines two (or more) multiple sequence alignments into a single one.

Evaluation

TCS (Transitive Consistency Score) is an extended version of the T-Coffee scoring scheme.^[14] It uses T-Coffee libraries of pairwise alignments to evaluate any third-party MSA. Pairwise projections can be produced using fast or slow methods, allowing a trade-off between speed and accuracy. TCS has been shown to give significantly better estimates of structural accuracy and more accurate phylogenetic trees than Heads-or-Tails, GUIDANCE, Gblocks, and trimAl.^[15]

Notes and References

Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000-09-08. 302 (1): 205–217. doi:10.1006/jmbi.2000.4042. PMID 10964570. S2CID 10189971.
Wallace IM, O'Sullivan O, Higgins DG, Notredame C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research. 2006. 34 (6): 1692–1699. doi:10.1093/nar/gkl091. ISSN 1362-4962. PMC 1410914. PMID 16556910.
Armougom F, Moretti S, Poirot O, Audic S, Dumas P, Schaeli B, Keduas V, Notredame C. Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Research. 2006-07-01. 34 (Web Server issue): W604–608. doi:10.1093/nar/gkl092. ISSN 1362-4962. PMC 1538866. PMID 16845081.
Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research. 2005. 33 (7): 2302–2309. doi:10.1093/nar/gki524. ISSN 1362-4962. PMC 1084323. PMID 15849316.
Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins. 2006-08-15. 64 (3): 559–574. doi:10.1002/prot.20921. ISSN 1097-0134. PMID 16736488. S2CID 14074658.
Sun Z, Tian W. SAP--a sequence mapping and analyzing program for long sequence reads alignment and accurate variants discovery. PLOS ONE. 2012. 7 (8): e42887. Bibcode:2012PLoSO...742887S. doi:10.1371/journal.pone.0042887. ISSN 1932-6203. PMC 3413671. PMID 22880129.
Wilm A, Higgins DG, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Research. May 2008. 36 (9): e52. doi:10.1093/nar/gkn174. ISSN 1362-4962. PMC 2396437. PMID 18420654.
Moretti S, Wilm A, Higgins DG, Xenarios I, Notredame C. R-Coffee: a web server for accurately aligning noncoding RNA sequences. Nucleic Acids Research. 2008-07-01. 36 (Web Server issue): W10–13. doi:10.1093/nar/gkn278. ISSN 1362-4962. PMC 2447777. PMID 18483080.
Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. Jul 2011. 39 (Web Server issue): W13–7. doi:10.1093/nar/gkr245. PMC 3125728. PMID 21558174.
Kemena C, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics. 2009-10-01. 25 (19): 2455–65. doi:10.1093/bioinformatics/btp452. PMC 2752613. PMID 19648142.
Chang JM, Di Tommaso P, Taly JF, Notredame C. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics. 2012-03-28. 13: S1. doi:10.1186/1471-2105-13-S4-S1. PMC 3303701. PMID 22536955.
Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res. Apr 2012. 40 (7): e52. doi:10.1093/nar/gkr1292. PMC 3326335. PMID 22230796.
T-Coffee Server. tcoffee.crg.eu. 2023-12-26.
Chang JM, Di Tommaso P, Lefort V, Gascuel O, Notredame C. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction. Nucleic Acids Research. 1 July 2015. 43 (W1): W3-6. doi:10.1093/nar/gkv310. PMC 4489230. PMID 25855806.
Chang JM, Di Tommaso P, Notredame C. TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction. Molecular Biology and Evolution. Jun 2014. 31 (6): 1625–37. doi:10.1093/molbev/msu117. PMID 24694831.