May 3, 1999
To: Darcy Kelley
Fm: Joe Thornton
Re: Sequence analysis software
Dear Darcy:
In response to your request for information on sequence analysis software, here is a
brief note on the programs I use, their strengths and weaknesses, and a recommendation of some
programs I feel the department should obtain. This is not intended as an exhaustive
evaluation; although I do quite a bit of analysis of DNA and protein sequences, this list
is circumscribed by my personal experience.
- GCG -- for miscellaneous applications. I
use GCG to a very limited extent, because I find it user-unfriendly. While it contains
many individual applications, I've yet to encounter a need that I can't fulfill
with one of the following easier-to-use programs. Interestingly, while GCG has been a
standard tool for many molecular biologists, virtually no one in the evolutionary
biology community (including those who work primarily with sequences) uses it.
- CLUSTAL-X -- Multiple sequence alignment.
Clustal includes a powerful and reliable algorithm for alignment of protein and nucleotide
sequences. It is easy to use, and it is configured with a generally effective set of
default alignment parameters. Its parameters are also easy to customize, which is important
to anyone who doesn't want alignment to be a "black-box" process. The
Macintosh version (and I believe the PC version, as well) allows the aligned sequences to
be viewed on screen within the program, which makes it much more convenient than GCG; it
also exports its alignments in a variety of text-based formats, including GCG, NBRF/PIR,
Phylip, GDE, and Clustal. Clustal is free from EMBO's web site, and it is available
for Mac, PC, and Unix.
While there are other sequence alignment programs, I consider this a generally valid
and trustworthy algorithm, because it uses sophisticated multiple-sequence alignment
methods guided by a phylogenetic tree, which the program itself estimates from the
sequences.
- MEGALIGN -- sequence alignment. Megalign
is part of DNA-Star's commercial Lasergene package. It includes the Clustal
algorithm, along with TreeAlign/Jotun Hein, another reliable alignment method that refines
its guide trees using a parsimony criterion -- a distinct advantage philosophically over
Clustal's neighbor-joining approach. Megalign also allows alignments to be
edited/adjusted on-screen, and alignments can be printed out with color-coding, a
consensus, and sequence numbers. It also allows one-click translation/reverse translation
between nucleotide and amino acid sequences -- a very useful feature for maintaining
reading frame during the alignment of coding sequences.
- PAUP 4.0 -- Phylogenetic analysis. The new
PAUP does everything one could ever want in a phylogenetic analysis program. It performs
both parsimony and maximum likelihood analyses, with full control over parameters and more
options than any one person will ever use. It also performs distance-based tree
reconstructions, including UPGMA and neighbor-joining methods, for those looking for a
quick-and-dirty (but unreliable) tree. It does bootstraps, distance matrices,
character-change matrices, and everything else you could ever think of. In its Mac
incarnation it is an easy-to-use, menu-driven program. It is also available for Unix and for
Windows. I assume the Windows version is menu-driven as well, though I have never used it. The program is
commercially available from Sinauer, and it is cheap (about $100, I think). For those who
want to do more extensive evolutionary character analysis, the software of choice is
MacClade (also from Sinauer), but I doubt anyone in this department will undertake this
kind of work. Many people also use Phylip (University of Washington) for a few features
not found in PAUP; overall, it is less powerful and complete than the new PAUP.
- OLIGO -- Primer design and analysis. Oligo
4.0 is very good for analyzing sequences for primer design, calculating melting
temperatures, and identifying problems, such as hairpins, primer-dimers,
temperature-mismatched pairs, and so on.
[I prefer the free Web-based Primer3; see the Web-based
program list. -- LAC]
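The Clustal entry says its alignments are guided by a tree that the program estimates from the sequences. As I understand Clustal W/X, that guide tree is built from a matrix of pairwise distances, each taken from a pairwise alignment. The Python sketch below illustrates only that first stage, under deliberately simplified assumptions: a plain Needleman-Wunsch global alignment with made-up scores (match +1, mismatch -1, gap -2) and a distance of one minus the fraction of identical sites among gap-free columns. Clustal's actual scoring (substitution matrices, separate gap-opening and gap-extension penalties), its neighbor-joining tree and its progressive alignment are not reproduced, and all function names here are hypothetical.

```python
from itertools import combinations

# ASSUMED illustrative scores; Clustal itself uses substitution matrices
# and separate gap-opening/gap-extension penalties.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(x, y):
    return MATCH if x == y else MISMATCH


def global_align(a, b):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical sites among aligned, gap-free columns."""
    x, y = global_align(a, b)
    cols = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    same = sum(p == q for p, q in cols)
    return 1 - same / len(cols) if cols else 1.0


def distance_matrix(seqs):
    """All pairwise distances: the input from which a guide tree is built."""
    names = list(seqs)
    d = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d[p, q] = d[q, p] = distance(seqs[p], seqs[q])
    return d
```

For example, `global_align("ACGT", "AGT")` gives `("ACGT", "A-GT")`, and `distance_matrix({"a": "ACGT", "b": "ACGA"})[("a", "b")]` is 0.25 (one mismatch in four columns).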
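The Megalign entry points out that translation helps keep the reading frame while aligning coding sequences. One common way to get that result is to align the translated proteins and then thread the codons back onto the protein alignment, so that every gap in the DNA alignment is a whole codon long. The Python sketch below shows both steps using the standard genetic code. It is not Megalign's implementation; `translate` and `thread_codons` are hypothetical names, and "X" for an unrecognized codon is an assumed convention.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated with each position in TCAG order
# (first position varying slowest). "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate DNA (or RNA) starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X = ambiguous/unknown codon (assumed)
        for i in range(frame, len(dna) - 2, 3)
    )


def thread_codons(gapped_protein, dna):
    """Map one row of a protein alignment back onto its coding DNA.

    Each residue takes the next codon; each gap becomes '---', so the
    DNA alignment never shifts the reading frame.
    """
    out, pos = [], 0
    for aa in gapped_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(dna[pos:pos + 3])
            pos += 3
    return "".join(out)
```

For example, `translate("ATGGCCTAA")` returns `"MA*"`, and `thread_codons("M-A", "ATGGCC")` returns `"ATG---GCC"`.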
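The PAUP entry calls distance-based methods such as UPGMA quick-and-dirty. The generic Python sketch of UPGMA below (not PAUP's code) shows why they are quick: each step simply merges the two closest clusters and averages their distances to everything else, with no search over alternative trees. UPGMA also implicitly assumes roughly equal rates of evolution across lineages (a molecular clock), which is one reason its trees can be unreliable. The input format, a dictionary keyed by `frozenset` pairs of names, is an assumption made for brevity, and the sketch returns only the topology as nested tuples, without branch lengths.

```python
def upgma(dist, names):
    """UPGMA clustering; returns the tree topology as nested tuples.

    dist maps frozenset({a, b}) -> distance for every pair of names (assumed format).
    """
    size = {n: 1 for n in names}   # number of taxa in each cluster
    d = dict(dist)                 # working copy; grows as clusters merge
    clusters = list(names)
    while len(clusters) > 1:
        # Greedily pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = (a, b)
        clusters = [c for c in clusters if c not in (a, b)]
        # Distance to the merged cluster = size-weighted average of the parts.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        clusters.append(merged)
    return clusters[0]
```

With A-B = 2, A-C = B-C = 6 and every distance to D equal to 10, `upgma` returns `("D", ("C", ("A", "B")))`.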
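The OLIGO entry lists melting temperatures and primer-dimers among the problems it checks for. The Python sketch below shows the simplest versions of two such checks: the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) degrees C, which is only a rough guide for short oligonucleotides, and a test of whether the last few bases at one primer's 3' end are complementary to some stretch of another primer (or of itself). These illustrate the ideas only and are not what OLIGO or Primer3 actually compute; dedicated programs generally use nearest-neighbor thermodynamic models. The function names and the 4-base window are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C).

    Only a rough estimate for short oligos; not a thermodynamic model.
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def three_prime_dimer(p1, p2, n=4):
    """True if the last n bases of p1 (its 3' end) can pair with a stretch of p2.

    The strands pair antiparallel, so we look for the reverse complement
    of p1's 3' n-mer inside p2. Pass p1 twice to check for self-dimers.
    The window n=4 is an arbitrary assumed threshold.
    """
    return reverse_complement(p1[-n:]) in p2.upper()
```

For example, `wallace_tm("ACGTACGTACGTACGTACGT")` is 60, and `three_prime_dimer("GGGGGAATTC", "GGGGGAATTC")` is `True` because the primer ends in a palindromic GAATTC. Comparing `wallace_tm` for a forward and a reverse primer gives a crude check for temperature-mismatched pairs.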