Joe Thornton's notes on DNA sequence manipulation programs
May 3, 1999
To: Darcy Kelley
Fm: Joe Thornton
Re: Sequence analysis software
In response to your request for information on sequence analysis software, here is a
brief note on what I use, its strengths and weaknesses, and a recommendation for some
programs I feel the department should obtain. This is not intended as an exhaustive
evaluation; although I do quite a bit of analysis of DNA and protein sequences, this list
is circumscribed by my personal experience.
- GCG -- for miscellaneous applications. I
use GCG only to a very limited extent, because I find it user-unfriendly. While it contains
many individual applications, I've yet to encounter a need that I can't fulfill
with one of the following easier-to-use programs. Interestingly, while GCG has been a
standard tool for many molecular biologists, virtually no one in the evolutionary
biology community (including those who work primarily with sequences) uses it.
- CLUSTAL-X -- Multiple sequence alignment.
Clustal includes a powerful and reliable algorithm for alignment of protein and nucleotide
sequences. It is easy to use, and it is configured with a generally effective set of
default alignment parameters. It is easy to customize the parameters, which is important
to anyone who doesn't want alignment to be a "black box" process. The
Macintosh version (and, I believe, the PC version as well) allows the aligned sequences to
be viewed on screen within the program, which makes it much more convenient than GCG; it
also exports its alignments in a variety of text-based formats, including GCG, NBRF/PIR,
Phylip, GDE, and Clustal. Clustal is free from EMBO's web site, and it is available
for Mac, PC, and Unix.
While there are other sequence alignment programs, I consider this a generally valid
and trustworthy algorithm, because it uses sophisticated multiple-sequence alignment
methods guided by a phylogenetic tree, which the program itself evaluates from the
- MEGALIGN -- sequence alignment. Megalign
is part of DNASTAR's commercial Lasergene package. It includes the Clustal
algorithm, along with TreeAlign/Jotun Hein, another reliable alignment method that refines
its guide trees using a parsimony criterion -- a distinct advantage philosophically over
Clustal's neighbor-joining approach. Megalign also allows alignments to be
edited/adjusted on-screen, and alignments can be printed out with color-coding, a
consensus, and sequence numbers. It also allows one-click translation/reverse translation
between nucleotide and amino acid sequences -- a very useful feature for maintaining
reading frame during the alignment of coding sequences.
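The reason translation helps with reading frame can be shown with a minimal Python sketch. It does not reproduce Megalign's own method; it assumes the alignment was made at the protein level and that the coding sequence contains exactly one codon per residue. The function name `thread_codons` is hypothetical.

```python
def thread_codons(aligned_protein: str, coding_dna: str) -> str:
    """Project the gaps of an aligned protein back onto its coding sequence.

    Each protein gap ('-') becomes a three-base gap ('---'), so gaps can
    only fall between codons and the reading frame is never broken.
    """
    residues = aligned_protein.replace("-", "")
    if len(coding_dna) != 3 * len(residues):
        raise ValueError("coding sequence must hold exactly one codon per residue")

    # Split the unaligned coding sequence into consecutive codons.
    codons = iter(coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3))

    aligned_dna = []
    for symbol in aligned_protein:
        aligned_dna.append("---" if symbol == "-" else next(codons))
    return "".join(aligned_dna)


print(thread_codons("M-KL", "ATGAAACTG"))  # ATG---AAACTG
```

Because gaps are inserted only in multiples of three, the nucleotide alignment can be translated again at any time and will give back the protein alignment.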
- PAUP 4.0 -- Phylogenetic analysis. The new
PAUP does everything one could ever want in a phylogenetic analysis program. It performs
both parsimony and maximum likelihood analyses, with full control over parameters and more
options than any one person will ever use. It also performs distance-based tree
reconstructions, including UPGMA and neighbor-joining methods, for those looking for a
quick-and-dirty (but unreliable) tree. It does bootstraps, distance matrices,
character-change matrices, and everything else you could ever think of. It is an
easy-to-use, menu-driven program in its Mac incarnation. It is also available for Unix and
for Windows. I assume the latter is menu-driven, though I have never used it. The program is
commercially available from Sinauer, and it is cheap (about $100, I think). For those who
want to do more extensive evolutionary character analysis, the software of choice is
MacClade (also from Sinauer), but I doubt anyone in this department will undertake this
kind of work. Many people also use Phylip (University of Washington) for a few features
not found in PAUP; overall, it is less powerful and complete than the new PAUP.
- OLIGO -- Primer design and analysis. Oligo
4.0 is very good for analyzing sequences for primer design, calculating melting
temperatures, and identifying problems, such as hairpins, primer-dimers,
temperature-mismatched pairs, and so on.
[I prefer the free Web-based Primer3 - see Web-based
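As a rough illustration of the melting-temperature and mismatched-pair checks mentioned for primer design, here is a minimal Python sketch. It uses the simple Wallace rule (2 degrees C per A/T, 4 degrees C per G/C), which is only a rule of thumb for short oligos; it is not claimed to be the calculation that Oligo or Primer3 performs. The function names and the 5-degree tolerance are assumptions chosen for the example.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def is_tm_mismatched(forward: str, reverse: str, tolerance: int = 5) -> bool:
    """Flag a primer pair whose estimated Tm values differ by more than `tolerance`."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) > tolerance


print(wallace_tm("ATGCATGC"))                    # 24
print(is_tm_mismatched("ATATATAT", "GCGCGCGC"))  # True
```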
- BLAST searches -- I do this on-line
- GENBANK searches -- I do this on-line
- EDITSEQ -- basic sequence editing. I find
EditSeq, part of the Lasergene package from DNASTAR, extremely useful for basic sequence
manipulation. It locates reading frames, translates, reverse translates,
reverse-complements, and saves files in various formats. It also gives statistics on the
sequence in the file -- base or amino acid composition, melting temperature, etc.
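For readers unfamiliar with these basic operations, the following minimal Python sketch shows reverse-complementing, translation in a chosen frame, and base composition. It is an illustration only and says nothing about how EditSeq is implemented. It assumes the standard genetic code and plain A/C/G/T input; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate from offset `frame` (0, 1, or 2); unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def base_composition(seq: str) -> dict:
    """Count how often each symbol occurs in the sequence."""
    return dict(Counter(seq.upper()))


print(reverse_complement("ATGC"))  # GCAT
print(translate("ATGAAATAG"))      # MK*
print(base_composition("AACG"))    # {'A': 2, 'C': 1, 'G': 1}
```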
- SEQUENCHER -- Analysis of automated
sequence files. This is the most important program that the department is currently
lacking. It is the best available software for dealing with sequence files provided by an
automated sequencing machine or analysis center. It offers a very powerful and easy-to-use
method for automatically aligning sequence fragments (including those read on opposite
strands) and for assembling and editing contigs directly from automated sequence output
files. It allows easy location of ambiguous sites in a sequence, and a single click allows
the user to view the electropherograms at that base in order to judge which sequence is
most reliable. It also finds ORFs, translates in multiple frames, analyzes restriction
sites, locates common vector sequences, screens for transposon sequences, trims dirty ends
from sequence files based on ambiguity in the electropherogram, and performs heterozygote
analysis for mutation studies. No one who does automated sequencing, or has it done for
them, should be without this program.
Sequencher is available for Mac; a Windows version was supposedly scheduled for release
last year, but I don't know whether it came out. It will run on a network and can
be purchased for any given number of simultaneous users (that means as many as x persons
can be using the program at one time, but there is no limit on the number of potential
users, so long as they are connected to the network). It is not cheap -- around $1500 per
user. In my view, however, it is clearly worth it.
Sequencher is published by Gene Codes, www.genecodes.com, phone 734-769-7249.
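The end-trimming step mentioned above for automated sequence files can be sketched as follows. This is a simplified illustration, not Sequencher's actual trimming criteria: it assumes ambiguity is visible only as non-A/C/G/T base calls, and it trims each end until it reaches the first window of `window` bases containing at most `max_ambiguous` such calls. The function name and default values are hypothetical.

```python
def trim_ambiguous_ends(seq: str, window: int = 10, max_ambiguous: int = 1) -> str:
    """Trim both ends of a read back to the first sufficiently clean window."""
    seq = seq.upper()

    def first_clean_start(s: str) -> int:
        # Slide a window from the start; return where the first clean window begins.
        for start in range(len(s) - window + 1):
            ambiguous = sum(base not in "ACGT" for base in s[start:start + window])
            if ambiguous <= max_ambiguous:
                return start
        return len(s)  # no clean window anywhere in the read

    left = first_clean_start(seq)
    if left == len(seq):
        return ""  # the whole read is too ambiguous to keep
    right = first_clean_start(seq[::-1])  # same scan from the other end
    return seq[left:len(seq) - right]


print(trim_ambiguous_ends("NNNACGTACGTACGNN", window=4, max_ambiguous=0))
# ACGTACGTACG
```

A real trimmer would normally work from the per-base quality information in the trace file rather than from the called bases alone.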