Week 2. September 13 & 15. - Character data and trees.


1) We assume that evolution occurs via speciation leaving behind a branching pattern of diversification.

2) Character state changes accompany this and are evidence of the pattern of descent.

3) Overall character congruence (via a distance, optimality criterion, or other form of inference) is used to evaluate hypotheses of species history.

Nature of data used in systematics.

When we discuss diversity or variation, we are referring to alternate forms of various structures, behaviors, ecological preferences, etc. that collectively constitute attributes, traits, or characters. Although the specific definitions of these terms are not set in stone, one system uses attributes to describe all variable features of an organism, traits to be those that are not genetically determined, and characters as those that are phylogenetically useful. Individual organisms, populations, species, and higher taxa all show various levels of differentiation but this is ultimately some form of character or trait variation. Such variation can be caused by many factors including genetic changes, developmental processes, environmental factors, or learned responses. While most morphological and behavioral features are the product of both genotype and environmental factors, it is the genotypically controlled features that we are most interested in. Going back to evolution as descent with modification, the modifications must have a genetic basis if they are to be passed on to subsequent generations.

Characters and character states.

In systematics, "a character is a feature of an organism that is the product of an ontogenetic or cytogenetic sequence of previously existing features, or a feature of a previously existing parental organism. Such features arise in the evolution of a previously existing ontogenetic or cytogenetic or molecular sequence." (Wiley, 1981)

Character states are the alternate forms of a particular character. At the most fundamental level, character variation (especially molecular sequence variation) can arise from point mutations, insertion/deletion events, and relatively minor sequence rearrangements. More profound (and often more complicated) characters can arise via gene duplication, new combinations of alleles, changes in genetic background, pleiotropy, epistasis, and new developmental pathways. The critical issue is that the feature or attribute being considered for phylogenetic utility must be genetically controlled and inherited.

The total collection of characters or the complete description of the organism is referred to as its holomorphology ("total form"). This concept is particularly useful for organisms that have more than one life stage. A semaphoront is an organism at a specific life history stage such as the larval stage of an insect or the seed of a pine. Comparable semaphoronts are to be examined in a phylogenetic analysis. This rule is particularly important with developmentally or environmentally influenced characters as different life history stages may involve modification of structures already present in earlier stages.


As introduced by Owen, homology refers to the "true" similarity between structures as opposed to superficial similarity (analogy). These terms hold for both characters and character states. In all cases, several rules must be followed:

1) There must be a specifying phrase: "homologous to what?"

2) Homology exists at a certain hierarchical level.

3) Anatomical comparison is one test of homology (as promoted by Owen and followers).

Similarity by various definitions may be an indication of relationship and often serves as first evidence for a hypothesis of homology or relationship. For example, structural characters may have similar shapes or functional characters may have similar functions. While these features often have utility in determining relationship, convergence in form or function may lead to false assessments of homology (assuming that relationship is what we want to discover and identify). What we want in a character is phylogenetic utility, that is, similarity due to a common history.

Hennig (1950, 1965, 1966) differentiated special from general similarity based on the polarity or direction of character state change.

1) primitive or plesiomorphic similarity

2) derived or apomorphic similarity

3) homoplasy or conflicting similarity

Overall similarity may or may not be the best estimate of relationship. If enough characters are convergent (say we are scoring a broad range of pollination-related characters or aquatic habit-related characters), we may group organisms more on current function or superficial similarity than on genealogical descent. These groups will tend to be polyphyletic. Likewise, groups that share plesiomorphic character states (rather than apomorphic states) probably do not include all descendants of the common ancestor and are usually paraphyletic. We want to identify derived character states that are shared (synapomorphies) by groups of organisms (taxa). Groups so defined are monophyletic; that is, they contain the most recent common ancestor and all descendants. Groups that do not contain all of the descendants are paraphyletic. Groups for which the most recent common ancestor is assigned to another group are polyphyletic.
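
As a rough illustration of the monophyly definition above, the following Python sketch tests whether a set of terminals equals the full set of descendants of their most recent common ancestor. The nested-tuple tree representation, the function names, and the toy tree are all assumptions made for this example; the sketch does not try to distinguish paraphyly from polyphyly.

```python
def leaves(node):
    """Return the set of terminal names under a node (a string is a terminal)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def mrca_leaves(node, group):
    """Return the terminals of the smallest clade containing every member of group."""
    if not isinstance(node, str):
        for child in node:
            if group <= leaves(child):
                # The whole group sits inside this child, so descend into it.
                return mrca_leaves(child, group)
    return leaves(node)


def is_monophyletic(tree, group):
    """A group is monophyletic if it holds all descendants of its MRCA."""
    group = set(group)
    return mrca_leaves(tree, group) == group


# Hypothetical toy tree, used only for illustration.
tree = ((("lizard", "snake"), ("crocodile", "bird")), "mammal")

print(is_monophyletic(tree, {"crocodile", "bird"}))             # True
print(is_monophyletic(tree, {"lizard", "snake", "crocodile"}))  # False: bird is left out
```

In this toy tree the second group fails the test because its most recent common ancestor also gave rise to the bird terminal, which the group leaves out.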

It is claimed that Hennig’s primary contribution was distinguishing paraphyly from a refined concept of monophyly and that the systematist’s main occupation is the dismantling of paraphyletic groups such as Reptilia, Invertebrates, Gymnosperms, and Protista.

Patterson (1982) equated homology with synapomorphy, the central concept in cladistic phylogenetics. Additionally, he discussed three tests for homology: similarity, conjunction, and congruence.

Similarity - For morphological characters, this is based on topographic correspondence and ontogenetic transformation. Usually, this is only the basis on which homology may be postulated.

Conjunction - "Anatomical singulars". Two homologues could not exist in the same individual. With duplicated structures (morphological or molecular), this becomes less straightforward.

Congruence - synapomorphy coupled with weight of evidence from other character data.


Homoplasy and character conflict

Theories of character evolution versus theories about groups: decoupling phylogeny from evolution (pattern or transformed cladistics).

Evolutionary taxonomy.

Character types, coding, order, polarity

Characters vary in either a continuous or discrete fashion. Discrete characters may be binary (0/1; +/-; yes/no; present/absent; etc.) or multi-state (red/yellow/blue/green; 0/1/2/3; etc.). Some multi-state characters can be broken down into several additive binary states. Some types of analysis are influenced by whether the characters are binary or multi-state.
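
One way to picture additive binary coding is the short Python sketch below. It assumes a linearly ordered multi-state character with states numbered 0 to k-1 and recodes each state as k-1 binary characters; the function name and the numbering convention are choices made for this example only.

```python
def additive_binary(state, n_states):
    """Recode a state (0 .. n_states-1) of a linearly ordered character
    as n_states-1 binary characters."""
    if not 0 <= state < n_states:
        raise ValueError("state out of range")
    # Binary character i is 1 when the original state lies beyond step i.
    return [1 if state > i else 0 for i in range(n_states - 1)]


for s in range(4):
    print(s, additive_binary(s, 4))
# 0 [0, 0, 0]
# 1 [1, 0, 0]
# 2 [1, 1, 0]
# 3 [1, 1, 1]
```

Under this coding, the number of binary characters that differ between two taxa equals the number of steps separating their original states along the ordered series.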

Continuous characters are particularly important at the species and population level (allele frequencies, length differences, part numbers, etc.). In general, continuous characters are hard to deal with in a cladistic analysis but are more or less assumed in a phenetic analysis. (see refs).

Character coding (identifying alternate states of a character) is to be contrasted with character ordering (establishing allowable transformation series), direction (costs of state to state transformations) and character state polarity (the specific sequence by which evolution proceeded in the group under study).

Once characters are coded and ordered (if they need to be), direction of state transformation is considered. The direction (sensu Swofford, 1990) concerns the "cost" in going from one state to another (in terms of tree length).
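
The idea of a transformation "cost" can be sketched as a step matrix. The Python example below builds three commonly described kinds of matrix: unordered (any change costs one step), ordered (cost equals the number of steps along a linear series), and irreversible (changes allowed in one direction only). The function name and the three labels are assumptions for this illustration and are not taken from any particular program.

```python
import math


def step_matrix(n_states, kind):
    """Build an n_states x n_states matrix of costs (row = from, column = to)."""
    matrix = []
    for i in range(n_states):
        row = []
        for j in range(n_states):
            if kind == "unordered":
                cost = 0 if i == j else 1
            elif kind == "ordered":
                cost = abs(i - j)
            elif kind == "irreversible":
                # Forward changes cost their distance; reversals are forbidden.
                cost = j - i if j >= i else math.inf
            else:
                raise ValueError(f"unknown kind: {kind}")
            row.append(cost)
        matrix.append(row)
    return matrix


for kind in ("unordered", "ordered", "irreversible"):
    print(kind)
    for row in step_matrix(3, kind):
        print("  ", row)
```

Only the irreversible matrix is asymmetric, which is where the direction of a transformation starts to affect tree length.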

The specific polarity of how the characters changed is determined in the context of the taxa under study. This does not have to happen before the phylogenetic analysis is conducted but it is often an iterative process of character re-evaluation.

The three main approaches to establishing polarity are:

1) outgroup comparison

2) ontogenetic criteria

3) fossil and stratigraphic data

The outgroup method is indirect in that it refers to character states in predetermined "outgroups". States found in the outgroup and some of the ingroup taxa are assumed to be plesiomorphic. Invariant characters are phylogenetically uninformative (at least in a parsimony sense). Operational rules are spelled out in Forey (p23).
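
A deliberately simplified sketch of outgroup comparison is given below in Python. It assumes a single outgroup with one state per character and does not reproduce the operational rules in Forey; the function name, the result labels, and the toy matrix are hypothetical.

```python
def polarize(ingroup, outgroup_states):
    """For each character, report the state inferred to be plesiomorphic.

    ingroup: dict mapping taxon name -> string of character states
    outgroup_states: string of character states for a single outgroup
    """
    result = []
    for c, out_state in enumerate(outgroup_states):
        in_states = {states[c] for states in ingroup.values()}
        if len(in_states) == 1:
            # No variation within the ingroup: uninformative for its relationships.
            result.append(("invariant", None))
        elif out_state in in_states:
            # State shared by the outgroup and part of the ingroup.
            result.append(("polarized", out_state))
        else:
            # Outgroup state not seen in the ingroup: no decision is made here.
            result.append(("ambiguous", None))
    return result


# Hypothetical toy data, three characters scored for three ingroup taxa.
ingroup = {"A": "001", "B": "011", "C": "111"}
print(polarize(ingroup, "000"))
# [('polarized', '0'), ('polarized', '0'), ('invariant', None)]
```

In the toy data, state 0 is taken as plesiomorphic for the first two characters, while the third character does not vary within the ingroup.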

Ontogenetic methods are direct in that they are actually looking at how characters change from one state to another.

Fossils provide another, somewhat direct method in that a temporal sequence is involved.

Commonality principle.

a priori models

"Morphological" vs. molecular characters

Much of the discussion of how phylogenetics and cladistics works is based on "morphological" data. This is a catch-all phrase for everything developmentally more "derived" than a protein. Molecular data are usually delimited from the protein level down (although there are many exceptions to this).

Combining or partitioning data.

A major area of discussion is how to best integrate different kinds of data for the same set of taxa. We will explore this more deeply later in the semester.


Tree thinking

A tree is a mathematical structure that shows strict dichotomous branching and no reunion of branches. Networks may contain cycles. Nested hierarchies show tree-like structure but can also be represented by indented lists, parenthetical (,) notation such as ((A,B),C), Venn diagrams, etc. Computer file structures and various electronic, physical, and engineering approaches use the same idea of nested hierarchies or tree thinking to solve problems. Practical applications include optimal solutions to wiring (telephones, cable networks, etc.), pipelines, printed circuits, roads, etc. In each case, a straight line may not be optimal or feasible as tunnels, loops, or other deviations may be necessary to connect points on a landscape. Furthermore, there may be constraints on the weight, thickness, cost per length, resistance, conductivity, need for amplification, or other factors. Many of these issues have equivalents in the construction of phylogenetic trees.
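
To show that these representations carry the same information, the Python sketch below converts one nested hierarchy into parenthetical notation and into an indented list. The nested-tuple representation, the function names, and the label "clade" for internal nodes are assumptions made for this example.

```python
def to_parenthetical(node):
    """Write a nested-tuple tree in parenthetical (,) notation."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_parenthetical(child) for child in node) + ")"


def to_indented(node, depth=0):
    """Return the same hierarchy as a list of indented lines."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + "clade"]
    for child in node:
        lines.extend(to_indented(child, depth + 1))
    return lines


# Hypothetical five-taxon tree.
tree = (("A", "B"), ("C", ("D", "E")))

print(to_parenthetical(tree))   # ((A,B),(C,(D,E)))
print("\n".join(to_indented(tree)))
```

Each level of indentation corresponds to one pair of parentheses, so either form can be rebuilt from the other.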

Tree terminology (see texts for more discussion and synonyms):

Trees consist of nodes, both terminal and internal, connected by branches. Terminals represent taxa, species, individuals, etc. (living or extinct) while internal nodes represent hypothetical common ancestors. Note that organisms, populations, species, etc. exist along the branches but they are not participating in speciation or cladogenesis. Nonetheless, anagenetic change may be occurring along the branches. The root of the tree refers to the node connecting the study group to the most recent common ancestor of the study group.

Trees may have their branch lengths indicated or not. Those that do not but focus only on the pattern of branching are called cladograms. Trees that have branch lengths drawn proportional to their evolutionary or character change lengths are called phylograms. Distance trees can be either additive or ultrametric. The former may have variable branch lengths while the latter assumes a molecular clock and has all branch lengths drawn flush along the line of the terminals. In all cases, these trees are not phylogenies or phylogenetic trees but only estimates of the true phylogeny. Extensive literatures exist both for the mathematical aspects of trees and for their philosophical underpinnings.
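
The difference between additive and ultrametric distances can be checked numerically. The Python sketch below applies the standard three-point condition (ultrametric) and four-point condition (additive) to a matrix of pairwise distances. The function names, the tolerance parameter, and both toy matrices are assumptions made for this illustration.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested dict d[a][b] == d[b][a] from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(d, taxa, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        x, y, z = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(z - y) > tol:
            return False
    return True


def is_additive(d, taxa, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([d[a][b] + d[c][e], d[a][c] + d[b][e], d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


taxa = ["A", "B", "C", "D"]
# Hypothetical distances consistent with a clock-like tree.
clock = symmetric({("A", "B"): 2, ("C", "D"): 3, ("A", "C"): 4,
                   ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4})
# Hypothetical distances from a tree with unequal branch lengths.
uneven = symmetric({("A", "B"): 4, ("C", "D"): 3, ("A", "C"): 5,
                    ("A", "D"): 4, ("B", "C"): 7, ("B", "D"): 6})

print(is_ultrametric(clock, taxa), is_additive(clock, taxa))    # True True
print(is_ultrametric(uneven, taxa), is_additive(uneven, taxa))  # False True
```

Every ultrametric matrix is also additive, but not the reverse, which is why the second matrix passes only the four-point test.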

A fully resolved tree shows nothing but dichotomous branching that reflects the pattern of speciation. Phenomena such as hybridization, lateral gene transfer, or symbiosis may result in a reunion of branches. Additionally, a tree may be only partially resolved, with more than two branches emerging from a node. These polytomies (as opposed to dichotomies) may be either hard (due to true simultaneous speciation events) or soft (due simply to inadequate data).

A rooted tree gives an indication of the direction of change for characters.

The total number of possible trees for even a small set of taxa can be extremely large. With 50 taxa, the total number of fully resolved, dichotomously branching trees can exceed the number of particles in the known universe. Given this, one typically searches among all trees only when a comparatively small set of taxa is involved.
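
The growth in the number of trees can be computed directly. The Python sketch below uses the standard counts for fully resolved trees on n labeled terminals: (2n-5)!! unrooted trees and (2n-3)!! rooted trees, where !! is the product of odd integers. The function names are chosen for this example.

```python
def odd_double_factorial(k):
    """Product of the odd integers 1 * 3 * ... * k (equals 1 when k <= 1)."""
    product = 1
    for i in range(3, k + 1, 2):
        product *= i
    return product


def num_unrooted(n):
    """Number of fully resolved unrooted trees for n >= 3 labeled terminals."""
    if n < 3:
        raise ValueError("need at least 3 terminals")
    return odd_double_factorial(2 * n - 5)


def num_rooted(n):
    """Number of fully resolved rooted trees for n >= 2 labeled terminals."""
    if n < 2:
        raise ValueError("need at least 2 terminals")
    return odd_double_factorial(2 * n - 3)


print(num_unrooted(4), num_rooted(4))   # 3 15
for n in (10, 20, 50):
    print(n, f"{num_unrooted(n):.2e}", f"{num_rooted(n):.2e}")
```

For 50 terminals the rooted count is on the order of 10^76, which is why exhaustive searches are practical only for small data sets.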

In phylogenetic analysis, trees are built using some method or algorithm then evaluated using one or more optimality criteria. Searching among alternate hypotheses of relationship (tree topologies with or without branch lengths) to find the one that best fits the optimality criterion is often one of the most time-intensive parts of a phylogenetic study.
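
As one concrete case of evaluating trees against an optimality criterion, the Python sketch below scores candidate topologies by tree length under unordered (Fitch) parsimony and keeps the shortest. Parsimony is only one possible criterion; the nested-tuple trees, the function names, and the toy matrix are assumptions for this example.

```python
def fitch(node, states):
    """Return (state_set, steps) for one unordered character on a binary tree."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: take the union and count one extra step.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the Fitch step counts over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for c in range(n_chars):
        states = {taxon: row[c] for taxon, row in matrix.items()}
        total += fitch(tree, states)[1]
    return total


# Hypothetical matrix: two characters group A with B, one groups A with C.
matrix = {"A": "111", "B": "110", "C": "001", "D": "000"}

# The three possible groupings of four terminals into two pairs.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

for tree in candidates:
    print(tree, tree_length(tree, matrix))   # lengths 4, 5, 6

best = min(candidates, key=lambda t: tree_length(t, matrix))
print("shortest:", best)
```

With four terminals every candidate can be scored, but the same loop over all topologies quickly becomes impractical as terminals are added.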

A split divides the taxa into two separate groups; methods based on splits examine the evidence for (or against) each particular division.
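
The two-group divisions implied by a tree can be listed mechanically, since each internal branch separates the terminals on one side from those on the other. The Python sketch below collects the non-trivial divisions from a nested-tuple tree; the representation and function names are assumptions for this illustration.

```python
def leaves(node):
    """Return the set of terminal names under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def splits(tree):
    """Return the non-trivial two-group divisions implied by the tree."""
    all_taxa = frozenset(leaves(tree))
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = frozenset(leaves(node))
        other = all_taxa - side
        # Skip divisions that isolate a single terminal or nothing at all.
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


# Hypothetical five-taxon tree.
tree = (("A", "B"), ("C", ("D", "E")))
for split in splits(tree):
    print(" | ".join("".join(sorted(part)) for part in sorted(split, key=min)))
```

This tree yields two divisions, AB against CDE and ABC against DE, one for each internal branch of the unrooted topology.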

Our task in phylogenetic systematics is to reconstruct the pattern of character state change among a group of taxa then use this tree as an estimate of phylogenetic relationships among the study organisms. If data were to evolve in a perfectly clean fashion with no convergences, back mutations, change in rates of evolution, or other forms of bias, the reconstruction of these relationships would be relatively straightforward. Unfortunately, the situation is very much complicated by these tendencies and we need to use various tools and strategies to achieve our goal of a resolved, well supported, and (hopefully) accurate phylogeny.

The tree alone only implies relative relationship. Branch lengths offer an estimate of the "absolute" relationships when taken in the context of the tree topology.

For an excellent discussion of trees and "tree space", see MA Charleston’s site.


See week 1 class notes