UTK EEB requires new graduate students to take an intensive team-taught course covering ecology and evolution. I present the phylogenetics portion (tree creation, not tree use). In 2010, I added learning R to the mix. In 2011, Jim Fordyce took over R intro so I can go more into depth for R for phylogenetics. I've also added a speciation lecture.
Note the anonymous feedback form: Suggest changes while it can still help you in class.
Topics:
Why phylo?
Continuous time Markov Chain finite state space
Doing this in Geiger: get exercise Discrete.R
Slides (PDF)
Topics:
In class review quiz
Central limit theorem CLT.R
Brownian motion, Ornstein Uhlenbeck, Independent Contrasts
Slides (PDF)
Topics:
In class review quiz
Likelihood vs Bayes redux
Model comparison exercise
Tree stretching
Tree stretching exercise
Slides (PDF)
Topics:
In class review quiz
Working through homework. What did you conclude? What does it mean? (relevant R script is here)
Delve deeper into how to do an analysis.
Diversification
Slides (PDF)
Topics:
Doing a good analysis
Slides (PDF)
Topics
Continuing pair analyses
Concatenation of all slides (PDF)
Study guide (PDF)
Topics:
Speciation
Hybridization
Slides: PDF.
For homework, due by start of class Monday, do steps 1-15 of this tutorial.
Topics:
What trees are used for
history of life + cool discoveries about topology
tree terminology
clades (dinos birds, reptiles, fish)
sister groups, crown/stem, etc.
rank and its importance
homology
heuristic search
treespace
Slides: PDF.
Topics:
Distance
Parsimony
Long branch attraction
Models
Model selection
Bayes/Likelihood
Slides: PDF
Homework:
The next assignment should be pretty fun. By next Monday, write a heuristic search algorithm for getting the maximum parsimony tree in R after downloading a set of GenBank sequences, also within R. I've sketched out a lot of the code -- the main tricky bit should be figuring out how to store the best tree as you go and learning how to use an if() function. Comments in ALL CAPS are the sections you should focus on for changing. You're encouraged to play with this assignment: maybe you want to plot the scores as you search, maybe you want to try different search strategies, etc. Feel free to email me for help or talk to your cohort-mates (remember you can use the blackboard forum, too). R template here.
Slides: PDF
Slides: PDF
Slides: PDF
Topics:
What trees are used for
history of life + cool discoveries about topology
Tree terminology
clades (dinos birds, reptiles, fish)
sister groups, crown/stem, etc.
rank and its importance
homology
Slides: pdf, keynote, mov (QuickTime movie)
R assignment:
By the start of class Wednesday, please:
1) Install R on a computer you can access (you don't need to bring it in, so a desktop at home is fine). You can get an installer from http://cran.r-project.org/ .
2) Read this web page: http://bodegaphylo.wikispot.org/Phylogenetics_and_Comparative_Methods_in_R
3) Read and do the exercises (aka "type ~8 lines") from this web page: http://bodegaphylo.wikispot.org/I._Getting_Started_with_R . Do the task view route (it's easier).
4) Write me an email with the info you get from the See Also section when you run ?read.tree after doing the tutorial (should be one line).
Topics:
Treespace/NP-nasty
History
Distance
Parsimony
Long branch attraction
Slides: pdf, keynote, mov (QuickTime movie)
R assignment: Plot the number of trees vs the number of taxa.
Here is the second assignment. Note that http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf is a very readable tutorial you may want to download (written by the same guy who wrote the phylogenetic package ape). The homework is due by start of class on Friday.
1. Fire up R. Input library(phangorn) to load the Phangorn package (who gets the geeky reference? Why the F->Ph transition? Get used to this sort of thing). If this gives you an error, try installing it -- install.packages("phangorn")
2. Create a plain text file (no formatting: .txt rather than .rdf) called treespace.R. We're going to be typing our commands in this.
3. Go to http://cran.r-project.org/doc/manuals/R-intro.html
4. Read and understand sections 1.5, 1.7, 1.10, 2.1, 2.3, 2.8, 3.2, 12.1, 12.1.4, and 13 [the other bits are handy, too, but I want to minimize your workload].
5. Read the documentation for allTrees (just type ?allTrees ), plot (same way), and length .
6. Treespace is big, as we'll learn on Friday. Let's see how big. In your treespace.R file, include the following commands (don't copy in "7. Don't paste this…" or the lines after that into the file). Remember that anything after a # sign is ignored by R (it's a comment). I'm also attaching this as a file in case the formatting is affected by emailing it.
library(phangorn) #to load the phangorn package
x<-c() #creates an empty x vector
y<-c() #creates an empty y vector
for (number_of_taxa in 3:8) { #this starts a loop: the first time through, variable number_of_taxa=3, the next time, it equals 4, then 5, etc. up to and including 8.
print("now doing this number of taxa:")
print(number_of_taxa)
x[number_of_taxa-2]<-number_of_taxa #here we are sticking the number of taxa into vector x. x[1] is the first slot in that vector, x[2] is the second slot in that vector… (it expands if it is not that size already -- not all languages do that). Think about why we do number_of_taxa - 2 rather than number_of_taxa.
#WAY TO STORE THE LENGTH OF THE OUTPUT FROM allTrees, WHICH IS THE NUMBER OF POSSIBLE TREES, FOR PLOTTING ON THE Y-AXIS
}
#PLOT THE NUMBER OF POSSIBLE TREES VERSUS THE NUMBER OF TAXA (DO A LOG SCALE FOR NUMBER OF TREES)
7. Don't paste this or following lines into your file
8. Replace the #WAY… and #PLOT… comments with functioning R code. I've given you hints (why did I have you look at what the allTrees command does? what's that y<-c() in the code for?).
9. Try running your code -- copy and paste into R, or on a Mac you can select all and then command+return to have it run (if you're editing the text within an R window), or you can use source("treespace.R") if you have set the working directory to the folder containing the file (?setwd or use an option in the menu).
10. Fix the errors, repeat 9 (see the joys of loops? -- "while (working==FALSE) { debug some more }").
11. Send me your final figure (screenshot, pdf, etc.) and your treespace.R file.
12. Is the number of trees growing exponentially with the number of taxa?
13. Are you sure? BONUS (optional): How can you tell?
14. If you're still up for it (this part is optional, too), modify your code to look at all trees from 3 to 9 taxa rather than from 3 to 8. You'll need to be very patient. Why?
Bang your head against this for awhile, then talk to your cohort-mates for help, and/or email me. A lot of programming issues, especially early on, are frustrating in the way these problems are (i.e., what function lets me count the number of trees? how do I stick things in a vector?), and it's salutary to try to work through them. There are generally several ways to solve each problem -- go for the one that works first. Later you can tweak with speed, memory usage, code beauty (yes, really), etc. There are a few mailing lists for help with R, and help with R for phylogenetics, but [most of you] are too new to R to start posting there.
R code for animation to make image
library(phangorn)
accessionNumbers <- c("U15717", "U15718", "U15719", "U15720", "U15721", "U15722", "U15723", "U15724") #These come from a paper by Paradis (1997) on Tanagers
#USE THE read.GenBank command TO DOWNLOAD THESE SEQUENCES FROM GENBANK AND PUT THEM IN AN OBJECT CALLED sequences.
#Note that it's read.GenBank, not read.Genbank, read.genbank, readGenBank, etc. Programming is usually very case-sensitive
distanceMatrix<-dist.dna(sequences) #get a distance matrix
print(distanceMatrix)
convertedSeqs<-phyDat(sequences) #convert to a format useful for phangorn
treeBad<-upgma(distanceMatrix)
#the UPGMA starting tree actually is not that bad (and a NJ tree is the same as the maximum parsimony tree). So let's make it challenging by giving you a bad starting tree (for larger data sets, you would not have this "problem" of a starting tree that is too good)
#note how to get the parsimony score of a tree (use convertedSeqs, not sequences)
while(parsimony(treeBad,convertedSeqs)<220) {
treeBad<-rNNI(treeBad)
}
maximumTries<-50
#GET SCORE OF STARTING TREE, treeBad
#STORE THE CURRENT BEST TREE, WHICH IS treeBad
#PRINT THE STARTING SCORE
for (tryNumber in 1:maximumTries) {
#GET A NEW TREE BY DOING newTree<-rNNI( __SOMETHING__ )
#COMPARE THE SCORE OF THE NEW TREE TO THE OLD TREE
#USE AN IF BLOCK (if (new #INFO ON IF() IS AT http://cran.r-project.org/doc/manuals/R-intro.html#Conditional-execution
#HERE IS AN EXAMPLE BLOCK
if (1<2) {
print("one is less than 2")
}
#PERHAPS MORE THINGS AFTER THE IF BLOCK
}
#PRINT THE FINAL SCORE
#PLOT THE FINAL TREE
#BONUS: THINK OF A WAY TO DO AN EXHAUSTIVE SEARCH USING THE allTrees COMMAND. WILL THIS WORK FOR MANY MORE TAXA?