Phyloseminar 2015: Tutorial for density dependent diversification

diversitydependent.pdf diversitydependent.Rmd

diversity dependent diversification tutorial

Rosana Zenil-Ferguson

November 8, 2015

library("TreeSim")
## Loading required package: ape
## Loading required package: geiger
## Loading required package: laser
library("TreePar")
## Loading required package: Matrix
## Loading required package: subplex
## Loading required package: deSolve
## 
## Attaching package: 'deSolve'
## 
## The following object is masked from 'package:graphics':
## 
##     matplot

If you have not installed them you should use the function

#install.packages("nameofthepackage")

Tutorial modified from Tanja Stadler’s PhyloDynamics seminar.

Part 1: Density Dependent Diversification Estimates

Simulate a tree using TreeSim package. We are assuming that the change in shape of the tree is given by a shift in the diversification rates (like a mass extinction) and not because of diversity dependence (we will test this in the next section). Input for the simulation will be:

  • 10 extant species
  • lambdavect=c(2,0.1), this represents a shift in birthrate
  • muvect=c(0,0.05), this represents a shift in extinction rate
  • All species are sampled before and after the change in diversification rates
  • time=c(0,1) the shift in rate occure 1 unit of time ago (i.e. mya)
  • We do not see lineages extinct
set.seed(1)
#tree1<-sim.rateshift.taxa()[[1]]

Plot the tree and the lineage through time. Do you notice the “pull of the present”?

Let’s calculate speciation times using the following instruction

#x<-sort(getx(tree1),decreasing=TRUE)

So now we estimate our density dependent diversification rates. Three parameters are of importance (K, lambda_n, mu) mu is fixed but not zero, lambda_n depends on the number of lineages and K is the carrying capacity. For this part we will use function bd.densdep.optim in TreePar there are faster options in DDD and expoTree (not very intuitive).

What are the input choices for bd.densdep.optim? What would you specify in order to find MLEs?

#res.DDD<-bd.densdep.optim()

Part 2: Comparing against a model with a shift in rates

If we want to compare to the model with shifts we can optimize using bd.shifts.optim.

#res.shifts <- bd.shifts.optim(x,c(rho,1),0.1,0.1,2.1)[[2]][[2]]

How are your estimates for the parameters? What are the potential problems?

To make model comparisons we can calculate AIC values for both models

#aic.ddd <- 2*3+2*res.DDD$value
#aic.shifts <- 2*5+2*res.shifts[1]

What should be happening given what you simulated?

Part 3: Challenge

Can you simulate a tree with density dependent diversification? Check the notes of function sim.rateshift.taxa()

Now, fit the density dependent model and the shift dependent model, which one should fit better?

Phyloseminar 2015: BAMM tutorial

BAMM Tutorial Phyloseminar 2015.pdf

This tutorial largely follows the tutorials by Dan Rabosky on his website.

Installation

BAMM consists of two components, the program that runs the actual analysis is in C++ (to be faster), while the analysis of the outputs happens in R.

BAMM

Easiest might be to install it from the executables (both for Win and OS X), although from source and using homebrew is also possible.

  1. Go to the Download-Page
  2. download the appropriate BAMM version for your platform
  3. download the zipped example files
  4. unpack everything, in case of OS X, move to the appropriate directory in the command line and type

    tar -xzf bamm-2.3.0-MacOSX.tar.gz
  5. move the files and folders to the directory you want

BAMMtools

For this one, simply open R and type

install.packages('BAMMtools')

If it is installed, load the package using

library(BAMMtools)

Run BAMM

In order to run BAMM, we need nothing more than a tree and a controlfile. We will use the whale examples in this tutorial. To run your own analysis, it is a good idea to simply modify the example controlfile or use the template controlfile.

Let’s start the whale analysis and then look at the controlfile.

Run the Whales

It is easiest to use BAMM when all files are in the same directory where BAMM is, so:

  1. move the whale tree and the controlfile in the same folder as BAMM
  2. to get a result that is looking a bit more realistic, we open the ‘divcontrol’ file and change numberOfGenerations from 10k to 1Mio
  3. change the directory of your command line to that folder
  4. start the analysis by typing:
./bamm -c divcontrol.txt   # on windows you can omit the './'
  1. wait for the MCMC to start running
  2. the output files will be
  • chain_swap.txt
  • event_data.txt
  • mcmc_out.txt
  • prior_probs.txt
  • run_info.txt

The main data we want to look at later is in the event_data file, and we look at the file mcmc_out to assess convergence. You might want to keep the run_info file with your other files in order to know what settings you used for a specific analysis.

A look at the controlfile

Go to the examples folder and find the ‘whales’ controlfile and tree in the ‘diversification’ subfolder. Open the controlfile ‘divcontrol.txt’. The file is pretty well annotated and most points should be fairly self-explanatory, we will quickly discuss some of them though.

Sections:

  • General Setup and Data Input
  • Priors
  • MCMC Simulation Settings & Output Options
  • Operators: MCMC Scaling Operators
  • Operators: MCMC Move Frequencies
  • Initial Parameter Values
  • Metropolis Coupled MCMC
  • Numerical and Other Parameters

Most values are preset in a way that will work out well, and most users work along the policy to not touch anything if you don’t know what it does, but a few parameters are worth looking at (the bold ones you definitely have to modify for your own analysis, or at least consider it…):

  • modeltype
  • treefile
  • useGlobalSamplingProbability
  • globalSamplingFraction
  • sampleProbsFilename

  • all the priors…

  • numberOfGenerations
  • mcmcWriteFreq
  • eventDataWriteFreq
  • printFreq
  • acceptanceRateFreq

  • scaling operators…

  • move frequencies…

  • initial values…

  • numberOfChains
  • deltaT
  • swapPeriod

  • minCladeSizeForShift
  • segLength

More in-depth treatment of how to fine-tune the analysis and how to use advanced features can be found here

Analyse outputs with BAMMtools

Now we want to dig into the outputs of the analysis, which we do in R. There are a number of different options regarding what we can do and should do, and we try to look at the most important ones here.

Convergence

Just like for any MCMC analyses, e.g. when we date a tree with BEAST, we want to know if we ran the analysis long enough to reach convergence. To do this for BAMM, we look at the file mcmc_out in R:

mcmc <- read.csv("/Users/orlandoschwery/Documents/UT/Courses/Fall 15/Phyloseminar/BAMMtutorial/mcmc_out.txt", header=T)  # load file
dim(mcmc)  # check dimensions
head(mcmc)  # check head

Now we can look at the trace plot of the log likelihood per generation (and if we want at the numbers of shifts per generation)

plot(mcmc$logLik ~ mcmc$generation)
plot(mcmc$N_shifts ~ mcmc$generation)

Now we want to discard a reasonable portion as burnin, e.g. 10%…

burnstart <- floor(0.1 * nrow(mcmc))
mcmcpost <- mcmc[burnstart:nrow(mcmc), ]

… and calculate the effective sample size, which should exceed 200 if our analyses ran long enough (although different people might argue for different minimal values here).

library(coda)
effectiveSize(mcmcpost$logLik)  # calculates autocorrelation function
effectiveSize(mcmcpost$N_shift)  # effective size on N_shifts

## do this on pre-burnin MCMC
effectiveSize(mcmc$logLik)
effectiveSize(mcmc$N_shift)

Rate Shifts

Now for the thing we are most interested in: the rate shifts. We load the package BAMMtools, the tree we used and the event_data file, for which we specify the same burnin we decided on before:

library(BAMMtools)
tree <- read.tree("/Users/orlandoschwery/Documents/UT/Courses/Fall 15/Phyloseminar/BAMMtutorial/whaletree.tre")
ed <- getEventData(tree, "/Users/orlandoschwery/Documents/UT/Courses/Fall 15/Phyloseminar/BAMMtutorial/event_data.txt", burnin=0.1)

It is possible to subset the event data file, if the size is too big for R to handle.

Phylorates Plots

Now we can just look at the mean rates inferred:

plot.bammdata(ed, legend=T, spex='netdiv')

We can also look at speciation or extinction rates separately:

plot.bammdata(ed, legend=T, spex='s')
plot.bammdata(ed, legend=T, spex='e')

We can get a first idea on what number of shifts were most commonly found using summary (this just tells us how many samples contain a certain number of shifts, but not WHERE those shifts are):

summary(ed)

Bayes Factors for Shift Models

To get a more robust idea of how certain shift models are performing compared to each other, we can calculate Bayes Factors:

postfile <- mcmc  # since we already loaded this
priorfile <- read.csv("/Users/orlandoschwery/Documents/UT/Courses/Fall 15/Phyloseminar/BAMMtutorial/prior_probs.txt", header=T)
bfmat <- computeBayesFactors(postfile, priorfile, burnin=0.1)

The resulting table shows how models with certain numbers of shifts compare to a specific other one, BFs of >20 indicate good support that a model is better than the one it is compared to, BFs of >50 indicate very strong support.

Credible Shift Sets

What our data actually looks like now, are different samples of shift regimes, all of which are sampled at a certain frequency. We can access each of them, and it is a good idea to visualize the most frequent ones:

pset <- getBranchShiftPriors(tree, priorfile)
cset <- credibleShiftSet(ed, pset, BFcriterion=3)
plot.credibleshiftset(cset, lwd=2.5)

Best Shift Set

Based on the credible shift sets, we might conclude that the most commonly sampled configuration of shifts is the best one:

best <- getBestShiftConfiguration(ed, prior = pset)
plot.bammdata(best, lwd = 2)
addBAMMshifts(best, cex=2.5)

Maximum Shift Credibility

Alternatively, pick the shiftset that maximizes the marginal likelihood of a shift on a branch, a set of maximum shift credibility, similar to when we get a maximum clade credibility tree out of a set of trees:

msc.set <- maximumShiftCredibility(ed, maximize='product')
msc.config <- subsetEventData(ed, index = msc.set$sampleindex)
plot.bammdata(msc.config, lwd=2)
addBAMMshifts(msc.config, cex = 2)

Macroevolutionary Cohort Analysis

When we have rate shifts to lower or higher rates, some sampled configurations might cancel each other out ( i.e. some say speedup for sisterclade A, some say slowdown for sisterclade B). We can catch this (and more) by looking at the cohort matrix:

cmat <- getCohortMatrix(ed)
cohorts(cmat, ed)

Fin!