Mutation Rates from Genome Resequencing

Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutations from these data? What are the 95% confidence intervals on these values?

Case 1: Mutations with many identical sites

Assumptions:

  1. The number of mutations is small compared to the number of sites.
  2. There are no back mutations (reversions).
  3. Mutation rates are constant over time and across sites.

Example: Single-base substitutions


  1. If you restrict your data to one genome per experimental population, then you can calculate the maximum likelihood value and 95% confidence limits from a Poisson distribution. Count the total number of mutations (m) and the total elapsed generations or time of independent evolution (T). Example: 22 point mutations found in 6 genomes that each evolved for 10,000 generations.
    > m = 22
    > T = 10000 * 6
    > rate = poisson.test(m, T)
    > rate$estimate
      event rate 
    0.0003666667 
    > rate$conf.int
    [1] 0.0002297880 0.0005551377
    attr(,"conf.level")
    [1] 0.95
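  2. If you know the number of sites at risk for the mutation (s), then you can calculate a per-site mutation rate by using the total number of site-generations (T * s) as the exposure. Example: Assume these 22 point mutations are A to G substitutions and there are 1,342,726 A bases in the original genome.
    > s = 1342726
    > rate = poisson.test(m, T * s)
    > rate$estimate
      event rate 
    2.730763e-10 
    > rate$conf.int
    [1] 1.711355e-10 4.134408e-10
    attr(,"conf.level")
    [1] 0.95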
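
The two steps above can also be written out by hand. The following is a minimal sketch, not part of the original protocol: `poisson_rate_ci` is a hypothetical helper name, and it assumes the exact two-sided Poisson limits are taken from gamma quantiles, which should reproduce the interval reported by `poisson.test`.

```r
# Hypothetical helper (an illustration, not the lab's own code): exact Poisson
# estimate and two-sided confidence limits for a rate, given a mutation count
# and the total exposure (generations, or generations * sites).
poisson_rate_ci <- function(m, exposure, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # Exact limits on the expected count come from gamma quantiles.
  lower <- if (m == 0) 0 else qgamma(alpha / 2, shape = m)
  upper <- qgamma(1 - alpha / 2, shape = m + 1)
  c(estimate = m / exposure,
    lower = lower / exposure,
    upper = upper / exposure)
}

# Per-generation rate: 22 mutations in 6 genomes x 10,000 generations.
per_generation <- poisson_rate_ci(22, 10000 * 6)
# Per-site, per-generation rate: same count, exposure scaled by 1,342,726 sites.
per_site <- poisson_rate_ci(22, 10000 * 6 * 1342726)
print(per_generation)
print(per_site)
```

Writing the exposure as a single argument makes it explicit that the per-site rate is the same calculation with a larger denominator.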

Case 2: One-time mutations

Assumptions:

  1. The mutation can only happen once per genome.
  2. The mutation rate is constant per unit time or generation.

Example: Deletion of an unstable chromosomal region. Once deleted, it can never be deleted again.


  1. Count the number of independent genomes that have the mutation (m) and total number of genomes analyzed (n) at a given time (T). Example: 5 of 12 independently evolved genomes have the mutation after 10,000 generations.
    > m = 5
    > n = 12
    > T = 10000
  2. Calculate a maximum likelihood value and 95% exact (Clopper-Pearson) confidence limits for the fraction of independently evolved lineages that do not have the mutation from your observations.
    > p = binom.test(n - m, n)
    > p
        Exact binomial test
    data:  n - m and n
    number of successes = 7, number of trials = 12, p-value = 0.7744
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.2766697 0.8483478
    sample estimates:
    probability of success 
                 0.5833333 
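  3. If the mutations happen at a constant rate per unit time, then you can calculate the rate that gives this fraction of independent lineages without a mutation up to the given time point using the zero-event term of a Poisson process, exp(-rate * T). Because -log is a decreasing function, the confidence limits come out in reversed order: the first value is the upper limit on the rate and the second is the lower limit.
    > -log(p$estimate) / T
    probability of success 
              5.389965e-05 
    > -log(p$conf.int) / T
    [1] 1.284931e-04 1.644646e-05
    attr(,"conf.level")
    [1] 0.95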
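
The three steps above can be collected into one function. This is a minimal sketch under the same constant-rate assumption; `one_time_rate` is a hypothetical name introduced here, and the function returns the limits already sorted as lower and upper.

```r
# Hypothetical helper (an illustration, not the lab's own code): rate of a
# one-time mutation from the number of lineages with the mutation (m) out of
# n independently evolved lineages after T generations.
one_time_rate <- function(m, n, T, conf.level = 0.95) {
  # Fraction of lineages WITHOUT the mutation, with exact binomial limits.
  bt <- binom.test(n - m, n, conf.level = conf.level)
  frac <- unname(bt$estimate)
  ci <- as.numeric(bt$conf.int)
  # Invert P(no mutation by T) = exp(-rate * T). A high surviving fraction
  # means a low rate, so the binomial limits swap places.
  c(estimate = -log(frac) / T,
    lower = -log(ci[2]) / T,
    upper = -log(ci[1]) / T)
}

# Example from the text: 5 of 12 lineages have the deletion at 10,000 generations.
result <- one_time_rate(5, 12, 10000)
print(result)
```

If every lineage has the mutation (m equal to n), the surviving fraction is zero and both the estimate and the upper limit are infinite, so the data then only bound the rate from below.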

This is a particularly simple type of survival analysis.
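
One way to see the connection, sketched under the constant-rate assumption of Case 2: the fraction of lineages still lacking the mutation at time T plays the role of a survival function with a constant hazard. The symbols \mu (mutation rate) and S (surviving fraction) are introduced here for illustration; m, n, and T are as defined in the example.

```latex
S(T) = \Pr(\text{no mutation by } T) = e^{-\mu T}
\quad\Longrightarrow\quad
\hat{\mu} = -\frac{1}{T}\,\ln\!\left(\frac{n - m}{n}\right)
```

With m = 5, n = 12, and T = 10,000, this gives -ln(7/12) / 10000, or about 5.39e-05 per generation.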

Issues: Pseudo-replication

Issues: Different mutation rates in different lineages


Topic revision: r2 - 13 Mar 2012 - 03:46:19 - Main.JeffreyBarrick