---+ Mutation Rates from Genome Resequencing

*Motivation:* You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutations from these data, and what are the 95% confidence intervals on these values?

---++ Case 1: Mutations with many identical sites

*Assumptions:*
   1 The number of mutations is small compared to the number of sites.
   1 There are no back mutations (reversions).
   1 Mutation rates are constant over time and across sites.

*Example:* Single-base substitutions.

*Calculation:*
   1 If you restrict your data to one genome per experimental population, then you can calculate the maximum likelihood value and 95% confidence limits from a Poisson distribution. Count the total number of mutations (_m_) and the total elapsed generations or time of independent evolution (_T_), summed over all genomes. Example: 22 point mutations found in 6 genomes that each evolved for 10,000 generations, so _T_ = 60,000. The call poisson.test(m) gives the maximum likelihood estimate (which is _m_ itself) and exact 95% limits on the expected number of mutations; dividing by _T_ converts these to mutations per genome per generation. %BR%<verbatim>
> m = 22
> T = 10000 * 6
> rate = poisson.test(m)
> rate$estimate / T
  event rate 
0.0003666667 
> rate$conf.int / T
[1] 0.0002297880 0.0005551377
attr(,"conf.level")
[1] 0.95
</verbatim>
   1 If you know the number of sites at risk for the mutation (_s_), then you can calculate a per-site, per-generation mutation rate. Example: Assume these 22 point mutations are A-to-G substitutions and there are 1,342,726 A bases in the original genome. %BR%<verbatim>
> s = 1342726
> rate$estimate / (T * s)
  event rate 
2.730763e-10 
> rate$conf.int / (T * s)
[1] 1.711355e-10 4.134408e-10
attr(,"conf.level")
[1] 0.95
</verbatim>

---++ Case 2: One-time mutations

*Assumptions:*
   1 The mutation can only happen once per genome.
   1 The mutation rate is constant per unit time or generation.

*Example:* Deletion of an unstable chromosomal region. Once deleted, it can never be deleted again.

*Calculation:*
   1 Count the number of independent genomes that have the mutation (_m_) and the total number of genomes analyzed (_n_) at a given time (_T_).
   Example: 5 of 12 independently evolved genomes have the mutation after 10,000 generations. %BR%<verbatim>
> m = 5
> n = 12
> T = 10000
</verbatim>
   1 Calculate a maximum likelihood value and 95% exact (Clopper-Pearson) confidence limits for the fraction of independently evolved lineages that __do not have__ the mutation. The p-value in this output tests against a fraction of 0.5 and is not used in the rate calculation. %BR%<verbatim>
> p = binom.test(n - m, n)
> p

	Exact binomial test

data:  n - m and n
number of successes = 7, number of trials = 12, p-value = 0.7744
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2766697 0.8483478
sample estimates:
probability of success 
             0.5833333 
</verbatim>
   1 If mutations happen at a constant rate r per unit time, then the probability that a lineage has no mutation by time _T_ is the zero event term of a [[http://en.wikipedia.org/wiki/Poisson_process][Poisson process]], exp(-rT). Setting this equal to the observed fraction of lineages without the mutation (p) and solving gives r = -ln(p)/T. Because a larger unmutated fraction implies a lower rate, the upper confidence limit on the fraction gives the lower limit on the rate, so the interval below is printed in reversed order (upper limit first). %BR%<verbatim>
> -log(p$estimate) / T
probability of success 
          5.389965e-05 
> -log(p$conf.int) / T
[1] 1.284931e-04 1.644646e-05
attr(,"conf.level")
[1] 0.95
</verbatim>
   This is a particularly simple type of [[http://en.wikipedia.org/wiki/Survival_analysis][survival analysis]].

---++ Issues: Pseudo-replication

---++ Issues: Different mutation rates in different lineages
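
---++ Sketch: Cases 1 and 2 as R functions

The two calculations in Cases 1 and 2 can be wrapped in small helper functions so that other counts can be plugged in directly. This is a minimal sketch under the same assumptions as those cases, not part of the original protocol: the function names poisson_rate and one_time_rate and the argument name time (the _T_ of the text) are hypothetical. It uses only base R (poisson.test and binom.test) and returns the Case 2 limits in ascending order.

<verbatim>
# Minimal sketch; poisson_rate() and one_time_rate() are hypothetical helper
# names, not from the original protocol. Base R only (stats package).

# Case 1: many identical sites. m mutations observed over a total of `time`
# generations (T in the text, summed over genomes). Optional s = number of
# sites at risk, which turns the result into a per-site rate.
poisson_rate <- function(m, time, s = 1, conf.level = 0.95) {
  test <- poisson.test(m, conf.level = conf.level)  # exact Poisson CI on the count
  denom <- time * s
  c(estimate = unname(test$estimate) / denom,
    lower    = test$conf.int[1] / denom,
    upper    = test$conf.int[2] / denom)
}

# Case 2: one-time mutations. m of n independent lineages carry the mutation
# after `time` generations.
one_time_rate <- function(m, n, time, conf.level = 0.95) {
  # Clopper-Pearson CI on the fraction of lineages WITHOUT the mutation
  test <- binom.test(n - m, n, conf.level = conf.level)
  # Zero-event Poisson term: P(no mutation by time) = exp(-rate * time),
  # so rate = -log(fraction) / time. A larger unmutated fraction means a
  # lower rate, so the upper limit on the fraction gives the lower rate limit.
  c(estimate = -log(unname(test$estimate)) / time,
    lower    = -log(test$conf.int[2]) / time,
    upper    = -log(test$conf.int[1]) / time)
}

# Example values from this page
print(poisson_rate(22, 10000 * 6))               # per genome per generation
print(poisson_rate(22, 10000 * 6, s = 1342726))  # per site per generation
print(one_time_rate(5, 12, 10000))
</verbatim>

With the example values from this page, the printed estimates and limits should agree with the console sessions in Cases 1 and 2 up to rounding.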
Topic revision: r3 - 2012-07-25 - JeffreyBarrick