Mutation Rates from Genome Resequencing
Changed:
< < Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutations from these data? What are the 95% confidence intervals on these values?
> > Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you use these data to infer the rates at which different types of mutations accumulate? What are the 95% confidence intervals on these estimates?
Case 1: Mutations with many identical sites
Assumptions:
> T = 10000 * 6
> rate = poisson.test(m)
> rate$estimate/T
  event rate
0.0003666667
> rate$conf.int/T
[1] 0.0002297880 0.0005551377
attr(,"conf.level")
[1] 0.95
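The session above uses `m` without showing where it is defined. The following is a minimal, self-contained sketch of the same calculation. The value `m <- 22` is an assumption inferred from the printed estimate (0.0003666667 × 60000 = 22), not something stated in the text, and the reading of the time basis as generations summed over genomes is likewise an assumption. The name `total_time` replaces `T` only to avoid masking R's shorthand for `TRUE`.

```r
# Time basis copied from the session above. ASSUMPTION: this is the number
# of generations summed over all sequenced genomes.
total_time <- 10000 * 6

# Total number of mutations observed. ASSUMPTION: 22 is inferred from the
# printed estimate (0.0003666667 * 60000); the text does not state it.
m <- 22

# Exact Poisson test. Supplying the time basis as the second argument makes
# R report the estimate and confidence limits per unit of time directly,
# instead of dividing by the time basis afterwards.
fit <- poisson.test(m, total_time)

rate <- unname(fit$estimate)      # point estimate of the rate
ci   <- as.numeric(fit$conf.int)  # exact 95% confidence limits

print(rate)
print(ci)
```

Dividing `poisson.test(m)$conf.int` by the time basis, as in the session, and passing the time basis to `poisson.test()` should give the same limits; the second form just keeps the scaling in one place.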
Case 2: One-time mutations
Assumptions:
Example: Deletion of an unstable chromosomal region. Once deleted, it can never be deleted again.
Calculation:
Changed:
< < Issues: Pseudo-replication
> > More complex situations
Changed:
< < Issues: Different mutation rates in different lineages
> > What if you want to test for variation in rates of mutation accumulation?
Added:
> > You can use Poisson regression in R (using glm()) to judge whether there is a significant difference in the rates at which mutations accumulate relative to some factor. For example, you can test whether there is evidence that certain populations accumulated different numbers of mutations per unit time compared to others or whether mutations at certain sites were more common than at other sites. Fit a model that incorporates the relevant factor and one that does not, and then compare them using anova().
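As an illustration of the model comparison described above, here is a minimal sketch. The data frame, its column names, and every count in it are hypothetical and made up for the example; the use of an offset to put counts on a per-generation scale is one reasonable choice, not necessarily the one used by the authors.

```r
# HYPOTHETICAL data: one sequenced genome per population, with the number of
# mutations found and the generations elapsed. All values are invented.
counts <- data.frame(
  population  = c("A1", "A2", "A3", "B1", "B2", "B3"),
  treatment   = factor(c("A", "A", "A", "B", "B", "B")),
  mutations   = c(3, 5, 4, 9, 12, 8),
  generations = rep(10000, 6)
)

# Model without the factor: one rate shared by all populations.
# offset(log(generations)) scales the expected count by the time basis.
fit_null <- glm(mutations ~ 1 + offset(log(generations)),
                family = poisson, data = counts)

# Model with the factor: the rate may differ between treatments.
fit_factor <- glm(mutations ~ treatment + offset(log(generations)),
                  family = poisson, data = counts)

# Likelihood-ratio (chi-squared) comparison of the two nested models.
comparison <- anova(fit_null, fit_factor, test = "Chisq")
print(comparison)

# Per-generation rate estimates for each level of the factor.
b <- unname(coef(fit_factor))
rates <- c(A = exp(b[1]), B = exp(b[1] + b[2]))
print(rates)
```

A small p-value in the `anova()` table suggests that the model with the factor fits better, that is, that the accumulation rate differs between the groups.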
What if you sequenced multiple genomes from each population?
This type of pseudo-replication complicates the statistical analysis because strains sequenced from one population are likely to share some of their evolutionary history. If that shared lineage happened to accumulate mutations more rapidly by chance, you will overestimate rates by including all of these strains and treating each one as an independent period of evolution. It is not easy to correct for this shared history; doing so rigorously would likely require a resampling procedure. A valid alternative is to randomly pick one strain from each population and include only that one in the standard analysis. This restores the assumption of independence, but it discards some information.
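The simpler option mentioned above, keeping one randomly chosen strain per population, can be sketched as follows. This is not the rigorous resampling procedure; it only restores independence before running the Case 1 analysis. The data frame and all of its values are hypothetical.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# HYPOTHETICAL data: two genomes sequenced from each of four populations.
genomes <- data.frame(
  population  = rep(c("P1", "P2", "P3", "P4"), each = 2),
  mutations   = c(4, 5, 2, 3, 6, 6, 3, 4),
  generations = rep(10000, 8)
)

# Pick one row index at random from each population. sample.int() is used
# so that a population with a single genome is handled correctly.
pick_one <- function(rows) rows[sample.int(length(rows), 1)]
rows_by_population <- split(seq_len(nrow(genomes)), genomes$population)
chosen <- unlist(lapply(rows_by_population, pick_one))
independent <- genomes[chosen, ]

# Standard Poisson analysis on the independent subset.
fit <- poisson.test(sum(independent$mutations), sum(independent$generations))
print(fit$estimate)
print(fit$conf.int)
```

Repeating the draw with different seeds gives a rough sense of how much the estimate depends on which strain happens to be picked.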
Reference
We used the approaches described here to characterize and compare the rates of mutations in this paper: Renda, B.A., Dasgupta, A., Leon, D., Barrick, J.E. (2015) Genome instability mediates the loss of key traits by Acinetobacter baylyi ADP1 during laboratory evolution. J. Bacteriol. 197:872-881. https://doi.org/10.1128/JB.02263-14
Mutation Rates from Genome Resequencing
Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutations from these data? What are the 95% confidence intervals on these values?
Case 1: Mutations with many identical sites
Assumptions:
Case 2: One-time mutations
Assumptions:
Issues: Pseudo-replication
Issues: Different mutation rates in different lineages
Mutation Rates from Genome Resequencing
Changed:
< < Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutation rates from these data? What are the 95% confidence intervals on these values?
> > Motivation: You have re-sequenced several genomes after a mutation accumulation or adaptive evolution experiment. How do you infer the rates of different types of mutations from these data? What are the 95% confidence intervals on these values?
Changed:
< < Case 1: Single-base substitutions
> > Case 1: Mutations with many identical sites
Changed:
< < Assumptions: The number of mutations is small compared to the number of sites.
> > Assumptions:
Changed:
< < If you restrict your data to one genome per experimental population, then you can calculate the 95% confidence limits by assuming this is a Poisson process (poisson.test in R).
> > Example: Single-base substitutions
Changed:
< < If you take multiple genomes from one experimental population, this is a type of pseudo-replication (they may have a shared evolutionary history). This makes calculating the 95% confidence intervals more complicated.
> > Calculation:
Case 2: One-time mutations
Changed:
< < Assumptions: A mutation can only happen once per genome.
> > Assumptions:
Changed:
< < Example: Deletion of a chromosomal region. Once deleted, it can never be deleted again.
> > Example: Deletion of an unstable chromosomal region. Once deleted, it can never be deleted again.
Changed:
< < This is a type of "survival analysis". You can calculate the fraction of genomes that have and do not have your mutation. Then consider this a binomial process to calculate a 95% confidence interval. Then, convert this to a per-generation rate by dividing by the number of generations.
> > Calculation:
Added:
> > Exact binomial test
data: n - m and n
number of successes = 7, number of trials = 12, p-value = 0.7744
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2766697 0.8483478
sample estimates:
probability of success
0.5833333
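The binomial output above gives a confidence interval on a fraction of genomes, not yet on a rate. The sketch below reproduces that test and then converts the fraction to a per-generation rate. Several things here are assumptions rather than facts from the text: `n <- 12` and `m <- 5` are inferred from the printed successes and trials, `n - m` is read as the number of genomes that still lack the mutation, `generations <- 10000` is a hypothetical value, and the conversion assumes a constant per-generation probability of mutation. That conversion is one common choice and is swapped in here for illustration.

```r
n <- 12  # genomes sequenced (number of trials in the output above)
m <- 5   # ASSUMPTION: genomes carrying the mutation, so n - m = 7 lack it

# HYPOTHETICAL: number of generations each genome evolved.
generations <- 10000

# Exact binomial test on the fraction of genomes that still lack the mutation.
fit <- binom.test(n - m, n)
p_unmutated <- unname(fit$estimate)
p_ci <- as.numeric(fit$conf.int)

# If the mutation happens with constant probability mu in each generation,
# the chance of still lacking it after g generations is (1 - mu)^g.
# Solving for mu gives mu = 1 - p^(1 / g).
to_rate <- function(p) 1 - p^(1 / generations)

rate <- to_rate(p_unmutated)

# A high fraction of unmutated genomes implies a low rate, so the order of
# the confidence limits is reversed after the conversion.
rate_ci <- rev(to_rate(p_ci))

print(rate)
print(rate_ci)
```

When the fraction of mutated genomes is small, this result is close to simply dividing that fraction by the number of generations.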
Issues: Pseudo-replication
Issues: Different mutation rates in different lineages