<table><tr><td width=88><img src="%PUBURL%/%WEB%/SharedImages/wip.gif"></td> <td valign="center"><b><font color="red" size=+2>Warning!</font><br><font size="+1">This tool is under active development. Its capabilities and usage may change without warning.</font></b></td></tr></table>

---+ Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (_μ_) and selection coefficients (_s_) from marker divergence experiments [[#ReferenceAnchor][[1]]]. This is a way of parameterizing the evolvability of a bacterial strain.

%TOC%

---++ Requirements and Installation

The current version of the md package is available from our public Mercurial repository:

| [[http://barricklab.org/hg/md/archive/tip.zip][%ICON{download}% Download Current Source Archive]] |
| [[http://barricklab.org/hg/md/][%ICON{info}% Browse Mercurial Repository]] |

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from [[http://www.cpan.org][CPAN]]. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system, you can probably install these from the command line as follows:

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >sudo perl -MCPAN -e shell<br> >Password: ********<br> >install Math::Random::MT::Auto<br> <i>answer yes to any prompts about installing prerequisites</i> </div></code>

[[http://www.mathworks.com/][MATLAB]] is required for calculating establishment probabilities.

[[http://www.r-project.org/][R]] is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package [[http://cran.r-project.org/web/packages/nortest/index.html][nortest]] installed for the Lilliefors test employed by the curve fitting script.
This can be done from the command line within R by issuing the following command:

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >install.packages("nortest") </div></code>

You may want to add the location of the Perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> #!/usr/bin/perl </div></code>

Help for each individual Perl script can be obtained as follows:

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >perldoc &lt;script&gt; </div></code>

---++ 1. Fit α and τ empirical parameters from experimental data

The basic command is:

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf </div></code>

=output.fit= is a tab-delimited file containing information about each curve fit. =output.fit.pdf= shows plots of each fit, so that you can judge whether they accurately reflect the data.

---+++ Input file data format

The input file is *tab-delimited*. The header row begins with "transfer", and the other columns are labels, each naming an experimental time series of marker ratio measurements. Each following row begins with the transfer number, followed by the marker ratio measurement for each series at that time point.

Marker ratios may be given in a variety of formats. Pass the =-m= option followed by =ratio=, =fraction=, =log_ratio=, or =log10_ratio= to this script, depending on the format of your data values. The default mode is =ratio=.
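If your measurements are in a different format than the one you want to pass with =-m=, they can be converted beforehand. The following is only a minimal sketch: it assumes that =fraction= means A/(A+B), =ratio= means A/B, =log_ratio= means the natural log of A/B, and =log10_ratio= means the base-10 log of A/B for marker states _A_ and _B_. These definitions are not stated by the scripts' documentation quoted here, and the function names are hypothetical helpers that are not part of the md package. Check =perldoc marker_divergence_fit.pl= before relying on them.

```shell
#!/bin/sh
# Hypothetical helpers (NOT part of the md package) that convert a marker
# fraction into the other value formats.
# ASSUMPTIONS: fraction    = A/(A+B)
#              ratio       = A/B
#              log_ratio   = ln(A/B)
#              log10_ratio = log10(A/B)
# Inputs are expected to satisfy 0 < fraction < 1.

fraction_to_ratio() {
    awk -v f="$1" 'BEGIN { printf "%.4f\n", f / (1 - f) }'
}

fraction_to_log_ratio() {
    # awk's log() is the natural logarithm
    awk -v f="$1" 'BEGIN { printf "%.4f\n", log(f / (1 - f)) }'
}

fraction_to_log10_ratio() {
    # convert the natural log to base 10 by dividing by ln(10)
    awk -v f="$1" 'BEGIN { printf "%.4f\n", log(f / (1 - f)) / log(10) }'
}

# Example: a balanced population (fraction 0.5) has ratio 1 and log ratio 0.
fraction_to_ratio 0.5
fraction_to_log_ratio 0.5
```

Under these assumptions, a population that starts at a 1:1 marker ratio has a fraction of 0.5, a ratio of 1, and a log ratio of 0 in either base.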
*Portion of an example marker ratio input file:*

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> transfer exp-1 exp-2 exp-3 <br> 0 0.5087 0.5068 0.4990<br> 3 0.5000 0.4844 0.5174<br> 6 0.4853 0.5393 0.5115<br> 9 0.4802 0.4862 0.4522<br> 12 0.4884 0.4431 0.5170<br> 15 0.5277 0.5196 0.5266<br> 18 0.4983 0.4638 0.4607<br> 21 0.5221 0.5361 0.5000<br> </div></code>

The output file is tab-delimited, with columns containing data as labeled.

---+++ Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the =-b= option followed by the number of initial points (not transfers) that you would like to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. This equation accounts for the fact that, in a population diverging toward marker state _A_, the marker ratio shifts sooner when _A_ was initially present in less than 50% of the population than when it was initially present in exactly 50%.

*Example:*

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab -o output.fit -p output.fit.pdf </div></code>

_Corrects for the baseline by taking the average of the first 5 points._

---++ 2. Generate a table of establishment probabilities with MATLAB

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script. First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.
*In MATLAB:*

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab') </div></code>

The arguments are, in order: the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. The output is a tab-delimited list of selection coefficients and the corresponding probabilities of establishment for a new mutant with that advantage relative to the population average [[#ReferenceAnchor][[2]]].

It is important to allow a maximum selection coefficient several-fold *greater* than the expected effective selection coefficient (_s_), because multiple mutations may occur that together give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (=pr_establishment_T_6.64_N_5E6_LT.tab=) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term _E. coli_ evolution experiment.

---++ 3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model in which there is only one category of beneficial mutation, with selection coefficient _s_. These beneficial mutations occur at a rate _μ_. Selection coefficients are defined such that w<sub>new</sub> = w<sub>initial</sub> (1 + _s_). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of _s_ and _μ_, a distribution of the effective parameters α and τ is determined from the idealized data.
Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of _s_ and _μ_ that best explain the experimental data to be determined.

---+++ 3.1 Simulate marker divergence data for a set of _μ_ and _s_ effective beneficial mutation parameters

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab </div></code>

The options are: the generations per transfer (=-T=), the initial population size at the beginning of each growth cycle (=-N=), the per-generation mutation rate (=-u=), the per-generation selection coefficient (=-s=), the file of establishment probabilities produced by MATLAB (=-p=), the number of generations during outgrowth, before any data are printed (=-k=), the interval in transfers at which the marker ratio is printed (=-i=), the number of simulation replicates to perform (=-r=), and the output file (=-o=).

---+++ 3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit </div></code>

---+++ 3.3 Automating and parallelizing this step

Generally, many combinations of _μ_ and _s_ must be calculated to determine the maximum likelihood effective parameters. The following script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.
<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1 -r 100 -k 22 </div></code>

Parameters are the same as in =marker_divergence_pop_gen_simulation.pl=, except that =-u= and =-s= are supplied as =start:end:step_size= combinations, and =-u= is in log10 units, i.e., passing a value of -8 gives a mutation rate of 10<sup>-8</sup>.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in =.fit=.)

---++ 4. Determine the maximum likelihood effective parameters

Finally, we determine which values of _μ_ and _s_ produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.

<code><div style="border-color: grey; border-style: solid; border-width: 1px; padding:1px;"> >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf </div></code>

The output file =experimental.sig= has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a _greater than_ 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is _all blue_, then none of your simulated data agreed with the experimental data.)

Note that you can now pass the =-n= flag, and the program will use a 2-dimensional version of the KS-test to determine the 95% confidence contour.

---++ References

#ReferenceAnchor
   1 Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R.
   (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. _Science_ *311*, 1615-1617.
   1 Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. _Evolution_ *55*, 2606-2610.

---++ Acknowledgments

Many thanks to *Noam Shoresh* for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.
Topic revision: r10 - 2010-09-28 - 14:00:25 - Main.JeffreyBarrick