Warning! This tool is under active development. Its capabilities and usage may change without warning, and it has not been adequately tested.
>sudo perl -MCPAN -e shell
>Password: ********
>install Math::Random::MT::Auto
Answer yes to any prompts about installing prerequisites.
MATLAB is required for calculating establishment probabilities.
R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.
You may want to add the location of the Perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:
#!/usr/bin/perl
Help for each Perl script can be obtained
>marker_divergence_fit.pl -i input.tab > output.fit
Supply the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.
Portion of an example marker ratio input file:
transfer exp-1 exp-2 exp-3
0 0.5087 0.5068 0.4990
3 0.5000 0.4844 0.5174
6 0.4853 0.5393 0.5115
9 0.4802 0.4862 0.4522
12 0.4884 0.4431 0.5170
15 0.5277 0.5196 0.5266
18 0.4983 0.4638 0.4607
21 0.5221 0.5361 0.5000
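For a quick sanity check of an input file before fitting, the table can be parsed in a few lines. The following is a minimal Python sketch, not part of the distributed scripts. The function name read_marker_table is hypothetical, and it assumes the layout shown above: a header row, a first column of transfer numbers, and one column per experiment, separated by tabs or spaces.

```python
import io


def read_marker_table(handle):
    """Parse a whitespace-delimited marker table.

    Returns (transfers, series), where series maps each experiment
    name from the header row to its list of values.
    """
    header = handle.readline().split()
    names = header[1:]  # the first column holds the transfer number
    transfers = []
    series = {name: [] for name in names}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        transfers.append(int(fields[0]))
        for name, value in zip(names, fields[1:]):
            series[name].append(float(value))
    return transfers, series


# First rows of the example input shown above.
example = """transfer exp-1 exp-2 exp-3
0 0.5087 0.5068 0.4990
3 0.5000 0.4844 0.5174
6 0.4853 0.5393 0.5115
"""
transfers, series = read_marker_table(io.StringIO(example))
print(transfers)        # [0, 3, 6]
print(series["exp-2"])  # [0.5068, 0.4844, 0.5393]
```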
The output file is tab-delimited, with columns containing data as labeled.
Supply the -b option followed by the number of initial points (not transfers) to average as the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that a population diverging toward marker state A shifts its marker ratio sooner when A was initially present in less than 50% of the population than when A was initially present in 50% or more.
Example:
>marker_divergence_fit.pl -m log_ratio -i input.tab > output.fit
Corrects for the baseline by taking the average of the first 5 points.
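To make the "points, not transfers" distinction concrete, here is a minimal Python sketch of a baseline average. It only illustrates averaging the first five rows of the exp-1 column from the example input, which span transfers 0 through 12. The baseline function is hypothetical and does not reproduce the modified fitting equation used by marker_divergence_fit.pl.

```python
def baseline(values, n_points):
    """Mean of the first n_points values of one marker-ratio series."""
    if not 0 < n_points <= len(values):
        raise ValueError("n_points must be between 1 and len(values)")
    head = values[:n_points]
    return sum(head) / len(head)


# exp-1 column of the example input; rows are transfers 0, 3, 6, ..., 21.
exp1 = [0.5087, 0.5000, 0.4853, 0.4802, 0.4884, 0.5277, 0.4983, 0.5221]

# Five points means five rows (transfers 0 to 12), not five transfers.
b = baseline(exp1, 5)
print(f"{b:.4f}")  # 0.4925
```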
>>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')
The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].
It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.
A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.
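When adapting these arguments to another experiment, the generations per transfer can be estimated from the dilution factor. The Python sketch below assumes that the population regrows by exactly the dilution factor through binary fission in each cycle, so that T = log2(dilution). The function names are hypothetical. Under that assumption, a 100-fold dilution gives the 6.64 generations used in the example above.

```python
import math


def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow after a dilution (assumes binary fission)."""
    return math.log2(dilution_factor)


def final_population(initial_size, generations):
    """Population size at the end of a growth cycle under the same assumption."""
    return initial_size * 2 ** generations


T = generations_per_transfer(100)
print(f"{T:.2f}")                         # 6.64
print(f"{final_population(5e6, T):.1e}")  # 5.0e+08
```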
>marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab
The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), number of transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).
>marker_divergence_fit.pl -i
>marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1 -r 100 -k 22
Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.
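As an illustration of how the start:end:step_size notation expands, the following Python sketch enumerates the parameter grid for the example command. It is not the script's own parsing code: expand_range is a hypothetical helper, and it assumes that the end value is included in the range.

```python
import math
from itertools import product


def expand_range(spec):
    """Expand 'start:end:step' into a list of values (end assumed inclusive)."""
    start, end, step = (float(x) for x in spec.split(":"))
    if step <= 0 or end < start:
        raise ValueError("expected start <= end and step > 0")
    # The small epsilon guards against floating-point error in the division.
    count = int(math.floor((end - start) / step + 1e-9)) + 1
    return [round(start + i * step, 10) for i in range(count)]


log10_u = expand_range("-8:-6:0.5")       # -u is given in log10 units
s_values = expand_range("0.06:0.2:0.02")  # -s is given directly

mutation_rates = [10 ** x for x in log10_u]
grid = list(product(mutation_rates, s_values))

print(log10_u)        # [-8.0, -7.5, -7.0, -6.5, -6.0]
print(len(s_values))  # 8
print(len(grid))      # 40
```

Under these assumptions, the example command covers 5 mutation rates and 8 selection coefficients, or 40 combinations, each simulated with the number of replicates given by -r.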
This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.
>marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig
The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.