Question

How to simulate a neutral set of SNPs with background selection using SFS_CODE?

15 months ago

Jimmy ▴ 30

I'm working on a selective sweep analysis. I've already run the sweep-detection software RAiSD, which evaluates a μ statistic in windows across the entire genome. Now I need a cutoff value for this statistic: the level above which I can tentatively identify a genomic region as having undergone a sweep. I need to find this cutoff using a neutral simulation.

One thing with RAiSD is that it is sensitive to background selection. So in the original paper (https://www.nature.com/articles/s42003-018-0085-8), the authors generated a neutral set of SNPs under background selection using SFS_CODE and used it to determine a 5% false positive rate (FPR) cutoff. This is the first time I have ever run a simulation. I began looking at the SFS_CODE documentation today (https://sfscode.sourceforge.net/SFS_CODE_doc.pdf) and I'm having trouble working out how to set up this scenario: namely, how do I run a neutral simulation with background selection?
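To make the "5% FPR cutoff" concrete, here is a minimal sketch of what that step could look like once neutral simulations exist. It assumes you already have one μ score per neutral simulated dataset (for example, the highest μ RAiSD reports for it; parsing RAiSD's output is not shown). The function name fpr_cutoff and the example numbers are made up for illustration and are not real results.

```python
import math


def fpr_cutoff(neutral_scores, fpr=0.05):
    """Return a cutoff that at most a fraction `fpr` of neutral scores exceed."""
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must be strictly between 0 and 1")
    if not neutral_scores:
        raise ValueError("need at least one neutral score")
    ordered = sorted(neutral_scores)
    # Number of neutral scores allowed to exceed the cutoff (the false positives).
    allowed = math.floor(fpr * len(ordered))
    return ordered[len(ordered) - allowed - 1]


# Made-up placeholder scores, one per neutral simulation (NOT real RAiSD output).
example_scores = [0.5 * i for i in range(1, 41)]
cutoff = fpr_cutoff(example_scores)
exceeding = sum(score > cutoff for score in example_scores)
print(f"cutoff = {cutoff}; neutral scores above it: {exceeding}/{len(example_scores)}")
```

Windows in the real data whose μ exceeds this cutoff would then be the tentative sweep candidates. More neutral simulations give a more stable estimate of the cutoff.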

sweep simulation background_selection RAiSD SFS_code • 593 views

score 1 · Accepted Answer · 2023-01-23

I know how to do this now, so I thought it would be best to revisit and answer my own question. As it happens, the authors of this study (https://doi.org/10.1093/gbe/evab209) do exactly this in their paper and have released all the scripts needed to reproduce their work, including simulating neutral evolution under background selection with SFS_CODE, on GitHub: https://github.com/Jimi92/Cerco-DMI-resistance

So reading the paper and these scripts should answer how to do this for anyone interested. Here's a quick example taken from the GitHub repository (I only changed the output file name):

./sfs_code 1 2 -n 89 -t 0.0017 -P 1 -L 1 1000000 -Td 0 0.026 -TE 0.0047 -r 0.063 -W 1 200.0 0.0 1.0 -o output_file.txt

Not all of this is directly relevant to the background selection part: some of it specifies a particular demographic scenario, the mutation rate, the number of samples, the number of populations, and so on. If you want to know what all these flags mean, they are listed in the GitHub repository (and in the documentation, but the repository lays out the meaning of the specific flags used in this example very nicely).

But the most important part for simulating negative / background selection is the -W flag, which specifies the distribution of selective effects. In this particular example the authors used -W 1 200.0 0.0 1.0. The four values after the -W flag are <type> <gamma> <p_pos> <p_neg>. The meaning of <type> is complicated, so it is best to consult the SFS_CODE documentation directly (pg. 16 at the time of writing): https://sfscode.sourceforge.net/SFS_CODE_doc.pdf.

The <gamma> value is γ = 2Ns, the population-scaled selection coefficient. To my understanding, <p_pos> and <p_neg> are the proportions of new nonsynonymous mutations that are advantageous and deleterious, respectively. The documentation itself says: "Of course if you simply want a Γ-distribution of negative selection (assuming no positive selection), then you can simply set <p_pos> = 0" (pg. 17). For this reason, the example above, which I chose from the authors' scripts, sets <p_pos> <p_neg> to 0.0 1.0.
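If you want to vary the selection settings across runs, it can help to pull the -W values out of the command by name. Below is a minimal sketch that only manipulates the command string from this answer and does not run SFS_CODE. The helper names parse_w and with_gamma are hypothetical, and the four-field layout is an assumption that holds for the -W usage shown here; other <type> values may take different arguments, so check the documentation.

```python
import shlex

# Command copied verbatim from the example in this answer.
COMMAND = (
    "./sfs_code 1 2 -n 89 -t 0.0017 -P 1 -L 1 1000000 -Td 0 0.026 "
    "-TE 0.0047 -r 0.063 -W 1 200.0 0.0 1.0 -o output_file.txt"
)

# Field order as described in this answer: <type> <gamma> <p_pos> <p_neg>.
W_FIELDS = ("type", "gamma", "p_pos", "p_neg")


def parse_w(command):
    """Return the values following -W as a dict keyed by field name."""
    tokens = shlex.split(command)
    start = tokens.index("-W") + 1
    return dict(zip(W_FIELDS, tokens[start:start + len(W_FIELDS)]))


def with_gamma(command, gamma):
    """Return a copy of the command with only the <gamma> value replaced."""
    tokens = shlex.split(command)
    gamma_pos = tokens.index("-W") + 1 + W_FIELDS.index("gamma")
    tokens[gamma_pos] = str(gamma)
    return shlex.join(tokens)


print(parse_w(COMMAND))
print(with_gamma(COMMAND, 50.0))
```

The first print shows the four -W values labelled by name; the second shows the same command with a different <gamma>, leaving the demographic and sampling flags untouched.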