Question: Calculating statistically significant outliers for pairwise Fst obtained from VCFtools
Anurag wrote (4.8 years ago, Belgium):

Hi,

I calculated pairwise Fst using VCFtools:

vcftools --vcf input_data.vcf --weir-fst-pop population_1.txt --weir-fst-pop population_2.txt --out pop1_vs_pop2
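With these options vcftools writes per-site Weir and Cockerham estimates to pop1_vs_pop2.weir.fst, a tab-separated file with the columns CHROM, POS and WEIR_AND_COCKERHAM_FST; sites where the estimate is undefined may appear as nan or -nan. As a first look before any significance testing, a minimal bash sketch that ranks sites by Fst could look like this (top_fst_sites is a hypothetical helper, and a top-N list is a ranking, not a test of significance):

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the N sites with the highest Weir & Cockerham Fst
# from a vcftools *.weir.fst file (tab-separated: CHROM, POS, FST).
top_fst_sites() {
  local fst_file=$1 n=$2
  # Skip the header line and undefined estimates, sort by Fst in descending
  # order (general numeric, so negative values and exponents are handled),
  # then keep the first N lines.
  awk -F'\t' 'NR > 1 && $3 != "nan" && $3 != "-nan"' "$fst_file" \
    | sort -t $'\t' -k3,3gr \
    | head -n "$n"
}

# Example: the 20 most differentiated sites from the run above.
# top_fst_sites pop1_vs_pop2.weir.fst 20
```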

What method should I use to assess statistical significance, so that I can identify outlier regions or loci under putative selection or differentiation?

Thanks in advance for your help.

Zev.Kronenberg wrote (4.8 years ago, United States):

Here are four suggestions:

1.  See if your Fst values fit a parametric distribution (or come reasonably close). Estimate the distribution's parameters and then look up a probability. Notice I did not say a p-value.

2.  Permute your genotypes and recompute Fst many times. This gives an empirical p-value, or probability.

3.  Check out pFst. pFst is a likelihood ratio test for allele frequency differences. It gives you a true p-value based on a chi-squared lookup: https://github.com/jewmanchue/vcflib/wiki/Association-testing-with-GPAT

4.  Check out Lositan. I have never used it, but it apparently provides significance values for Fst. http://popgen.net/soft/lositan/
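Suggestion 2 boils down to comparing each observed Fst with the values obtained after permutation. One common convention for the empirical p-value is p = (k + 1) / (n + 1), where n is the number of permuted values and k is how many of them are at least as large as the observed one; the +1 keeps p above zero. A minimal awk-based sketch, assuming the permuted Fst values are collected one per line in a plain text file (empirical_pvalue is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: empirical (permutation) p-value for an observed statistic.
#   $1 = observed Fst
#   $2 = file with one permuted Fst value per line
# Prints (k + 1) / (n + 1), where n = number of usable permuted values and
# k = number of them that are >= the observed value.
empirical_pvalue() {
  awk -v obs="$1" '
    BEGIN { o = obs + 0 }
    NF == 0 || $1 == "nan" || $1 == "-nan" { next }  # skip blanks and undefined values
    { n++; if ($1 + 0 >= o) k++ }
    END { printf "%.6g\n", (k + 1) / (n + 1) }
  ' "$2"
}
```

For example, empirical_pvalue 0.62 permuted_fst.txt, where permuted_fst.txt is a placeholder file name. Whether to compare against permuted values from the same site or from all sites pooled is a design choice: pooling yields many more null values per test but assumes the sites are comparable.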

 


To follow up on Zev's suggestion 2: if you still want to use vcftools, you can run a permutation test by randomly reassigning (permuting) the individuals listed in the population_1.txt and population_2.txt files and rerunning the Fst calculation each time.

(Reply by Adam990, 4.8 years ago)

Could you explain in a bit more detail how that can be done?

(Reply by Anurag, 4.8 years ago)
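One way to do the permutation Adam describes, assuming population_1.txt and population_2.txt hold one individual ID per line (the format --weir-fst-pop expects): pool the IDs, shuffle them, split them back into two groups of the original sizes, and rerun vcftools on each shuffled assignment. A bash sketch with hypothetical helper names (permute_pops, run_fst_permutations), requiring shuf from GNU coreutils (gshuf on macOS with Homebrew coreutils):

```bash
#!/usr/bin/env bash
# Hypothetical helper: pool the individuals from two vcftools population files,
# shuffle them, and split them back into two files of the original sizes.
permute_pops() {
  local pop1=$1 pop2=$2 out1=$3 out2=$4
  local n1 pooled
  # Size of the first group, ignoring blank lines.
  n1=$(awk 'NF { c++ } END { print c + 0 }' "$pop1")
  pooled=$(mktemp)
  # Drop blank lines (also normalises a missing final newline), then shuffle.
  awk 'NF' "$pop1" "$pop2" | shuf > "$pooled"
  head -n "$n1" "$pooled" > "$out1"
  tail -n +"$((n1 + 1))" "$pooled" > "$out2"
  rm -f "$pooled"
}

# Hypothetical driver: recompute per-site Fst on N permuted group assignments.
# Writes perm_1.weir.fst ... perm_N.weir.fst in the current directory.
run_fst_permutations() {
  local vcf=$1 pop1=$2 pop2=$3 n_perm=$4 i
  for ((i = 1; i <= n_perm; i++)); do
    permute_pops "$pop1" "$pop2" perm_pop1.txt perm_pop2.txt
    vcftools --vcf "$vcf" \
      --weir-fst-pop perm_pop1.txt --weir-fst-pop perm_pop2.txt \
      --out "perm_$i"
  done
}

# Example:
# run_fst_permutations input_data.vcf population_1.txt population_2.txt 100
```

The Fst values in the perm_*.weir.fst files then form the null distribution to compare the observed values against. With millions of sites each vcftools run takes a while, so the number of permutations is a trade-off against runtime.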
Anurag wrote (4.8 years ago, Belgium):

Dear Zev,

Thanks for the answer. I will try pFst. Lositan is not a practical solution for me, given that I have more than 2 million variant positions.

Regarding these options:

t,target     -- argument: a zero based comma separated list of target individuals corrisponding to VCF columns
INFO: required: b,background -- argument: a zero based comma separated list of background individuals corrisponding to VCF columns

 

If I understood correctly, target means the individuals we want to include in the analysis, and background means those we do not.

Could you explain how you use the PL or GL values to account for genotyping error and to calculate the p-value?


I have run pFst on 30 million variants. It took about 5 hours on one CPU.

(Reply by Zev.Kronenberg, 4.8 years ago)

Cool, I will try it and let you know.

(Reply by Anurag, 4.8 years ago)

The target group is compared to the background group. The allele frequencies of the target and background are estimated from the genotype likelihoods, not from the genotype counts.

(Reply by Zev.Kronenberg, 4.8 years ago)
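Since -t and -b take zero-based indices of the sample columns (index 0 is the first sample after the FORMAT column, as in the 10-sample example further down), it can be less error-prone to look the indices up by sample name than to count columns by hand. A minimal sketch, assuming an uncompressed VCF and a list file with one sample name per line (for instance the same population files used for vcftools); sample_indices is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the zero-based, comma-separated indices of the
# samples named in a list file, in VCF column order.
#   $1 = uncompressed VCF, $2 = file with one sample name per line
sample_indices() {
  local vcf=$1 names=$2
  awk -F'\t' '
    NR == FNR {                      # first file: the wanted sample names
      sub(/\r$/, "")                 # tolerate Windows line endings
      if ($1 != "") want[$1] = 1
      next
    }
    /^#CHROM/ {                      # header line: samples start at field 10
      out = ""
      for (i = 10; i <= NF; i++)
        if ($i in want) out = out (out == "" ? "" : ",") (i - 10)
      print out
      exit
    }
  ' "$names" "$vcf"
}
```

The output can be passed straight to -t or -b, e.g. -t "$(sample_indices input_data.vcf population_1.txt)". Indices come out in VCF column order regardless of the order of names in the list file.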

Could you explain how to set this up for two populations, and what the effect would be with 12 samples per population? Let's say population A has samples 1...12 and population B has 13...24. If I use 0...11 as the target and 12...23 as the background, will I get the same output as with 12...23 as the target and 0...11 as the background?

I could try this myself, but as the creator of the tool you may already have tested it.

Best,

Anurag

(Reply by Anurag, 4.8 years ago)

Let's say you have 10 individuals, 5 target and 5 background: -t 0,1,2,3,4 -b 5,6,7,8,9

For large ranges you can generate the list with:

perl -e 'print join ",", (0..9)'

(Reply by Zev.Kronenberg, 4.8 years ago)
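For the 12-versus-12 layout asked about above, the same lists can also be built with seq and paste instead of Perl. A small sketch (index_range is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a zero-based, comma-separated index range,
# e.g. index_range 0 4 -> 0,1,2,3,4
index_range() {
  seq "$1" "$2" | paste -sd, -
}

target=$(index_range 0 11)       # population A: sample columns 0..11
background=$(index_range 12 23)  # population B: sample columns 12..23
printf '%s\n' "-t $target -b $background"
```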

Would you be able to post an example of your input and command line? I am trying to run pFst, but I am getting the error "more sample fields than samples listed in header".

(Reply by ald47860, 3.3 years ago)
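The error suggests that some data lines have more sample columns than the #CHROM header line names. A quick way to locate such lines, sketched in awk under the assumption of an uncompressed, tab-separated VCF (check_vcf_columns is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: report data lines whose number of tab-separated fields
# differs from the number of columns declared in the #CHROM header line.
# Returns a non-zero status if any mismatch is found.
check_vcf_columns() {
  awk -F'\t' '
    /^##/     { next }                   # meta-information lines
    /^#CHROM/ { expected = NF; next }    # 9 fixed columns + one per sample
    NF != expected {
      printf "line %d: %d fields, header declares %d\n", NR, NF, expected
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

For a bgzipped file, something like check_vcf_columns <(zcat input.vcf.gz) should work in bash. Because the function returns a non-zero status on a mismatch, it can also serve as a pre-flight check before running pFst.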