Linkage information from unphased VCF files
7.6 years ago
Rubal ▴ 340

Hello Everyone,

I am looking for the best way to get linkage information from unphased whole-genome population data. I have a VCF file with multiple individuals from different populations. The data is unphased, but I would like to detect regions with an excess of linkage disequilibrium as a signal of positive selection. I have not phased the data because I have only a limited number of individuals per population of a non-model species, and I therefore worry that phasing would be very inaccurate.

What do people think would be the best way to detect regions with high levels of linkage disequilibrium? I was thinking that something like the VCFtools --geno-r2 option might be suitable.
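To make the "regions with high LD" idea concrete, here is a rough Python sketch of post-hoc window averaging over pairwise r2 values. It assumes the pairwise results have already been parsed into (chrom, pos1, pos2, r2) tuples; the function name, the 50 kb window size, and binning each pair by its midpoint are illustrative choices, not anything prescribed by VCFtools.

```python
from collections import defaultdict


def windowed_mean_r2(rows, window_bp=50_000):
    """Average pairwise r2 per (chrom, window_start), binning by pair midpoint.

    rows: iterable of (chrom, pos1, pos2, r2) tuples (assumed input layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos1, pos2, r2 in rows:
        if r2 != r2:  # NaN check: skip pairs without a usable r2
            continue
        midpoint = (pos1 + pos2) // 2
        start = midpoint // window_bp * window_bp
        sums[(chrom, start)] += r2
        counts[(chrom, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Windows whose mean r2 sits well above the genome-wide background would be the candidates to look at more closely.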

Thanks for your help!

Best regards,
Rubal

next-gen vcf genome • 4.5k views
7.6 years ago

The --geno-r2 option in vcftools should be enough for your needs, since it works on unphased genotypes; however, you cannot calculate haplotype-based linkage disequilibrium statistics (such as D and D') if your data is not phased. If you were studying human individuals, I would suggest imputing more genotypes by merging your data with the 1000 Genomes data, but since you are working with a non-model organism, this is not an option. Is there a closely related model organism that you could use to impute data?
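For intuition, a genotype-based r2 of this kind can be sketched in a few lines of Python: it is the squared Pearson correlation between per-individual genotypes coded as 0, 1 or 2 copies of the non-reference allele, so no phase information is needed. The function name and the use of None for missing calls are assumptions made for this illustration, not VCFtools code.

```python
def genotype_r2(site_a, site_b):
    """Squared Pearson correlation between two sites' genotype dosages.

    Each list holds one value per individual: 0, 1 or 2 copies of the
    non-reference allele, or None for a missing call.
    """
    # Keep only individuals genotyped at both sites.
    pairs = [(a, b) for a, b in zip(site_a, site_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic site has no defined correlation
    return cov * cov / (var_a * var_b)
```

With only a handful of individuals per population, estimates like this will be noisy, so averaging over many SNP pairs per region is probably safer than interpreting single pairs.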

7.6 years ago

Giovanni M Dall'Olio is correct: it is advisable to phase the variants. On the off chance that you are interested in haplotype decay, I have a tool that takes an unphased VCF file, phases it, and then calculates XP-EHH. I'm also working on a version for LD.

https://github.com/jewmanchue/vcflib/wiki/Haplotype-Decay


That sounds like a promising tool; I will give it a go. The documentation mentions that it will give slightly different results each time because of the stochastic search. Would you recommend running multiple iterations? Also, is there an option for specifying window sizes, or would you do post-hoc averaging of scores across sites within windows? Thanks very much.


Running it several times will allow you to generate a confidence interval around the XP-EHH score. Window size is determined by the number of SNPs required for EHH to decay to 0.05 and isn't specified by the user.
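As a minimal sketch of the repeated-runs idea, assuming you have collected the XP-EHH score for one site from each run into a list, the Python below reports the mean and an empirical percentile interval. The function name and the 2.5/97.5 defaults are illustrative, and an interval like this only reflects run-to-run variation from the stochastic phasing, not sampling uncertainty.

```python
from statistics import mean


def percentile_interval(scores, lower=2.5, upper=97.5):
    """Mean and empirical percentile interval of one site's scores across runs."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one run")

    def pct(p):
        # Linear interpolation between the two closest ranks.
        k = (n - 1) * p / 100
        lo = int(k)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

    return mean(ordered), pct(lower), pct(upper)
```

With only a few runs the interval endpoints will sit close to the minimum and maximum observed scores, so more runs give a more informative interval.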
