Identify fixed differences between populations in WGS data (VCF format)
4.4 years ago

I have a large WGS dataset composed of ~350 individuals from 6 different populations and I want to figure out the fixed differences between each population (and the rest) for the sake of calculating the direction of selection: DoS = Dn/(Dn + Ds) - Pn/(Pn + Ps).
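For reference, the DoS formula above can be written as a small helper. This is only a sketch: the function name and the example counts are hypothetical, and returning None when a denominator is zero is an assumption about how undefined cases should be handled.

```python
from typing import Optional


def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """DoS = Dn/(Dn + Ds) - Pn/(Pn + Ps).

    Returns None when either ratio is undefined (no fixed differences
    or no polymorphisms), which is an assumption of this sketch.
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


# Hypothetical counts, for illustration only.
print(direction_of_selection(dn=10, ds=10, pn=5, ps=15))
```

With these made-up counts the two terms are 0.5 and 0.25, so a positive value comes out; the Dn and Ds inputs are the fixed-difference counts the question is about.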

I can easily calculate the within-population polymorphic differences (Pn/Ps) by splitting the VCF by population and running each subset through SnpEff separately, but it is the Dn/Ds ratio (fixed differences between a given population and the rest) that I'm having trouble figuring out. Any ideas?

SNP genome next-gen • 1.5k views
4.4 years ago
Brice Sarver ★ 3.8k

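bcftools view will do this with -x/--private, which keeps sites where only the selected samples carry a non-reference allele. See here. You'll need to specify the samples belonging to the population of interest with --samples or --samples-file.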
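One thing that may be worth checking afterwards: a private allele is not necessarily a fixed one, since it can still be polymorphic within the population. The sketch below shows one way to tell the two apart from genotype strings. It is not part of bcftools; the function names and category labels are hypothetical, and it assumes biallelic sites and GT values such as 0/1 or 1|1 already pulled out of the VCF.

```python
import re


def alt_allele_counts(genotypes):
    """Return (alt_count, called_count) over GT strings such as '0/1' or '1|1'."""
    alt = called = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # skip missing calls
            called += 1
            if allele != "0":
                alt += 1
    return alt, called


def classify_site(target_gts, other_gts):
    """Classify one site for a target population versus all other samples."""
    t_alt, t_called = alt_allele_counts(target_gts)
    o_alt, o_called = alt_allele_counts(other_gts)
    if t_called == 0 or o_called == 0:
        return "no_data"
    # Fixed difference: the two groups share no allele at all.
    if (t_alt == t_called and o_alt == 0) or (t_alt == 0 and o_alt == o_called):
        return "fixed"
    if t_alt > 0 and o_alt == 0:
        return "private"  # only the target carries ALT, but not every copy
    if t_alt > 0:
        return "shared"
    return "absent"  # target carries no ALT allele


# Hypothetical genotypes, for illustration only.
print(classify_site(["1/1", "1|1"], ["0/0", "0/0"]))
print(classify_site(["0/1", "1/1"], ["0/0", "./."]))
```

Only the sites labelled "fixed" here would count towards Dn and Ds; how strictly to treat missing genotypes is a judgment call that this sketch does not settle.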

I have tried this (and thought it worked), but looking more closely at my output file, it doesn't retain only the SNPs specific to the population: it retains all SNPs for the population, regardless of whether they appear in others. The command I am using is:

bcftools view -x all_no_outgroups.recode.vcf.gz --samples-file cluster_1.txt > cluster_1_private.vcf

Am I doing something wrong in my command here? Many of the SNPs retained in cluster_1_private.vcf are also present in the other populations.

Never mind, I was looking at the wrong VCF! It did work. Cheers and thanks!
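Since there are six populations, the same command can be generated once per sample list. The following is a rough sketch only: the helper name is hypothetical, and the file names cluster_2.txt through cluster_6.txt are assumed to follow the cluster_1.txt pattern. It prints the commands rather than running them, and uses -o for the output file instead of shell redirection.

```python
import shlex


def private_sites_command(vcf, samples_file, out_vcf):
    """Build the bcftools call used in this thread for one population."""
    return ["bcftools", "view", "-x",
            "--samples-file", samples_file,
            "-o", out_vcf, vcf]


# Assumed naming scheme: cluster_1.txt ... cluster_6.txt (hypothetical).
for i in range(1, 7):
    cmd = private_sites_command(
        "all_no_outgroups.recode.vcf.gz",
        f"cluster_{i}.txt",
        f"cluster_{i}_private.vcf",
    )
    print(shlex.join(cmd))
```

To actually execute each command, the list could be passed to subprocess.run(cmd, check=True), provided bcftools is on the PATH.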
