query PL field with bcftools
8.6 years ago
Yahan ▴ 400

Hi BioStars,

I'd like to find the highest PL value (PL holds one likelihood value for each possible genotype) for each sample in my VCF file.

From the documentation, I know that you can use MAX(DP), which finds the highest genotype depth over all samples.

Since the PL field has a value for each possible genotype (e.g. 0/0, 0/1 and 1/1 for a biallelic variant, so 3 PL values), this mechanism is not applicable here.

If, for instance,

PL=0.233,0,0.767

then I want to find 0.767.

Is this possible with bcftools v1.2?

Thanks for the input.

SNP bcftools • 2.7k views
2.1 years ago

This is quite an old thread, but an answer may still be useful to others. You can use SnpSift for this: it lets you access the three PL values by index. For instance, GEN[0].PL[0] is the first PL score for the first sample.

2.1 years ago
bcftools query -f '[%SAMPLE %PL\n]' in.vcf.gz | awk '{N=split($2,a,",");for(i=1;i<=N;i++){if(a[i]==".")continue;n=a[i]+0;if(!($1 in hash)||hash[$1]<n)hash[$1]=n}} END{for(S in hash)print S,hash[S]}'
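
Note that the one-liner above reports a single maximum per sample across all sites. If you want the highest PL for each sample at each site instead, which is what the example in the question suggests, a small post-processing script may be easier to follow. The following is only a sketch, not a bcftools feature: it assumes you have produced tab-separated lines of CHROM, POS, SAMPLE and PL, for example with a format string such as '[%CHROM\t%POS\t%SAMPLE\t%PL\n]', and the function names max_pl and max_pl_per_record are made up for illustration.

```python
def max_pl(pl_field):
    """Return the largest value in a comma-separated PL string.

    Missing values (".") are skipped; None is returned if nothing is left.
    """
    values = [float(v) for v in pl_field.split(",") if v != "."]
    return max(values) if values else None


def max_pl_per_record(lines):
    """Yield (chrom, pos, sample, highest PL) for each input line.

    Each line is assumed to hold four tab-separated columns:
    CHROM, POS, SAMPLE and the comma-separated PL values.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        chrom, pos, sample, pl = line.split("\t")
        yield chrom, pos, sample, max_pl(pl)


# Small in-memory demonstration using the values from the question.
example = ["1\t100\tS1\t0.233,0,0.767", "1\t100\tS2\t."]
for record in max_pl_per_record(example):
    print(*record, sep="\t")
```

To run it on real data you could iterate over sys.stdin instead of the example list and pipe the bcftools query output into the script. Values are parsed as floats so that fractional entries like those in the question are not truncated.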