How to filter VCF where at least x% of the individuals have DP>=10 and GQ>=20
4.4 years ago
hellbio ▴ 490

Hi,

I would like to filter a VCF file using DP and GQ thresholds, keeping sites where at least 80% of the individuals meet the thresholds. More precisely, I have the two scenarios below:

  1. Retain sites where at least 80% of the individuals have DP >= 10 and GQ >= 20, irrespective of whether they carry the reference or a non-reference allele.

  2. Retain sites where at least one sample has a non-reference allele with DP >= 10 and GQ >= 20.

I checked the vcftools documentation but could not find where I could specify the minimum number of individuals. I believe there could be an existing thread or solution to achieve this. Could someone point me to it?

vcf filter gatk • 2.0k views
4.4 years ago

Using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

Retain sites where at least 80% of the individuals have DP >= 10 and GQ >= 20, irrespective of the reference or non-reference allele:

 java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->G.getDP()>=10 && G.getGQ()>=20).count()/(double)variant.getNSamples() >= 0.8;' input.vcf
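
For readers who want to see the logic outside of jvarkit, here is a minimal pure-Python sketch of the same per-site test. It is not part of vcffilterjdk: the function names (`parse_samples`, `as_int`, `passes_fraction`) are made up for illustration, and it assumes a tab-separated VCF data line whose FORMAT column lists `DP` and `GQ` as integers. Missing values are treated as -1 so they never pass a threshold.

```python
def parse_samples(vcf_line):
    """Return one dict per sample, mapping FORMAT keys (GT, DP, GQ, ...) to raw strings."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")          # column 9 is FORMAT
    # zip() tolerates samples whose trailing fields were dropped (e.g. "./.")
    return [dict(zip(keys, sample.split(":"))) for sample in fields[9:]]


def as_int(value):
    """Convert a FORMAT value to int; a missing value ('.' or absent) becomes -1."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return -1


def passes_fraction(vcf_line, min_dp=10, min_gq=20, min_fraction=0.8):
    """True if at least min_fraction of the samples have DP >= min_dp and GQ >= min_gq."""
    samples = parse_samples(vcf_line)
    if not samples:
        return False
    n_ok = sum(
        1 for s in samples
        if as_int(s.get("DP")) >= min_dp and as_int(s.get("GQ")) >= min_gq
    )
    return n_ok / len(samples) >= min_fraction
```

To filter a whole file with this sketch, one would pass header lines (those starting with `#`) through unchanged and keep only the data lines for which `passes_fraction(line)` is true.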

Retain sites where at least one sample has a non-reference allele with DP >= 10 and GQ >= 20:

 java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().anyMatch(G->G.getDP()>=10 && G.getGQ()>=20 && G.getAlleles().stream().anyMatch(A->A.isCalled() && !A.isReference())) ;' input.vcf
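
The same test can be sketched in plain Python for a single VCF data line. This is only an illustration of the logic, not jvarkit code: `has_confident_alt_sample` and `to_int` are hypothetical names, and the sketch assumes tab-separated columns with `GT`, `DP` and `GQ` present in FORMAT. An allele counts as non-reference when it is called (not `.`) and is not `0`.

```python
def to_int(value):
    """Convert a FORMAT value to int; a missing value ('.' or absent) becomes -1."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return -1


def has_confident_alt_sample(vcf_line, min_dp=10, min_gq=20):
    """True if any sample carries a non-reference allele with DP >= min_dp and GQ >= min_gq."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")          # column 9 is FORMAT
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        # Treat phased ("|") and unphased ("/") genotypes alike
        alleles = values.get("GT", ".").replace("|", "/").split("/")
        carries_alt = any(a not in (".", "0") for a in alleles)
        if (carries_alt
                and to_int(values.get("DP")) >= min_dp
                and to_int(values.get("GQ")) >= min_gq):
            return True
    return False
```

Note that the depth and quality thresholds must hold for the same sample that carries the non-reference allele, which matches the `anyMatch` over genotypes in the command above.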