Questions on tagging low-quality variants and DP filtering in a joint VCF (generated by GATK GenotypeGVCFs)
9.2 years ago
caok • 0

Hi Folks,

I have two questions about a joint (multi-sample) VCF and would appreciate your help.

  1. I need to flag SNPs and indels in the "FILTER" column (PASS or Low_confidence) in a joint VCF generated by GenotypeGVCFs. We call a family (trio) together, so a typical joint VCF contains calls from the child and both parents. I followed the hard-filtering rules proposed by GATK: http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set

    Then I noticed that in the joint VCF, the INFO field is computed across all samples in the VCF. However, I want to set the "FILTER" column based on the child only (the Child column). How can I apply GATK SelectVariants to this joint VCF using only the information from the child? Or would any other tools help?

  2. I also want to filter the same joint VCF by DP in the Child column. How can I do this with GATK or any other tool? SelectVariants seems to use DP from the "INFO" field, which is the sum of DP across all jointly called samples. Any suggestions?

Thank you very much!

-Linda

next-gen • 5.2k views
9.2 years ago
Len Trigg ★ 1.6k

I'm not sure exactly what you mean by your first question (do you mean annotate with the variant type based on the genotype of the Child?)

The second one is simple enough if you use Real Time Genomics tools:

rtg vcffilter -i input-variants.vcf.gz -o output-variants.vcf.gz --min-read-depth=NN --sample=Child --fail=CHILD-LOW-DP
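
If you would rather see what a per-sample DP filter does under the hood, here is a minimal plain-Python sketch of the same idea. It is not the RTG implementation. The function name `flag_low_child_dp`, the default threshold of 10 (the command above leaves `NN` unspecified), and the choice to treat a missing DP as failing are all assumptions made for illustration. It reads the DP value from the FORMAT column of the chosen sample rather than from INFO, and appends a label to FILTER when that value is too low.

```python
def flag_low_child_dp(vcf_lines, sample="Child", min_dp=10, label="CHILD-LOW-DP"):
    """Yield VCF lines, adding `label` to FILTER where `sample` has FORMAT/DP < min_dp.

    Hypothetical helper: name, defaults, and missing-DP handling are assumptions.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        if line.startswith("#CHROM"):
            # Locate the sample column by name; raises ValueError if it is absent.
            sample_col = line.split("\t").index(sample)
            # Declare the new filter just before the column header line.
            yield f'##FILTER=<ID={label},Description="{sample} FORMAT/DP below {min_dp}">'
            yield line
            continue
        fields = line.split("\t")
        keys = fields[8].split(":")             # FORMAT keys, e.g. GT:AD:DP:GQ
        values = fields[sample_col].split(":")  # this sample's values
        dp = None
        if "DP" in keys:
            i = keys.index("DP")
            # Trailing per-sample fields may be dropped, so check the length.
            if i < len(values) and values[i] not in (".", ""):
                dp = int(values[i])
        if dp is None or dp < min_dp:
            # Assumption: a missing per-sample DP counts as failing.
            current = fields[6]
            fields[6] = label if current in (".", "PASS") else f"{current};{label}"
        yield "\t".join(fields)


# Tiny in-memory trio example (made-up records, for illustration only).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tFather\tMother\tChild",
    "1\t100\t.\tA\tG\t50\t.\tDP=90\tGT:DP\t0/1:40\t0/0:45\t0/1:5",
    "1\t200\t.\tC\tT\t60\tPASS\tDP=95\tGT:DP\t0/1:30\t0/1:35\t0/1:30",
]
result = list(flag_low_child_dp(example, min_dp=10))
for out_line in result:
    print(out_line)
```

Note that in the first record INFO/DP is 90 while the child's own DP is only 5, which is exactly why filtering on the INFO field does not do what you want here. Records that pass are left with whatever FILTER value they already had.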

Thank you for your answer. The first question is about writing "PASS" or "Low_confidence" in the "FILTER" column based on filtering thresholds on QD and FS for SNPs and indels. For a single-sample VCF, it's easy, but for a joint VCF, I want to set the flag based on one sample in the joint VCF (for example, the child). However, GATK VariantFiltration uses the "INFO" column, which in a joint VCF is a summary of all samples in the VCF.

I hope that makes it clear. Any suggestions?
