Select PASS variants from a VCF InDel file
Asked by hdtms, 6.6 years ago

In the terminal I ran:

java -jar ../Programs/GenomeAnalysisTK.jar -R ../Programs/human_g1k_v37_decoy.fasta -T SelectVariants --variant Analysis/1.GATK_filtering/1.Only_Indels/vcf_Indels.vcf -select "FILTER == 'PASS'" -o Analysis/1.GATK_filtering/2.Only_Pass/vcf_Indels-Passes.vcf

This gave me a file with the following number of lines:

204 vcf_Indels-Passes.vcf

This number seemed very low, so I tried:

java -jar ../Programs/GenomeAnalysisTK.jar -R ../Programs/human_g1k_v37_decoy.fasta -T SelectVariants --variant Analysis/1.GATK_filtering/1.Only_Indels/vcf_Indels.vcf -select 'vc.isNotFiltered()' -o Analysis/1.GATK_filtering/3.Only_Pass/vcf_Indels-Passes.vcf

This gave me a file with the following number of lines:

29531 vcf_Indels-Passes.vcf

I don't know which of these filters is correct, or whether both are wrong. Next, I will be applying statistical tests to the resulting PASS file.
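Both numbers above are raw line counts, so they also include the VCF header (the ## meta-information lines and the #CHROM column line), not only variant records. With one ##contig line per reference sequence, a small count such as 204 may consist largely of header lines. A minimal Python sketch, assuming an uncompressed VCF read line by line (the function name count_vcf_lines is hypothetical), that separates header lines from records:

```python
def count_vcf_lines(lines):
    """Return (total_lines, header_lines, record_lines) for VCF text lines."""
    total = header = 0
    for line in lines:
        total += 1
        # '##' meta-information lines and the '#CHROM' line both start with '#'
        if line.startswith("#"):
            header += 1
    return total, header, total - header


example = [
    "##fileformat=VCFv4.1",
    "##FILTER=<ID=LowQual,Description=\"Low quality\">",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tAT\t50\tPASS\t.",
    "1\t2000\t.\tCT\tC\t10\tLowQual\t.",
]
print(count_vcf_lines(example))  # (5, 3, 2)
```

On a real file, pass the open handle, e.g. `with open("vcf_Indels-Passes.vcf") as fh: print(count_vcf_lines(fh))`; the third number is the one that matters for downstream statistics.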

Answer (6.6 years ago):

The first command will only extract InDels with 'PASS' in the VCF FILTER column.
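For comparison, here is the intended "PASS only" rule expressed directly on a VCF record line in Python. It is a sketch of the rule itself, not of how GATK evaluates the JEXL expression; it assumes tab-delimited records where FILTER is the seventh column, and the names FILTER_COL and is_pass are hypothetical:

```python
FILTER_COL = 6  # 0-based index of FILTER among the fixed VCF columns


def is_pass(record_line):
    """True only when the FILTER column is exactly 'PASS'."""
    fields = record_line.rstrip("\n").split("\t")
    return fields[FILTER_COL] == "PASS"


records = [
    "1\t1000\t.\tA\tAT\t50\tPASS\t.",
    "1\t2000\t.\tCT\tC\t10\tLowQual\t.",
    "1\t3000\t.\tG\tGA\t30\t.\t.",
]
print([is_pass(r) for r in records])  # [True, False, False]
```

Note that a record whose FILTER is "." (filters never applied) is rejected by this exact-match rule.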

The second command may not be doing what you expect. vc.isNotFiltered() is true for any record that has not failed a filter, so it may be outputting all InDels that either have PASS or have no value ('.') in the FILTER column. Records flagged LowQual (or with any other filter name) should still be excluded. Check the output to see what's happening.
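One way to check is to tally the FILTER values in the input file. If many records carry "." rather than PASS, that alone could explain why the second command keeps far more lines than the first. A minimal Python sketch, assuming an uncompressed, tab-delimited VCF (filter_tally is a hypothetical name):

```python
from collections import Counter


def filter_tally(lines):
    """Count records per FILTER value, skipping header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # the seventh column is FILTER
    return counts


vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tAT\t50\tPASS\t.",
    "1\t2000\t.\tCT\tC\t10\tLowQual\t.",
    "1\t3000\t.\tG\tGA\t30\t.\t.",
    "1\t4000\t.\tT\tTC\t60\tPASS\t.",
]
tally = filter_tally(vcf)
print(tally["PASS"], tally["."], tally["LowQual"])  # 2 1 1
```

Records that failed several filters appear under their combined value (e.g. "LowQual;SomeFilter") as a separate key.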

I tend to do my own VCF filtering with awk or Python because, in many situations, the standard tools don't behave as expected. This may be because the VCF format generally suffers from a lack of standardisation: there is a specification, but many variations of it in practice.
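A minimal sketch of that approach in Python, assuming a tab-delimited VCF that may be gzip-compressed. It keeps every header line plus records whose FILTER is exactly PASS; the hypothetical keep_missing option also keeps ".", which approximates the PASS-or-missing behaviour described for the second command. The script name filter_pass.py is likewise hypothetical:

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def filter_vcf(lines, keep_missing=False):
    """Yield header lines plus records whose FILTER is PASS
    (or also '.', when keep_missing=True)."""
    allowed = {"PASS", "."} if keep_missing else {"PASS"}
    for line in lines:
        # Header lines are passed through so the output is still a valid VCF.
        if line.startswith("#") or line.split("\t")[6] in allowed:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python filter_pass.py input.vcf[.gz] > output.vcf
    with open_vcf(sys.argv[1]) as fh:
        sys.stdout.writelines(filter_vcf(fh))
```

Keeping the header intact means the output can still be read by GATK or other VCF tools for the later statistical tests.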
