Question: SNPs not filtered from a GATK vcf file
5 months ago by
evelyn90 wrote:

Hi All,

I have a vcf file with SNP information from multiple samples made using GATK:

gatk --java-options "-Xmx4G" HaplotypeCaller -R ref.fa -I bams.list -L ch01 -O 01.vcf

Individual VCFs were made per chromosome and then concatenated:

bcftools concat -o merge.vcf 01.vcf 02.vcf 03.vcf 04.vcf 05.vcf

I want to keep the SNPs only for the final vcf file so I did:

bcftools filter -i 'TYPE="snp"' merge.vcf > merge_SNP.vcf

But the output file still has INDELS. Then I tried using bcftools view for the same job:

bcftools view -v snps merge.vcf > merge_SNP.vcf

The output file again has variants other than SNPs. I am not sure what is going wrong. I would appreciate any suggestions. Thank you!

5 months ago by
University of Vienna
inedraylig20 wrote:

The recommended way to filter indels would be to use --exclude-types:

bcftools view --exclude-types indels merge.vcf > merge_SNP.vcf

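assuming that you only have SNPs and INDELs in your vcf file. Note that bcftools determines the variant type by comparing the REF and ALT alleles, not from INFO tags. With -v snps a site is kept if any of its ALT alleles is a SNP, so multiallelic sites that carry both a SNP allele and an indel allele stay in the output. With --exclude-types indels a site is dropped if any of its ALT alleles is an indel. You can look at the REF and ALT columns of the remaining non-SNP records to check whether they are such mixed sites.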
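To see whether mixed sites explain the leftover indels, a rough check along these lines may help. It is only a sketch: the demo VCF and the `classify_sites` helper are made up for illustration, and the SNP test (single-base REF and single-base ALT) is a simplification of what bcftools does, so treat the labels as a hint rather than an exact reproduction of its TYPE logic.

```shell
# Build a tiny hypothetical sites-only VCF: one SNP, one indel, one mixed site.
DEMO_VCF="$(mktemp)"
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=ch01>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'ch01\t100\t.\tA\tG\t50\t.\t.\n'      # plain SNP
  printf 'ch01\t200\t.\tAT\tA\t50\t.\t.\n'     # plain deletion
  printf 'ch01\t300\t.\tC\tT,CTT\t50\t.\t.\n'  # SNP allele + insertion allele
} > "$DEMO_VCF"

# Hypothetical helper: label each record by looking only at REF and ALT.
# Simplification: an ALT counts as a SNP allele when REF and ALT are both one base.
classify_sites() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($5, alt, ",")          # one entry per ALT allele
      snp = 0; other = 0
      for (i = 1; i <= n; i++) {
        if (length($4) == 1 && alt[i] ~ /^[ACGTN]$/) snp++
        else other++
      }
      if (other == 0)      label = "snp_only"
      else if (snp == 0)   label = "non_snp"
      else                 label = "mixed"
      print $1, $2, $4, $5, label
    }' "$1"
}

classify_sites "$DEMO_VCF"

# If "mixed" records show up in your own file, one option (an assumption about
# what you want, since it changes how multiallelic sites are represented) is to
# split them into one ALT per record before selecting SNPs:
#   bcftools norm -m -any merge.vcf | bcftools view -v snps > merge_SNP.vcf
```

Running `classify_sites merge_SNP.vcf` on your own output should show whether the records you consider indels are labelled `mixed`. If they are, -v snps is behaving as documented, and the choice is between dropping those sites entirely (--exclude-types indels) or splitting them first and keeping only the SNP alleles.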
