Question: SNPs not filtered from a GATK vcf file
13 months ago by
evelyn130 wrote:

Hi All,

I have a vcf file with SNP information from multiple samples made using GATK:

gatk --java-options "-Xmx4G" HaplotypeCaller -R ref.fa -I bams.list -L ch01 -O 01.vcf

Individual VCFs were made per chromosome and then concatenated:

bcftools concat -o merge.vcf 01.vcf 02.vcf 03.vcf 04.vcf 05.vcf

I want to keep the SNPs only for the final vcf file so I did:

bcftools filter -i 'TYPE="snp"' merge.vcf > merge_SNP.vcf

But the output file still has INDELS. Then I tried using bcftools view for the same job:

bcftools view -v snps merge.vcf > merge_SNP.vcf

The output file again contains variants other than SNPs. I am not sure what is going wrong. I would appreciate any suggestions. Thank you!


You can also use SelectVariants module from GATK
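
A minimal sketch of what that could look like, assuming GATK4 and the file names from the question (ref.fa, merge.vcf, merge_SNP.vcf); the select_snps function name is only for illustration. SelectVariants classifies a multiallelic site that carries both a SNP and an indel allele as MIXED, so such sites should be dropped by this selection. GATK may also ask for an index of the input VCF first.

```shell
# Assumed file names, taken from the question above.
ref=ref.fa
in_vcf=merge.vcf
out_vcf=merge_SNP.vcf

# Keep only the records that GATK classifies as SNPs.
# select_snps is a hypothetical helper name, not part of GATK.
select_snps() {
  gatk SelectVariants \
    -R "$ref" \
    -V "$in_vcf" \
    --select-type-to-include SNP \
    -O "$out_vcf"
}

# Run only if gatk is on PATH and the input file exists.
if command -v gatk >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
  select_snps
else
  echo "gatk or $in_vcf not found; nothing done" >&2
fi
```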

written 4 months ago by svp280
13 months ago by
University of Vienna
inedraylig20 wrote:

The recommended way to filter out indels is to use --exclude-types:

bcftools view --exclude-types indels merge.vcf > merge_SNP.vcf

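assuming that you only have SNPs and indels in your VCF file. Note that bcftools determines the variant type by comparing the REF and ALT alleles of each record, not from INFO tags, and -v/--types selects a site if any of its ALT alleles is of the requested type. A multiallelic site that has both a SNP allele and an indel allele therefore still passes a SNP-only selection, so look at the REF and ALT columns of the records that remain to see whether that is what is happening.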
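
One way to inspect this without relying on any filtering tool is to classify records from the REF and ALT columns directly. The sketch below uses plain awk on a tiny made-up file (demo.vcf and its three records are hypothetical; point vcf at your own file instead). It uses a simplified rule: an allele counts as a SNP only if REF and ALT are both a single base, so MNPs, symbolic alleles and `*` fall under "other".

```shell
# Hypothetical demo input: a SNP, an indel, and a multiallelic site
# that mixes a SNP allele (T) with an insertion allele (CA).
vcf=demo.vcf
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'ch01\t100\t.\tA\tG\t.\t.\t.\n'
  printf 'ch01\t200\t.\tAT\tA\t.\t.\t.\n'
  printf 'ch01\t300\t.\tC\tT,CA\t.\t.\t.\n'
} > "$vcf"

# Classify each record using only REF (column 4) and ALT (column 5).
summary=$(awk -F'\t' '
  /^#/ { next }                      # skip header lines
  {
    n = split($5, alt, ",")          # one entry per ALT allele
    snp = 0; other = 0
    for (i = 1; i <= n; i++) {
      if ($4 ~ /^[ACGTNacgtn]$/ && alt[i] ~ /^[ACGTNacgtn]$/) snp++
      else other++
    }
    if (snp > 0 && other == 0) pure++        # every ALT allele is a SNP
    else if (snp > 0)          mixed++       # SNP plus something else
    else                       nonsnp++      # no SNP allele at all
  }
  END { printf "snp_only=%d mixed=%d non_snp=%d\n", pure, mixed, nonsnp }
' "$vcf")

echo "$summary"
```

A non-zero "mixed" count in the filtered file would point to multiallelic sites as the source of the leftover indels. In that case, one option is to split multiallelic records first, for example with bcftools norm -m -any, and then apply the SNP selection to the split records.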
