GATK SelectVariants classifies spanning/overlapping deletions as SNPs
5 months ago
cocchi.e89

Quick question: I split a multiallelic VCF file with bcftools:

bcftools norm -m -any <IN.vcf> -Ov > <OUT.vcf>

and then separated SNPs from INDELs with GATK SelectVariants:

gatk SelectVariants \
 -R <REFERENCE.fasta> \
 -V <OUT.vcf> \
 --select-type-to-include SNP \
 -O <OUT.SNP.vcf>

However, I noticed that this SNP-only VCF still contains spanning/overlapping deletions (the * allele), classified as SNPs. For example:

chr1    10443   .   C   *   54.40   VQSRTrancheSNP99.90to100.00 AC=1;AF=0.038;AN=26;ANN=T|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1567|1||sequence_alteration|HGNC|HGNC:37102||||chr1:g.10443C>T,T|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||1426|1||sequence_alteration|HGNC|HGNC:37102|YES|||chr1:g.10443C>T,T|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||3961|-1||sequence_alteration|HGNC|HGNC:38034|YES|||chr1:g.10443C>T;BaseQRankSum=1.83;DP=336;ExcessHet=0.202;FS=3.31;InbreedingCoeff=0.4448;MLEAC=2;MLEAF=0.077;MQ=30.71;MQRankSum=0;PG=0,8,19;QD=9.07;ReadPosRankSum=0.842;SOR=0.105;VQSLOD=-7.763;culprit=MQ    GT:AD:DP:FT:GQ:PL:PP    0/0:27,0:27:PASS:32:0,24,360:0,32,379   0/0:20,0:20:lowGQ:8:0,0,161:0,8,180 0/0:29,0:29:lowGQ:8:0,0,654:0,8,673 0/0:24,0:24:lowGQ:8:0,0,458:0,8,477 0/0:12,0:12:PASS:35:0,27,405:0,35,424   1/0:2,2:6:PASS:55:136,65,63:118,55,64   0/0:22,0:22:PASS:59:0,51,765:0,59,784   0/0:43,0:43:lowGQ:8:0,0,653:0,8,672 0/0:42,0:42:lowGQ:8:0,0,810:0,8,829 0/0:32,0:32:lowGQ:8:0,0,410:0,8,429 0/0:36,0:36:PASS:38:0,30,846:0,38,865   0/0:28,0:28:PASS:59:0,51,765:0,59,784   0/0:15,0:15:lowGQ:8:0,0,265:0,8,284

I think this is incorrect: shouldn't those be classified as deletions? Or am I wrong?

Thank you in advance for any help!

Tags: SelectVariants, SNP, gatk, INDEL
5 months ago

> I think this is incorrect, aren't those supposed to be DEL? Or am I wrong?

It's not an indel, it's IN an indel (!). It is a locally haploid region with a variant (splitting with norm separated it from its associated ALT allele), but there should be a variant with a large deletion upstream of chr1:10443 that spans this position.
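A * record only makes sense together with an upstream deletion whose deleted bases cover its position. One way to locate that deletion is a minimal sketch like the one below, assuming an uncompressed VCF already split to one ALT per record (e.g. the output of the norm command in the question) and deletion ALTs that are a prefix of their REF. The function name find_spanning_deletions is hypothetical, and symbolic alleles such as <DEL> are not handled.

```shell
# Hypothetical helper: print the deletion records whose deleted bases cover
# CHROM:POS. Assumes an uncompressed VCF with one ALT allele per record,
# where each deletion ALT is a prefix of its REF (left-anchored), so a record
# with REF length r and ALT length a deletes positions POS+a .. POS+r-1.
find_spanning_deletions() {
  local vcf=$1 chrom=$2 pos=$3
  awk -F'\t' -v chrom="$chrom" -v pos="$pos" '
    /^#/ { next }                 # skip header lines
    $1 != chrom { next }          # only the requested chromosome
    $5 == "*" { next }            # skip the star-allele records themselves
    {
      r = length($4); a = length($5)
      if (a < r && $2 + a <= pos && pos <= $2 + r - 1) print
    }' "$vcf"
}
```

For example, find_spanning_deletions OUT.vcf chr1 10443 should list any deletion record covering the example site. Run it on the split file rather than on the SNP-only file, since SelectVariants with --select-type-to-include SNP leaves the deletions themselves out.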


Thanks so much. So can I treat a haploid region as a SNP?


Well, that variant (chr1 10443 . C *) is meaningless without its associated ALT allele. It should be discarded.
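To discard those orphaned * records, one option is to filter on the ALT column. This is a minimal sketch in plain awk, not a GATK or bcftools feature, assuming an uncompressed VCF (such as the -Ov output of the norm command) with one ALT allele per record; drop_star_alleles is a hypothetical name.

```shell
# Hypothetical helper: keep header lines and every record whose ALT column
# (field 5) is not exactly "*". Reads the files given as arguments, or stdin.
# Assumes one ALT allele per record (e.g. after bcftools norm -m -any);
# a multiallelic ALT such as "T,*" is kept unchanged.
drop_star_alleles() {
  awk -F'\t' '/^#/ || $5 != "*"' "$@"
}
```

Usage: drop_star_alleles OUT.SNP.vcf > OUT.SNP.nostar.vcf. Matching the exact ALT value, rather than any * character on the line, avoids dropping records where * happens to appear in INFO or FORMAT fields.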

