VCFtools indels/SNP treatment
15 months ago
draccident • 0

Hello Stars,

Does anyone know how VCFtools v0.1.17 handles "--remove-indels" by default when a SNP position overlaps an indel? Presumably the SNP is not removed, because it is not labeled as an indel...

If so, how can I make sure those overlapping SNPs are removed as well?
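One way to remove those SNPs yourself is a post-filter with an explicit definition of "overlap". The sketch below assumes an uncompressed VCF with tab-separated data lines, treats a record as an indel using the definition in the VCFtools manual (any ALT allele whose length differs from REF), and counts a SNP as overlapping when its POS falls inside an indel's reference span [POS, POS + len(REF) - 1] on the same chromosome. The helper name remove_indels_and_overlapping_snps is hypothetical and not part of vcftools, and the record at position 40 in the demo is invented for illustration.

```python
from collections import defaultdict


def is_indel(ref, alts):
    # "Indel" as described in the VCFtools manual for --remove-indels:
    # any ALT allele whose length differs from REF.
    return any(len(alt) != len(ref) for alt in alts)


def remove_indels_and_overlapping_snps(lines):
    """Hypothetical helper: drop indel records and any other record whose POS
    lies inside an indel's reference span [POS, POS + len(REF) - 1]."""
    spans = defaultdict(list)  # chrom -> list of (start, end) reference spans

    # First pass: collect the reference span of every indel record.
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if is_indel(ref, alt.split(",")):
            start = int(pos)
            spans[chrom].append((start, start + len(ref) - 1))

    # Second pass: keep headers and non-indel records outside every span.
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and header lines
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if is_indel(ref, alt.split(",")):
            continue  # the indel itself, as --remove-indels would drop it
        p = int(pos)
        if any(start <= p <= end for start, end in spans[chrom]):
            continue  # SNP inside an indel's reference span
        kept.append(line)
    return kept


demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t15\t.\tAGTAGTCATACATCAT\tA\t1806\tq10\tDP=35",
    "1\t19\t.\tG\tT\t1792\tPASS\tDP=32",
    "1\t25\t.\tC\tG\t628\tq10\tDP=21",
    "1\t40\t.\tC\tT\t500\tPASS\tDP=20",  # hypothetical SNP outside the indel
]
result = remove_indels_and_overlapping_snps(demo)
for line in result:
    print(line)  # only the header and the position-40 SNP are printed
```

The sketch holds the whole file in memory and includes the indel's anchor base in the span; both are choices you may want to change for real data.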

variants vcftools • 989 views

I think you would really have to define your expected behavior explicitly.

What does it mean for an insertion and a substitution to "overlap"?

What if the substitution overlaps the indel in the same sample, i.e. the sample carries one allele of each?


Say you have an indel starting at position 15 with REF allele AGTAGTCATACATCAT, a SNP G>T at position 19, and a SNP C>G at position 25.

Are those SNPs retained?
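To make "overlap" concrete in reference coordinates: a VCF record covers positions POS through POS + len(REF) - 1. Assuming the indel is the deletion REF = AGTAGTCATACATCAT, ALT = A (the record used in the test VCF in the answer), its span works out as follows, so both SNPs at 19 and 25 lie inside it. Because the first REF base is the anchor base shared with ALT, the bases actually deleted are 16 to 30.

```latex
\text{span} = [\mathrm{POS},\ \mathrm{POS} + |\mathrm{REF}| - 1] = [15,\ 15 + 16 - 1] = [15,\ 30],
\qquad 15 \le 19 \le 30, \quad 15 \le 25 \le 30
```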

15 months ago
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FILTER=<ID=q10,Description="Quality below 10">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  A   B
1   15  .   AGTAGTCATACATCAT    A   1806    q10 DP=35   GT:GQ:DP    1/1:409:35  1/1:409:35
1   19  .   G   T   1792    PASS    DP=32   GT:GQ:DP    0/0:245:32  0/0:245:32
1   25  .   C   G   628 q10 DP=21   GT:GQ:DP    0/1:245:32  0/1:245:32

Run vcftools:

vcftools --vcf subset.vcf --remove-indels --out SNPs_only --recode

cat SNPs_only.recode.vcf

##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FILTER=<ID=q10,Description="Quality below 10">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  A       B
1       19      .       G       T       1792    PASS    .       GT:GQ:DP        0/0:99:32       0/0:99:32
1       25      .       C       G       628     q10     .       GT:GQ:DP        0/1:99:32       0/1:99:32
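This outcome is consistent with the rule the VCFtools manual gives for --remove-indels: a site counts as an indel if any ALT allele changes the length of REF, and each record is judged only on its own REF/ALT columns, never against neighbouring records. A minimal Python sketch of that per-record rule (an illustration of the documented definition, not vcftools' actual source), applied to the three records of the first test:

```python
def site_is_indel(ref, alt_field):
    # A site is an indel if any ALT allele differs in length from REF.
    return any(len(alt) != len(ref) for alt in alt_field.split(","))


def remove_indels(records):
    """Keep (pos, ref, alt) tuples whose site is not an indel.
    Each record is tested in isolation; overlaps are never checked."""
    return [r for r in records if not site_is_indel(r[1], r[2])]


# POS/REF/ALT columns of the first test VCF.
first_test = [
    (15, "AGTAGTCATACATCAT", "A"),
    (19, "G", "T"),
    (25, "C", "G"),
]
print([pos for pos, _, _ in remove_indels(first_test)])  # [19, 25]
```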

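Both SNPs are kept even though they lie inside the reference span of the indel at position 15; only the indel record itself is removed. Now try a site where the SNP and the indel are alternate alleles of the same record (ALT G,AG). Can vcftools decompose it before filtering?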

##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FILTER=<ID=q10,Description="Quality below 10">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  A   B
1   15  .   A   G,AG    1806    q10 DP=35   GT:GQ:DP    0/1:409:35  1/1:409:35
1   25  .   C   G   628 q10 DP=21   GT:GQ:DP    0/1:245:32  0/1:245:32

Nope. The whole record at position 15 is dropped, SNP allele included, and only position 25 remains:

##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FILTER=<ID=q10,Description="Quality below 10">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  A       B
1       25      .       C       G       628     q10     .       GT:GQ:DP        0/1:99:32       0/1:99:32
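Splitting a multiallelic record into biallelic records before filtering would let the SNP allele survive; in practice "bcftools norm -m-" performs that split. The sketch below only illustrates the idea on the position-15 record of the second test. split_multiallelic is a hypothetical helper, and its genotype recoding (the chosen ALT becomes 1, every other ALT becomes 0) is an assumption that real tools may not follow; no left-normalisation is done.

```python
def split_multiallelic(pos, ref, alt_field, genotypes):
    """Hypothetical helper: one biallelic (pos, ref, alt, genotypes) record
    per ALT allele. Assumption: the chosen ALT is recoded to 1 and any other
    ALT to 0 (REF); missing alleles '.' are kept as-is."""
    records = []
    for i, alt in enumerate(alt_field.split(","), start=1):
        new_gts = []
        for gt in genotypes:
            sep = "|" if "|" in gt else "/"
            alleles = []
            for a in gt.replace("|", "/").split("/"):
                if a == ".":
                    alleles.append(".")
                else:
                    alleles.append("1" if int(a) == i else "0")
            new_gts.append(sep.join(alleles))
        records.append((pos, ref, alt, new_gts))
    return records


def site_is_indel(ref, alt_field):
    # Same length-based indel rule as --remove-indels is documented to use.
    return any(len(alt) != len(ref) for alt in alt_field.split(","))


# Position 15 of the second test VCF: REF A, ALT G,AG; samples A=0/1, B=1/1.
split = split_multiallelic(15, "A", "G,AG", ["0/1", "1/1"])
kept = [r for r in split if not site_is_indel(r[1], r[2])]
print(kept)  # [(15, 'A', 'G', ['0/1', '1/1'])]
```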