Need help teasing out insertions and deletions with bcftools stats file.vcf > file.stats
6.1 years ago
oars

The following is a neat feature found in bcftools...

bcftools stats file.vcf > file.stats

...however, it doesn't seem to distinguish insertions from deletions; it only reports indels?

Here is an example of the output:

# SN    [2]id   [3]key  [4]value
SN  0   number of samples:  1
SN  0   number of records:  1761
SN  0   number of no-ALTs:  0
SN  0   number of SNPs: 1663
SN  0   number of MNPs: 0
SN  0   number of indels:   98
SN  0   number of others:   0
SN  0   number of multiallelic sites:   2
SN  0   number of multiallelic SNP sites:   0
# TSTV, transitions/transversions:
# TSTV  [2]id   [3]ts   [4]tv   [5]ts/tv    [6]ts (1st ALT) [7]tv (1st ALT) [8]ts/tv (1st ALT)
TSTV    0   1267    396 3.20    1267    396 3.20

Is there a way to separate the insertions and deletions using bcftools?

Pie in the sky would be a stats readout option that also reports heterozygous genotypes and dbSNP sites.


Is there an SVTYPE tag in the INFO column?


I don't think so; I see the following:

AC=1;AF=0.500;AN=2;BaseQRankSum=-1.483;ClippingRankSum=0.000;DP=75;ExcessHet=3.0103;FS=2.014;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=14.48;ReadPosRankSum=1.552;SOR=1.047   GT:AD:DP:GQ:PL  0/1:40,34:74:99:1100,0,1414

I've also tried vcftools with its vcf-stats feature:

vcf-stats file.vcf

This also provides an indel count but does not separate insertions and deletions. It does, however, print a confusing list after the indel count; I'm not sure what it represents, and it's not clear from the manual pages.

However, this simple command-line vcftools one-liner from matt (Count Of Variants) seems to do the trick:

$ zcat SRR1611183.gatk.vcf.gz | vcf-annotate --fill-type | grep -oP "TYPE=\w+" | sort | uniq -c
     56 TYPE=del
     42 TYPE=ins
   1663 TYPE=snp
13 months ago

Not in one step that I could find, but this works:

deletions:

bcftools view --types indels --include 'ILEN<0' file.vcf | bcftools stats

insertions:

bcftools view --types indels --include 'ILEN>0' file.vcf | bcftools stats
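If you only want the counts rather than the full reports, a small awk filter on the SN (summary numbers) lines is enough. This is a minimal sketch, assuming the report is tab-separated as bcftools stats writes it (the paste in the question shows spaces) and that the key text matches exactly, trailing colon included; the helper name sn_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value for one key from the SN section of a
# bcftools stats report read on stdin. Assumes tab-separated fields:
# SN <id> <key> <value>
sn_value() {
  # $1 = key text exactly as printed, e.g. "number of records:"
  awk -F'\t' -v key="$1" '$1 == "SN" && $3 == key { print $4 }'
}

# Demo on a hand-written excerpt using the numbers from the question.
printf 'SN\t0\tnumber of records:\t1761\nSN\t0\tnumber of indels:\t98\n' | sn_value 'number of indels:'
```

For example, `bcftools view --types indels --include 'ILEN<0' file.vcf | bcftools stats | sn_value 'number of records:'` should print just the number of deletion sites, since every record in that filtered stream is a deletion.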

Relevant part of the documentation: https://samtools.github.io/bcftools/bcftools.html#expressions

  • variables calculated on the fly if not present: number of alternate alleles; number of samples; count of alternate alleles; minor allele count (similar to AC but is always smaller than 0.5); frequency of alternate alleles (AF=AC/AN); frequency of minor alleles (MAF=MAC/AN); number of alleles in called genotypes; number of samples with missing genotype; fraction of samples with missing genotype; indel length (deletions negative, insertions positive)

N_ALT, N_SAMPLES, AC, MAC, AF, MAF, AN, N_MISSING, F_MISSING, ILEN
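Given the documented sign convention (deletions negative, insertions positive), a rough version of the same split can be computed without bcftools by comparing each ALT allele's length with REF. The following is a minimal awk sketch, assuming an uncompressed, tab-delimited VCF on stdin. It skips symbolic (<DEL>), `*` and `.` alleles, does no normalization, and counts every ALT allele of a multiallelic site separately. The function name count_indels and the demo records are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally insertions and deletions from a VCF on stdin
# using length(ALT) - length(REF) per allele (negative = deletion,
# positive = insertion). Length-based only; alleles are not normalized.
count_indels() {
  awk -F'\t' '
    BEGIN { del = 0; ins = 0 }
    /^#/ { next }                                    # skip meta and header lines
    {
      n = split($5, alts, ",")                       # column 5 = ALT, may be multiallelic
      for (i = 1; i <= n; i++) {
        if (alts[i] !~ /^[ACGTNacgtn]+$/) continue   # skip <DEL>, *, . alleles
        ilen = length(alts[i]) - length($4)          # column 4 = REF
        if (ilen < 0) del++
        else if (ilen > 0) ins++
      }
    }
    END { printf "deletions: %d\ninsertions: %d\n", del, ins }
  '
}

# Demo on made-up records: a SNP, a 2-bp deletion, a 2-bp insertion, and a
# multiallelic site with one insertion allele and one SNP allele.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tATT\tA\n1\t300\t.\tC\tCAG\n1\t400\t.\tG\tGA,T\n' | count_indels
```

Run it as `count_indels < file.vcf` or `zcat file.vcf.gz | count_indels`. Because it counts alleles rather than sites, a site carrying both an insertion and a deletion allele adds to both totals, so the numbers can differ slightly from the bcftools view approach.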
