bcftools stats: definition of indel type
23 months ago

Hi,

I'm trying to figure out how the INDEL type is defined in bcftools stats. With some help I found the bcf_get_variant_types function (https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4247), but it is C and I can't figure out when a variant is considered an indel.

I mostly wonder when a variant is considered an indel rather than a structural variant, for which the length cutoff is usually 50 bp.

vcf bcftools • 1.3k views

If the type hasn't already been computed, it is built as a bit mask by OR-ing together the per-allele types assigned by this function: https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4162
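
To make the OR-ing concrete, here is a minimal sketch, not the htslib code itself. The names T_REF, T_SNP, T_MNP, T_INDEL, T_OTHER, combine_types and has_type are made up for this example; the power-of-two values are only modeled on how htslib's VCF_* flags appear to be defined, so check vcf.h before relying on them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch: one bit per variant class. */
enum { T_REF = 0, T_SNP = 1, T_MNP = 2, T_INDEL = 4, T_OTHER = 8 };

/* Combine the per-allele types of one record into a single bit mask. */
int combine_types(const int *allele_types, size_t n)
{
    assert(allele_types != NULL || n == 0);
    int mask = T_REF;
    for (size_t i = 0; i < n; i++)
        mask |= allele_types[i];   /* each ALT allele contributes its bit */
    return mask;
}

/* Nonzero if at least one allele of the record carried the given flag. */
int has_type(int mask, int flag)
{
    return (mask & flag) != 0;
}
```

So a record with one SNP allele and one indel allele would carry both bits, and a test like has_type(mask, T_INDEL) is true for it even though not every allele is an indel.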


Yes, but I cannot figure out in which cases the INDEL type is assigned:

    if ( *a && !*r )
    {
        if ( *a==']' || *a=='[' ) { var->type = VCF_BND; return; }
        while ( *a ) a++;
        var->n = (a-alt)-(r-ref); var->type = VCF_INDEL; return;
    }
    else if ( *r && !*a )
    {
        while ( *r ) r++;
        var->n = (a-alt)-(r-ref); var->type = VCF_INDEL; return;
    }
    else if ( !*r && !*a )
    {
        var->n = 0; var->type = VCF_REF; return;
    }

So, from a quick look, an allele is an indel when it is not a symbolic or breakend allele (e.g. an ALT continuing with [ or ]) and REF and the current ALT differ in length.
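
A small self-contained sketch of just the three quoted branches may help. It is not the htslib function. The names sk_type, sk_variant and sk_classify are hypothetical, and the loop that skips the shared prefix is my assumption about how r and a are positioned before the quoted code runs. It is also case-sensitive, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; these are not htslib's VCF_* constants. */
typedef enum { SK_REF, SK_INDEL, SK_BND, SK_UNRESOLVED } sk_type;

typedef struct {
    sk_type type;
    int n;   /* length change: >0 insertion, <0 deletion, 0 otherwise */
} sk_variant;

/* Classify one REF/ALT pair using only the three branches quoted above. */
sk_variant sk_classify(const char *ref, const char *alt)
{
    assert(ref != NULL && alt != NULL);
    sk_variant v = { SK_UNRESOLVED, 0 };
    const char *r = ref, *a = alt;

    /* Assumption: r and a are first advanced past the shared prefix. */
    while (*r && *a && *r == *a) { r++; a++; }

    if (*a && !*r) {                 /* REF used up, ALT has bases left */
        if (*a == ']' || *a == '[') { v.type = SK_BND; return v; }
        while (*a) a++;
        v.n = (int)((a - alt) - (r - ref));
        v.type = SK_INDEL;
    } else if (*r && !*a) {          /* ALT used up, REF has bases left */
        while (*r) r++;
        v.n = (int)((a - alt) - (r - ref));
        v.type = SK_INDEL;
    } else if (!*r && !*a) {         /* both used up: alleles are identical */
        v.type = SK_REF;
    }
    /* Otherwise a mismatch remains on both sides; the quoted snippet
       does not cover that case, so it stays SK_UNRESOLVED here. */
    return v;
}
```

With this sketch, sk_classify("A", "ACGT") gives an INDEL with n = 3 and sk_classify("ACGT", "A") gives an INDEL with n = -3, while sk_classify("N", "ACGT") stays unresolved because no leading base is shared. Note that n is only recorded in these branches and never compared against a threshold, so nothing in the quoted code separates indels from structural variants at 50 bp.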

    // Same logic with explicit indices: i and j are the current positions
    // in the alt and ref strings; alti and refj are their start positions.
    if ( a[i]!=END_OF_STRING && r[j]==END_OF_STRING )
    {
        if ( a[i]==']' || a[i]=='[' ) { var->type = VCF_BND; return; }
        while ( a[i]!=END_OF_STRING ) i++;
        var->n = (i-alti)-(j-refj); var->type = VCF_INDEL; return;
    }

But to become an INDEL, doesn't it require that part of the REF allele matches part of the ALT allele (https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4215)? And if there is no match, is OTHER assigned instead?

In the VCF that I'm looking at the REF allele for an insertion is an N, and similarly for the ALT allele for a deletion.
