Question: Pindel2Vcf: Reference Allele Length Issues
Allpowerde wrote (5.8 years ago):

It seems that for some variants (predominantly RPLs), pindel2vcf records the wrong END value, which GATK subsequently complains about ("ERROR MESSAGE: BUG: GenomeLoc 9:17427298-17427314 has a size == 18 but the variation reference allele has length 17"), e.g.

chr9  17427298        .       TAGATTTTTCAGCAATAC      AGATTTTTCAGCAATACA      .       PASS    END=17427314;HOMLEN=0;NTLEN=18;SVLEN=-18;SVTYPE=RPL     GT:AD   0/0:0,4

The variant starts at 17427298 and affects a stretch of 18 nt (the reference allele), so the end should be 17427298+18-1=17427315, not 17427314.
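The arithmetic can be checked directly against the record above. This is a minimal sketch, assuming VCF's 1-based inclusive coordinates; `parse_info` and `expected_end` are hypothetical helper names introduced here, not part of pindel2vcf or GATK:

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'END=5;SVTYPE=RPL' into a dict."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def expected_end(pos, ref):
    """Last reference base covered by REF, with 1-based inclusive coordinates."""
    return pos + len(ref) - 1


# The record from the question, reduced to the columns that matter here.
pos = 17427298
ref = "TAGATTTTTCAGCAATAC"
info = "END=17427314;HOMLEN=0;NTLEN=18;SVLEN=-18;SVTYPE=RPL"

reported = int(parse_info(info)["END"])
# Prints the REF length, the END pindel2vcf reported, and the END implied by POS and REF.
print(len(ref), reported, expected_end(pos, ref))
```

For this record the reported END is one base short of the value implied by POS and the REF length.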

Why does this occur?

In the meantime, this Python snippet corrects the END values:

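import sys

in_path, out_path = sys.argv[1], sys.argv[2]

total = 0    # variant records seen
updated = 0  # records whose END value was rewritten

with open(in_path) as infile, open(out_path, "w") as outfile:
    for line in infile:
        # Header lines start with "#"; copy them through unchanged.
        if line.startswith("#"):
            outfile.write(line)
            continue
        total += 1
        content = line.rstrip("\n").split("\t")
        start = int(content[1])
        # END should be the last reference base covered: POS + len(REF) - 1.
        expected = start + len(content[3]) - 1
        info = content[7].split(";")
        for n, entry in enumerate(info):
            # Match the END key exactly, so keys that merely contain "END=" are left alone.
            if entry.startswith("END=") and int(entry[4:]) != expected:
                info[n] = "END=%i" % expected
                updated += 1
        content[7] = ";".join(info)
        outfile.write("\t".join(content) + "\n")

print("%i updated out of %i" % (updated, total))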
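To confirm that a corrected file is clean, it can be re-scanned for remaining mismatches without modifying it. This is a rough sketch, assuming tab-separated VCF lines with the INFO field in column 8; `find_end_mismatches` is a hypothetical name introduced here for illustration:

```python
def find_end_mismatches(lines):
    """Yield (chrom, pos, reported_end, expected_end) for inconsistent records."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        pos, ref = int(cols[1]), cols[3]
        expected = pos + len(ref) - 1
        for entry in cols[7].split(";"):
            if entry.startswith("END=") and int(entry[4:]) != expected:
                yield cols[0], pos, int(entry[4:]), expected


# The record from the question, rebuilt with real tab separators.
record = "\t".join([
    "chr9", "17427298", ".", "TAGATTTTTCAGCAATAC", "AGATTTTTCAGCAATACA",
    ".", "PASS", "END=17427314;HOMLEN=0;NTLEN=18;SVLEN=-18;SVTYPE=RPL",
    "GT:AD", "0/0:0,4",
])
print(list(find_end_mismatches(["##fileformat=VCFv4.0\n", record + "\n"])))
```

Passing an open file handle instead of a list works the same way; an empty result means every END equals POS + len(REF) - 1.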
Mitsuko.Korobkin wrote (5.2 years ago):

Excellent solution (though temporary).  Thanks!
