Question: VCF deletions incorrectly formatted
20 days ago, john.michel.rouhana wrote:

Hi all,

I'm working with a VCF (v4.1) that has incorrectly formatted deletions for some reason. The insertions are fine, but the deletions are annotated like this (example):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
2   32474671    indel.60227 A   -   .   PASS    .   GT

Notice that the ALT is -, when the line should have been formatted as such (example):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
2   32474670    indel.60227 GA  G   .   PASS    .   GT

I have no idea how the deletions ended up like this in the VCF, but my present plan is to look these positions up in a reference genome FASTA file and manually correct all the deletion annotations, so I don't have to drop them from the VCF. What I wanted to know is whether there's a tool that already does this. As it stands, I'm writing a manual parser.
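
A minimal sketch of the first step of that plan: finding the records that need fixing. It assumes a tab-delimited VCF and that every malformed deletion has a bare - in the ALT column, as in the example above. The function names are made up for illustration and are not part of any library.

```python
def is_dash_deletion(line):
    """Return True for a VCF data line whose ALT column is a bare '-'."""
    if line.startswith("#"):
        return False  # header and column-name lines are never records
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 5 and fields[4] == "-"


def dash_deletion_sites(lines):
    """Yield (chrom, pos, ref) for every record with ALT == '-'."""
    for line in lines:
        if is_dash_deletion(line):
            chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
            yield chrom, int(pos), ref
```

Passing an open file handle as lines gives the list of positions to look up in the reference.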

qc reference panel vcf • 94 views

It is quite odd that insertions are fine and deletions are not. Older VCF versions (4.1) had . for REF in insertions and . for ALT in deletions, so either both should be affected or neither should be.

Maybe give this tool a shot? Disclaimer: This tool is not mine and I have never used it. Maybe bcftools norm --check-ref can fix the REF alleles, I'm not sure though.

written 20 days ago (modified 20 days ago) by RamRS

I definitely agree that it's odd. I'm having trouble finding older versions that used a - as ALT in deletions, so I'm not sure it's ever the case. A big part of this problem is that I can't figure out how the people who supplied the VCF ended up in this situation.

bcftools doesn't seem to fix the problem, probably because the REF alleles are fine; it's the ALT alleles that are botched.

Looking into the other tool that you linked. Hopefully it helps.

written 20 days ago by john.michel.rouhana

I don't know of any tool that uses -; older versions used ., not -.

written 20 days ago by RamRS

20 days ago, john.michel.rouhana wrote:

I wound up just parsing the VCF in Python and calling samtools to correct the deletions. For each affected record, subtract 1 from the alleged bp_pos to get del_start, call samtools faidx <reference_genome> chr:del_start-del_end to fetch the corrected REF (the preceding base plus the deleted bases), and take the first character of that sequence as ALT. That fills in the blanks. I don't believe there's a tool that corrects this problem in VCFs, because I don't think this is a common (or normal) problem. Leaving this here in case anyone ever encounters the same situation.
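
For anyone who wants a starting point, here is a rough sketch of that approach. It is not the exact script used here. It assumes a tab-delimited VCF, samtools on the PATH, an indexed reference FASTA whose contig names match the VCF's CHROM column, and that REF in the broken records already holds the deleted bases. The function names are made up for illustration. A deletion at position 1 has no preceding base and is simply rejected.

```python
import subprocess


def faidx_region(fasta, chrom, start, end):
    """Fetch chrom:start-end (1-based, inclusive) with `samtools faidx`."""
    out = subprocess.run(
        ["samtools", "faidx", fasta, f"{chrom}:{start}-{end}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # The first output line is the FASTA header; the rest is sequence.
    return "".join(out.splitlines()[1:]).upper()


def fix_dash_deletion(line, get_seq):
    """Rewrite one VCF line with ALT == '-' into left-padded form.

    get_seq(chrom, start, end) must return the reference sequence
    for that region; other lines are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 5 or fields[4] != "-":
        return line.rstrip("\n")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
    if pos < 2:
        raise ValueError("no preceding base for a deletion at position 1")
    del_start, del_end = pos - 1, pos + len(ref) - 1
    padded = get_seq(chrom, del_start, del_end).upper()
    # Sanity check: the fetched bases after the anchor should equal REF.
    if padded[1:] != ref:
        raise ValueError(f"REF mismatch at {chrom}:{pos}")
    fields[1], fields[3], fields[4] = str(del_start), padded, padded[0]
    return "\t".join(fields)
```

Typical use would be something like fix_dash_deletion(line, lambda c, s, e: faidx_region("ref.fa", c, s, e)) for each line of the file, where ref.fa is a placeholder path. Taking the lookup as a parameter also makes it easy to swap in a faster in-memory FASTA reader, since one samtools call per record is slow on large files.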
