Merging public vcf files: <NON_REF> in ALT
5.8 years ago
jimkozubek ▴ 30

I am working with publicly available VCF data sets. One set of files is broken out by patient and by chromosome and contains only the 0/0 calls; unfortunately, the ALT column has the value <non_ref> on every line. I also have per-patient VCF files with the 1/1 and 0/1 calls across the entire genome; those have actual alleles in the ALT column, such as A, G, or CATGTT.

I merged all the files by patient, but when I then run bcftools merge across patients, the single merged VCF file (with 5000 patients) treats <non_ref> literally as one of the possible ALT alleles.

Sadly, I cannot go back upstream in this public data set and re-run these files with GATK.

Does anyone have ideas on how to get the merge functions in vcftools, bcftools, or GATK to ignore the <non_ref> value that appears in the ALT column on some lines of each file?

P.S. I tried to recode the files manually with perl -pe "s/<non_ref>/./g", but bcftools then complains about a missing value in the ALT column.
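For reference, here is a minimal Python sketch of the same substitution, restricted to the ALT column (column 5) of data lines so that header lines and other fields are left alone. The function name blank_non_ref_alt is hypothetical, the match is case-insensitive on the assumption that the files may spell the tag either <non_ref> or <NON_REF>, and only a lone <NON_REF> is replaced. This only illustrates the recoding step; it does not address the bcftools complaint described above.

```python
import re

# Matches an ALT field that consists solely of the <NON_REF> tag (any case).
NON_REF_RE = re.compile(r"^<non_ref>$", re.IGNORECASE)

def blank_non_ref_alt(line):
    """Replace a lone <NON_REF> ALT value with '.' in one VCF line.

    Header lines (starting with '#') and records whose ALT holds
    anything else (e.g. 'A,<NON_REF>') are returned unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # VCF columns: CHROM POS ID REF ALT ...; ALT is index 4.
    if len(fields) > 4 and NON_REF_RE.match(fields[4]):
        fields[4] = "."
    return "\t".join(fields) + newline
```

It could be applied line by line while streaming a file, for example by mapping it over the lines of an opened VCF and writing the results to a new file.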

Jim

VCF BCF GATK • 3.1k views

MERGING VCF FILES

There is no need to SHOUT. I have adapted your title and simultaneously made it more specific.

5.8 years ago
jimkozubek ▴ 30

I will answer my own question:

I started with single-sample VCF files, some of which had lines with 0/0 calls whose ALT was given as NON_REF.

"bcftools merge" caused problems for me because it treated NON_REF as a literal ALT allele, which sometimes turned other samples' 0/1 calls into 0/2 calls.

However, when I reverted to the older vcftools software and ran "vcf-merge", it always put NON_REF at the end of the list of potential ALT alleles and did not cause any problems.

So bcftools treats NON_REF as a literal ALT allele, whereas vcftools does not seem to, and with vcf-merge the merge works.
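To illustrate why the position of NON_REF in the merged ALT list matters, here is a toy Python model of merging one site across samples. It is not the actual implementation of bcftools or vcf-merge; the function merge_site and its non_ref_last flag are hypothetical, and only unphased genotypes written with "/" are handled. Genotype indices refer to positions in the ALT list, so if NON_REF ends up ahead of a real allele in the merged list, that allele's index shifts.

```python
NON_REF = "<NON_REF>"

def merge_site(samples, non_ref_last=False):
    """Merge ALT lists for one site and remap genotype indices.

    samples: list of (alt_list, gt) pairs, e.g. (["A"], "0/1").
    non_ref_last: if True, move <NON_REF> to the end of the merged
    ALT list (the behaviour reported above for vcf-merge).
    Returns (merged_alts, remapped_gts).
    """
    # Build the union of ALT alleles in first-seen order.
    merged = []
    for alts, _ in samples:
        for allele in alts:
            if allele not in merged:
                merged.append(allele)
    if non_ref_last and NON_REF in merged:
        merged.remove(NON_REF)
        merged.append(NON_REF)

    # Rewrite each genotype so its indices point into the merged list.
    remapped = []
    for alts, gt in samples:
        new_indices = []
        for idx in gt.split("/"):
            if idx in (".", "0"):
                new_indices.append(idx)  # missing or REF: unchanged
            else:
                allele = alts[int(idx) - 1]
                new_indices.append(str(merged.index(allele) + 1))
        remapped.append("/".join(new_indices))
    return merged, remapped

# Sample 1 is a 0/0 call with ALT=<NON_REF>; sample 2 is 0/1 with ALT=A.
site = [([NON_REF], "0/0"), (["A"], "0/1")]
print(merge_site(site))                     # NON_REF kept in first-seen order
print(merge_site(site, non_ref_last=True))  # NON_REF moved to the end
```

In the first call the second sample's genotype becomes 0/2 because A is now the second ALT allele; in the second call it stays 0/1 because A remains first.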
