Merging public vcf files: <NON_REF> in ALT
2.8 years ago
jimkozubek ▴ 30

I am working with publicly available data sets of VCF files. One set of files is split by patient and by chromosome and contains only the 0/0 calls; unfortunately, the ALT column in these files holds the value <non_ref> on every line. I also have one VCF file per patient with the 1/1 and 0/1 calls across the entire genome; those have real alleles in the ALT column, such as A, G, or CATGTT.

I merged all the files for each patient, but when I then run bcftools merge across patients, the single merged VCF file (with 5000 patients) treats <non_ref> as a literal ALT allele.

Sadly, I cannot go back upstream in this public data set and re-run these files with GATK.

I am wondering if anyone has any ideas on how to get vcftools, bcftools or gatk vcf merge functions to ignore the <non_ref> value in the ALT column on some lines in each file?

P.S. I tried to recode the files manually with perl -pe "s/<non_ref>/./g", but bcftools then complains about a missing value in the ALT column.


VCF BCF GATK • 1.7k views


There is no need to SHOUT. I have adapted your title and simultaneously made it more specific.

2.8 years ago
jimkozubek ▴ 30

I will answer my own question:

I started with single-sample VCF files, some of which had lines with 0/0 calls where the ALT allele was given as NON_REF.

"bcftools merge" created problems for me because it treated NON_REF as a literal ALT allele, which sometimes made other 0/1 calls turn into 0/2 calls.

However, when I reverted to the older software vcftools and ran "vcf-merge", it always put NON_REF at the end of the list of potential ALT alleles and did not cause any problems.

So bcftools treats NON_REF as a literal ALT allele, whereas vcftools does not seem to, and the merge works.
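To illustrate why the position of NON_REF in the merged ALT list matters, here is a minimal Python sketch. It is not the actual algorithm of bcftools or vcf-merge; the function merge_site, its non_ref_last parameter, and the two example patients are hypothetical and only mimic the behaviour described above. Genotype indices refer to positions in the ALT list, so if NON_REF ends up first, a real allele that was index 1 becomes index 2.

```python
NON_REF = "<NON_REF>"

def merge_site(records, non_ref_last=False):
    """Merge per-sample records at one site (same REF assumed).

    records: list of (alts, gt) pairs, where alts is a list of ALT strings
    and gt is a tuple of allele indices (0 = REF, 1 = first ALT, ...).
    Returns (merged_alts, remapped_gts).
    """
    # Collect ALT alleles in order of first appearance.
    merged = []
    for alts, _ in records:
        for allele in alts:
            if allele not in merged:
                merged.append(allele)

    # Optionally push the symbolic NON_REF allele to the end of the list.
    if non_ref_last and NON_REF in merged:
        merged.remove(NON_REF)
        merged.append(NON_REF)

    # Renumber every genotype against the merged ALT list.
    new_gts = []
    for alts, gt in records:
        remap = {0: 0}  # REF always stays index 0
        for old_index, allele in enumerate(alts, start=1):
            remap[old_index] = merged.index(allele) + 1
        new_gts.append(tuple(remap[i] for i in gt))
    return merged, new_gts

# Hypothetical example: one patient with a 0/0 NON_REF line,
# another with a 0/1 call for a real allele at the same site.
patient_a = ([NON_REF], (0, 0))
patient_b = (["G"], (0, 1))

print(merge_site([patient_a, patient_b]))
# (['<NON_REF>', 'G'], [(0, 0), (0, 2)])   <- 0/1 turned into 0/2
print(merge_site([patient_a, patient_b], non_ref_last=True))
# (['G', '<NON_REF>'], [(0, 0), (0, 1)])   <- 0/1 preserved
```

In both cases the genotype still points at the same allele (G); what changes is only the index, which is why a 0/2 call can appear after a merge even though no sample had two real ALT alleles.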

