Question: Merging public vcf files: <NON_REF> in ALT
jimkozubek wrote (10 months ago):

I am working with publicly available VCF data sets. One set of files is broken out by patient and by chromosome and contains only the 0/0 calls; unfortunately, the ALT column in these files holds the value <non_ref> on every line. I also have per-patient VCF files with the 1/1 and 0/1 calls across the entire genome; those do have real alleles in the ALT column, such as A, G, or CATGTT.

I merged all files by patient, but when I then run bcftools merge across patients, the single merged VCF file (with 5000 patients) treats <non_ref> literally as one of the potential ALT alleles.

Sadly, I cannot go back upstream in this public data set and re-run these files with GATK.

I am wondering if anyone has any ideas on how to get vcftools, bcftools or gatk vcf merge functions to ignore the <non_ref> value in the ALT column on some lines in each file?

P.S. I tried recoding the files manually with perl -pe "s/<non_ref>/./g", but bcftools then complains about a missing value in the ALT column.

Jim
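
The manual recode from the P.S. can be sketched in Python at the level of the ALT column rather than the whole line. This is only an illustration, not a tested fix for the bcftools complaint: `strip_non_ref` is a hypothetical helper, the example records are invented, and the match is case-insensitive on the assumption that files may spell the allele either <non_ref> or <NON_REF>. It removes the symbolic allele from the comma-separated ALT list and writes "." only when nothing else remains.

```python
def strip_non_ref(line):
    """Drop the symbolic NON_REF allele from the ALT column of one VCF line."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # Column 5 (index 4) is ALT: a comma-separated allele list.
    alts = [a for a in fields[4].split(",") if a.upper() != "<NON_REF>"]
    fields[4] = ",".join(alts) if alts else "."
    return "\t".join(fields) + "\n"


# Invented example records (tab-separated, GT only).
variant = "chr1\t100\t.\tA\tG,<NON_REF>\t50\tPASS\t.\tGT\t0/1\n"
ref_block = "chr1\t200\t.\tC\t<non_ref>\t.\t.\tEND=250\tGT\t0/0\n"

print(strip_non_ref(variant), end="")    # ALT becomes G
print(strip_non_ref(ref_block), end="")  # ALT becomes .
```

The sketch leaves per-allele FORMAT fields such as AD or PL untouched, so records that carry them would still be inconsistent with the shortened ALT list and would need further handling.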

bcf gatk vcf • 491 views

WouterDeCoster commented (10 months ago):

MERGING VCF FILES

There is no need to SHOUT. I have adapted your title and made it more specific.
jimkozubek wrote (10 months ago):

I will answer my own question:

I started with single-sample VCF files, some of which had lines with 0/0 calls where the ALT allele was given as NON_REF.

"bcftools merge" created problems for me because it treated NON_REF as a literal ALT allele, which sometimes made other 0/1 calls turn into 0/2 calls.

However, when I reverted to the older vcftools software and ran "vcf-merge", it always put NON_REF at the end of the list of potential ALT alleles and did not cause any problems.

So bcftools treats NON_REF as a literal ALT allele, whereas vcftools does not seem to, and with vcf-merge the merge works.
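
To make the difference concrete, here is a small Python sketch of the allele renumbering described in this answer. It is not the code of bcftools or vcf-merge: `merge_alts` and `remap_gt` are hypothetical helpers, the two records are invented, and the uppercase spelling <NON_REF> is assumed. A merged record has a single ALT list, so each sample's genotype indices are renumbered against it, and where NON_REF lands in that list decides whether a 0/1 call stays 0/1.

```python
NON_REF = "<NON_REF>"


def merge_alts(alt_lists, non_ref_last=False):
    """Union of ALT alleles in first-seen order.

    With non_ref_last=True the symbolic NON_REF allele is moved to the end,
    imitating the ordering reported for vcf-merge in this thread.
    """
    merged = []
    for alts in alt_lists:
        for allele in alts:
            if allele not in merged:
                merged.append(allele)
    if non_ref_last and NON_REF in merged:
        merged.remove(NON_REF)
        merged.append(NON_REF)
    return merged


def remap_gt(gt, sample_alts, merged_alts):
    """Renumber a genotype such as '0/1' against the merged ALT list."""
    sep = "|" if "|" in gt else "/"
    out = []
    for idx in gt.split(sep):
        if idx in (".", "0"):
            out.append(idx)  # missing and REF keep their value
        else:
            allele = sample_alts[int(idx) - 1]
            out.append(str(merged_alts.index(allele) + 1))
    return sep.join(out)


# Invented records at the same position in two single-sample files.
block_alts = [NON_REF]  # sample A: 0/0 reference line
het_alts = ["G"]        # sample B: 0/1 call

naive = merge_alts([block_alts, het_alts])
ordered = merge_alts([block_alts, het_alts], non_ref_last=True)

print(naive, remap_gt("0/1", het_alts, naive))      # ['<NON_REF>', 'G'] 0/2
print(ordered, remap_gt("0/1", het_alts, ordered))  # ['G', '<NON_REF>'] 0/1
```

In the first case the 0/2 still points at G, so the record is internally consistent, but G is no longer allele 1. Keeping NON_REF last leaves the indices of the real alleles unchanged.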
