Question: Pileup viewer for VCF v4.2
0
gravatar for Lee Katz
4.4 years ago by
Lee Katz2.9k
Atlanta, GA
Lee Katz2.9k wrote:

Hey BioStars, sorry to flood you so much with my bcftools questions.  Since I have merged my VCFs, it turns out that they are now in VCF version 4.2 format.  I want to view the pooled VCF; however, IGV does not read v4.2.  Is there a mechanism for viewing v4.2?  In any pileup viewer?

Using vcf-convert to version 4.1 does not work because after I convert/bgzip/index to v4.1, I get an error

Error loading C:\Users\gzu2\Desktop\msa\out.4.1.vcf.gz: The provided VCF file is malformed at approximately line number 31: Duplicate allele added to VariantContext: G     

Lines 30-32 are such:

#CHROM  POS ID  REF ALT QUAL  FILTER  INFO  FORMAT  lambda_virus.fasta.wgsim.fastq.gz-reference sample1.fastq.gz-reference  sample2.fastq.gz-reference  sample3.fastq.gz-reference  sample4.fastq.gz-reference
NC001416  1 . G G 0 PASS  AC=0;ADP=0;AN=0;HET=0;HOM=0;NC=1;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC ./. ./.:.:0:2:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:1:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:3:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:2:.:0:.:.:.:.:.:.:.:.:0
NC001416  2 . G G 0 PASS  AC=0;ADP=3;AN=0;HET=0;HOM=0;NC=1;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC ./. ./.:.:3:4:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:1:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:3:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:3:.:0:.:.:.:.:.:.:.:.:0

 

igv vcf4.2 bcftools-merge vcf • 2.3k views
ADD COMMENTlink modified 4.3 years ago by Biostar ♦♦ 20 • written 4.4 years ago by Lee Katz2.9k
1

Why don't you just filter out the rows where REF=ALT? That may well get rid of the problem.

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

Good idea!  But now I am getting the same error pertaining to this line:

NC001416  186 . G C,G 0 PASS  AC=2,0;ADP=201;AN=10;HET=0;HOM=1;NC=0;WT=0  GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC 1/1:255:201:204:4:197:98.01%:1.7394E-112:17:17:4:0:179:18:197,. 0/0:48:26:44:25:0:0%:1E0:24:0:11:14:0:0:.,0 0/0:38:27:39:24:1:4%:5E-1:26:29:11:13:1:0:.,1 0/0:33:19:41:17:0:0%:1E0:26:0:8:9:0:0:.,0 0/0:40:23:36:22:0:0%:1E0:30:0:8:14:0:0:.,0

Most or all of my lines will have at least two choices in the ALT field because it is a pooled VCF from bcftools merge.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Lee Katz2.9k

So it complains whenever any of the ALT alleles = REF. That's pretty reasonable in my mind (if VCF4.2 allows that then I'd be curious what the reasoning was...it seems like a terrible idea). I'd be curious if the GT field ever uses the ALT=REF genotype (e.g., 0/2 for the line you showed). I would suspect not. In any case, removing examples of this shouldn't be too terrible to script.

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

If I remove any line with REF and ALT with something like G C,G, then it will remove every line. This is because it is a merged VCF file. So I guess a good question would be, can I view a merged VCF in any viewer?

ADD REPLYlink written 4.4 years ago by Lee Katz2.9k

I wouldn't remove the line, just remove the reference ALT allele. You'll need to ensure that it's the last ALT allele (otherwise, you'll change the sample genotypes).

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

I'm still learning the VCF format but I'll give that a try.  Is this the kind of format you are saying to make it into?  I would change C,G to C,.?  Or to .,C?  Or remove it altogether so that it is just C?

NC001416  186 . G C,. 0 PASS
ADD REPLYlink written 4.4 years ago by Lee Katz2.9k

I changed all the ALT fields so that they do not have commas and so that they are only a single nucleotide.  I also made sure that the ALT nucleotide was different than the REF.  And now I can import it to IGV!  Woo!

However, all variants now are incorrectly attributed to the first genome.  I think I see what you meant by having to change the genotype field which might be too much effort for what I want... is there any script out there to do what I want?  Or is it easier than I think?

And the code, for posterity:

gunzip -c out.4.1.vcf.gz | perl -lane 'if(/^#/){print;next;} $REF=$F[3]; @ALT=split(/,/,$F[4]); for(@ALT){$F[4]=$ALT[0] if($F[4] ne $REF);} next if($F[3] eq $F[4]); print join("\t",@F);' > tmp.vcf

bgzip -f tmp.vcf && tabix tmp.vcf.gz
ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Lee Katz2.9k

I don't know of a script/program to do it (I don't have time at the moment to quickly write it, or else I'd just do it). If you happen to end up writing one then please share it here. I imagine others would find it useful.

ADD REPLYlink written 4.4 years ago by Devon Ryan88k

Fair enough!  I am not sure when I'll be able to do it, but if anyone else does it before me though please post it here!

ADD REPLYlink written 4.4 years ago by Lee Katz2.9k

I think I script I wrote today should finally answer this question (and more!).  I call it fix vcf and it is part of my Lyve-SET package.  https://github.com/lskatz/lyve-SET/blob/master/scripts/set_fixVcf.pl

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Lee Katz2.9k

Maybe a better question is, can I load a pooled/merged VCF into any pileup viewer?  What about VCF v4.2 files?

ADD REPLYlink written 4.4 years ago by Lee Katz2.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1720 users visited in the last hour