How to create a Venn Diagram for overlapping SVs from a merged VCF
4 months ago

Hi there,

I have a probably trivial question but, as I couldn't find any relevant or useful solution online, I wonder if someone could help. Basically, I need to plot a Venn diagram for the intersection of three VCF files, produced using truvari.

truvari is a structural-variant evaluation tool for benchmarking call sets, generated using different approaches, against a truth set. In my case, I'm working with HG002 and three Papuan individuals, for which I produced call sets using different approaches: Manta on short reads against a linear reference, and two pangenome approaches that use PanGenie for the downstream genome inference.

That said, it would be interesting to know which SVs are shared between callers and, most importantly, where the graph approaches do better than the linear reference for all samples, which I have already assessed in terms of metrics.
So, with truvari collapse I produced a merged.vcf for each sample from the three call sets; the question is, how do I get the VCF into a format that can be used in R (possibly) to generate a Venn diagram?

In other words, what do I have to do, or what information do I need to extract from the VCF, to produce the plot in R? If someone has already done something similar, any help is appreciated. Thanks in advance!
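As a rough illustration of what information a three-way Venn diagram needs, here is a minimal Python sketch: one set of SV identifiers per call set, from which the count in each region follows. The key used here (chromosome, position, SVTYPE) is an assumption chosen for illustration; exact matching like this is much stricter than truvari's matching, so real counts would differ. The toy records and the `record` helper are made up, and the set names are only labels.

```python
from collections import Counter


def sv_key(vcf_line):
    """Build a crude (chrom, pos, svtype) key from one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    svtype = "NA"
    for entry in info.split(";"):
        if entry.startswith("SVTYPE="):
            svtype = entry.split("=", 1)[1]
    return (chrom, pos, svtype)


def load_keys(lines):
    """Collect the keys of all non-header, non-empty VCF lines."""
    return {sv_key(l) for l in lines if l.strip() and not l.startswith("#")}


def venn_regions(named_sets):
    """Count SVs per exclusive membership combination (one per Venn region)."""
    counts = Counter()
    for key in set().union(*named_sets.values()):
        members = tuple(n for n, s in named_sets.items() if key in s)
        counts[members] += 1
    return counts


def record(chrom, pos, svtype):
    """Hypothetical helper: format a toy VCF line for this demo only."""
    return f"{chrom}\t{pos}\t.\tN\t<{svtype}>\t.\tPASS\tSVTYPE={svtype}"


# Made-up toy call sets; real input would be the lines of each VCF.
call_sets = {
    "linear": load_keys(["#CHROM\tPOS", record("chr1", 100, "DEL"),
                         record("chr1", 500, "INS")]),
    "graph": load_keys([record("chr1", 100, "DEL"),
                        record("chr2", 300, "INS")]),
    "personalized": load_keys([record("chr1", 100, "DEL"),
                               record("chr2", 300, "INS"),
                               record("chr3", 700, "DEL")]),
}

regions = venn_regions(call_sets)
for members, n in regions.most_common():
    print("&".join(members), n)
```

The same per-caller key lists could be written to plain text files, one key per line, and read into R, since Venn plotting packages generally take a list of sets as input.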

truvari structural-variants r vcf venn-diagram • 784 views

I think we'd have to see a snippet of this merged.vcf to know how to proceed.


@Jeremy Leipzig Indeed, that was an omission on my side, sorry. Here is what the merged file looks like: [image: merged.vcf-snippet]


OK, it's not clear to me that this file describes the origin of the calls. Can you generate intersections with the original VCFs using bedtools and then generate a Venn diagram using intervene?
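For interval-based intersection of the kind suggested above, each SV has to be treated as a genomic interval. The Python sketch below shows one way to turn a VCF record into a BED-style interval and test two intervals for overlap. It assumes the records carry the standard `END` or `SVLEN` INFO tags and collapses insertions to a single base; the function names and sample records are hypothetical, and this is not how bedtools or intervene work internally.

```python
def vcf_to_bed(vcf_line):
    """Convert one VCF SV line to a (chrom, start, end, svtype) BED-style tuple.

    VCF POS is 1-based; BED start is 0-based and BED end is exclusive.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    tags = dict(e.split("=", 1) for e in info.split(";") if "=" in e)
    svtype = tags.get("SVTYPE", "NA")
    start = pos - 1
    if "END" in tags:
        end = int(tags["END"])
    elif "SVLEN" in tags and svtype != "INS":
        # Assumption: the first SVLEN value applies; its sign is ignored.
        end = pos + abs(int(tags["SVLEN"].split(",")[0]))
    else:
        # Insertions (or records with no length info) span a single base.
        end = pos
    return (chrom, start, max(end, start + 1), svtype)


def overlaps(a, b):
    """True if two BED-style intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


# Made-up records for illustration.
deletion = vcf_to_bed("chr1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=150")
other = vcf_to_bed("chr1\t121\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-80")
print(deletion, other, overlaps(deletion, other))
```

Any single-base overlap counts as a match here, which is far more permissive than a reciprocal-overlap or size-similarity criterion, so the resulting counts depend heavily on the overlap rule chosen.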


Hi again @Jeremy Leipzig. I went ahead and did what you recommended; basically, I ran the following:

intervene venn -i Documents/vennDiagram/6103671_diploidSV.vcf.gz Documents/vennDiagram/pap_6103671-biallelic.vcf.gz Documents/vennDiagram/6103671-biallelic.vcf.gz -o Documents/vennDiagram/ --names=linear,graph,personalized --title three-way-vennDiagram --figtype svg --figsize 12 12 --bedtools-options header

I passed the VCFs directly mainly because I realized that, for some reason, the file I generated with bcftools intersect was not processed correctly by intervene. Maybe you could point me to the exact command, as I might have done something wrong. The VCFs are processed correctly as long as I add --bedtools-options header. Other than that, below is the image of the Venn diagram showing the intersection of the SVs detected for sample 6103671 using the linear reference (6103671_diploidSV.vcf.gz), a pangenome graph (pap_6103671-biallelic.vcf.gz) and another version of the pangenome (6103671-biallelic.vcf.gz). Let me know whether this makes sense, thanks again! [image: venn Diagram]


OK, sure, that looks believable. You might like the UpSet plot better, simply because it will maintain the proportions.


Thanks again @Jeremy Leipzig, this saved me a lot of trouble, mainly in figuring out what is needed for Venn diagrams in R starting from the intersection of VCF files. I will also try intervene upset; it seems a pretty useful tool overall!
