Question

How to create a Venn Diagram for overlapping SVs from a merged VCF


10 months ago

Matteo Ungaro ▴ 100

Hi there,

I have a probably trivial question, but since I couldn't find a relevant solution online, I wonder if someone could help. Basically, I need to plot a Venn diagram for the intersection of three VCF files, which I computed with truvari.

truvari is a structural variant (SV) evaluation tool for benchmarking call sets, generated with different approaches, against a truth set. In my case, I'm working with HG002 and three Papuan individuals, for which I produced call sets using three approaches: manta on short reads with a linear reference, and two pangenome approaches that use PanGenie for the downstream genome inference.

That said, I would like to know which SVs are shared between callers and, most importantly, where the graph approaches do better than the linear reference across all samples, which I have already assessed in terms of metrics.
So, with truvari collapse I produced a merged.vcf for each sample from the three call sets; the question is how to get that VCF into a format that can be used (possibly in R) to generate a Venn diagram.

In other words, what do I have to do, or what information do I need to extract from the VCF, to produce the plot in R? If someone has already done something similar, any help is appreciated. Thanks in advance!
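One possible starting point, sketched in Python rather than R: if the merged VCF keeps one sample column per call set (an assumption; it depends on how the inputs were combined before truvari collapse), the non-reference genotypes in each record indicate which call sets support it. Counting those patterns gives the region sizes a three-set Venn diagram needs. The function names, column names, and toy records below are hypothetical.

```python
from collections import Counter


def has_alt_allele(sample_field):
    """True if the GT subfield (first in a VCF sample column) carries a non-reference allele."""
    genotype = sample_field.split(":")[0]
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def membership_counts(vcf_lines):
    """Count records per combination of sample columns with a non-reference genotype."""
    sample_names = []
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample columns start at the 10th field
            continue
        present = tuple(
            name
            for name, field in zip(sample_names, fields[9:])
            if has_alt_allele(field)
        )
        if present:
            counts[present] += 1
    return counts


# Toy records; the column names linear, graph, personalized are assumptions.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tlinear\tgraph\tpersonalized",
    "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500\tGT\t0/1\t0/1\t1/1",
    "chr1\t5000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS\tGT\t./.\t0/1\t0/1",
    "chr2\t300\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900\tGT\t0/1\t./.\t0/0",
    "chr2\t7000\tsv4\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=7400\tGT\t./.\t1|0\t0/1",
]

for combo, n in sorted(membership_counts(toy_vcf).items()):
    print("&".join(combo), n)
```

The resulting counts (one per combination of call sets) can be written to a small table and handed to whichever Venn plotting package is preferred.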

truvari structural-variants r vcf venn-diagram • 1.2k views


I think we'd have to see a snippet of this merged.vcf to know how to proceed.

10 months ago by Jeremy Leipzig 22k

@Jeremy Leipzig Indeed, that was an oversight on my part, sorry. Here is what the merged file looks like: merged.vcf-snippet

10 months ago by Matteo Ungaro ▴ 100

OK, it's not clear to me that this file describes the origin of the calls. Can you generate intersections with the original VCFs using bedtools and then generate a Venn diagram using intervene?
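To make the overlap idea concrete, here is a minimal Python sketch of an interval-based intersection between two call sets, in the spirit of what bedtools intersect reports by default (any overlap of at least one base). It is not how bedtools or intervene are implemented. Reading the end coordinate from an INFO END entry, the helper names, and the toy records are all assumptions for illustration.

```python
def vcf_record_to_interval(line):
    """Turn one VCF data line into (chrom, start, end), 0-based half-open.

    Assumption: the end coordinate comes from an INFO END=... entry;
    records without it are treated as spanning only the REF allele.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    end = pos + len(ref) - 1
    for entry in info.split(";"):
        if entry.startswith("END="):
            end = int(entry[4:])
    return chrom, pos - 1, end


def overlaps(a, b):
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(set_a, set_b):
    """Number of intervals in set_a overlapping at least one interval in set_b.

    Quadratic scan: fine for a toy example, too slow for full call sets.
    """
    return sum(any(overlaps(a, b) for b in set_b) for a in set_a)


# Toy records standing in for two call sets (hypothetical data).
linear_records = [
    "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500",
    "chr2\t300\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
]
graph_records = [
    "chr1\t1200\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1600",
    "chr2\t5000\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS",
]

linear = [vcf_record_to_interval(r) for r in linear_records]
graph = [vcf_record_to_interval(r) for r in graph_records]
print("linear calls also seen in graph:", count_shared(linear, graph))
```

A plain any-overlap rule is looser than the size- and distance-aware matching truvari applies, so counts from the two approaches may differ.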

10 months ago by Jeremy Leipzig 22k

Hi again @Jeremy Leipzig. I moved on and did what you recommended; basically, I ran the following:

intervene venn -i Documents/vennDiagram/6103671_diploidSV.vcf.gz Documents/vennDiagram/pap_6103671-biallelic.vcf.gz Documents/vennDiagram/6103671-biallelic.vcf.gz -o Documents/vennDiagram/ --names=linear,graph,personalized --title three-way-vennDiagram --figtype svg --figsize 12 12 --bedtools-options header

I ran intervene directly on the three VCFs mainly because I realized that, for some reason, the file I generated with bedtools intersect was not processed correctly by intervene. Maybe you could point me to the exact command, as I might have done something wrong. The VCFs are processed correctly as long as I add --bedtools-options header. Other than that, below is the image of the Venn diagram showing the intersection of the SVs detected for sample 6103671 using the linear reference (6103671_diploidSV.vcf.gz), a pangenome graph (pap_6103671-biallelic.vcf.gz), and another version of the pangenome (6103671-biallelic.vcf.gz). Let me know whether this makes sense, thanks again!

10 months ago by Matteo Ungaro ▴ 100

OK, sure, that looks believable. You might like the UpSet plot better, simply because it keeps the intersection sizes in proportion.
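As a rough illustration of why the UpSet layout keeps proportions, the following Python sketch computes the exclusive intersection sizes of three sets and prints them as bars whose lengths scale with the counts. The SV identifiers and set names are made up, and this only mimics the layout in plain text; it is not intervene's implementation.

```python
def exclusive_intersections(named_sets):
    """Map each combination of set names to the items found in exactly those sets."""
    regions = {}
    all_items = set().union(*named_sets.values())
    for item in all_items:
        combo = tuple(name for name, members in named_sets.items() if item in members)
        regions.setdefault(combo, set()).add(item)
    return regions


def text_upset(named_sets, width=30):
    """Render one line per exclusive intersection: membership marks, bar, count."""
    regions = exclusive_intersections(named_sets)
    if not regions:
        return []
    largest = max(len(items) for items in regions.values())
    lines = []
    # Largest intersections first; ties broken by combination name.
    for combo, items in sorted(regions.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        marks = " ".join("x" if name in combo else "-" for name in named_sets)
        bar = "#" * max(1, round(width * len(items) / largest))
        lines.append(f"{marks}  {bar} {len(items)}")
    return lines


# Hypothetical call sets keyed by made-up SV identifiers.
call_sets = {
    "linear": {"sv1", "sv3"},
    "graph": {"sv1", "sv2", "sv4", "sv5"},
    "personalized": {"sv1", "sv2", "sv4"},
}

print("columns:", " ".join(call_sets))
for line in text_upset(call_sets):
    print(line)
```

Each row marks which sets take part in the intersection, and the bar length is a direct function of the count, which a Venn diagram with fixed circle sizes does not give you.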

10 months ago by Jeremy Leipzig 22k

Thanks again @Jeremy Leipzig, this saved me a lot of trouble, mainly in figuring out what is needed for Venn diagrams in R starting from the intersection of VCF files. I will also try intervene upset; it seems a pretty useful tool overall!

10 months ago by Matteo Ungaro ▴ 100