Hi there,
I have a probably trivial question but, as I couldn't find any relevant or useful solution online, I wonder if someone could help. Basically, I need to plot a Venn diagram for the intersection of three VCF files, done using truvari.
truvari is a structural variant (SV) evaluation tool for benchmarking a call set (which can be generated by any approach) against a truth set. In my case, I'm working with HG002 and three Papuan individuals, for which I produced call sets using different approaches: manta (short reads, linear reference) and two pangenome approaches that use PanGenie for the downstream genome inference.
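To make the benchmarking idea concrete, here is a minimal sketch of what comparing a call set against a truth set involves: decide which calls match a truth SV, then derive precision, recall and F1. This is NOT truvari's algorithm; the match rule (same chromosome and type, start positions within `refdist` bp, size similarity of at least `pctsize`) is a simplified, hypothetical stand-in loosely modeled on the names of truvari bench's `--refdist` and `--pctsize` options, and the thresholds are chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int     # 1-based start, as in the VCF POS column
    svtype: str  # e.g. "DEL" or "INS"
    svlen: int   # absolute SV length in bp


def size_similarity(a: SV, b: SV) -> float:
    """Ratio of the smaller to the larger SV length (1.0 = identical size)."""
    return min(a.svlen, b.svlen) / max(a.svlen, b.svlen)


def is_match(truth: SV, call: SV, refdist: int = 500, pctsize: float = 0.7) -> bool:
    """Hypothetical, simplified match rule (not truvari's full comparison)."""
    return (
        truth.chrom == call.chrom
        and truth.svtype == call.svtype
        and abs(truth.pos - call.pos) <= refdist
        and size_similarity(truth, call) >= pctsize
    )


def benchmark(truth: list[SV], calls: list[SV]) -> dict:
    """Greedy one-to-one matching of calls to truth SVs, then summary metrics."""
    used = set()  # indices of calls already matched to a truth SV
    tp = 0
    for t in truth:
        for i, c in enumerate(calls):
            if i not in used and is_match(t, c):
                used.add(i)
                tp += 1
                break
    fn = len(truth) - tp         # truth SVs without a matching call
    fp = len(calls) - len(used)  # calls without a matching truth SV
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Greedy matching keeps the sketch short; a real benchmarking tool may resolve competing matches differently, so the numbers it reports should take precedence over this toy version.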
That said, it would be interesting to know which SVs are shared between callers but, most importantly, where the graph approaches do better than the linear reference across all samples (which I have already assessed in terms of metrics).
So, with truvari collapse I produced my merged.vcf for each sample from the three different call sets; however, how do I get the VCF into a format that I can use in R (possibly) to generate a Venn diagram?
In other words, what do I have to do, or what information do I need to extract from the VCF, to produce the plot I want in R? If someone has already done something similar, any help is appreciated. Thanks in advance!
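As a rough illustration of what a Venn diagram needs from the VCFs: one key per variant that identifies "the same SV" across files, one set of keys per call set, and the sizes of the seven exclusive regions of a three-set Venn diagram. This sketch assumes the key (CHROM, POS, SVTYPE) and exact matching; SVs from different callers rarely share exact coordinates, which is why tools like truvari match them fuzzily, so treat the key as a hypothetical placeholder. The function names are illustrative, not from any library.

```python
import gzip
from itertools import product


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_info(info: str) -> dict:
    """Turn 'SVTYPE=DEL;END=400;IMPRECISE' into a dict (flags map to True)."""
    out = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        out[key] = value if sep else True
    return out


def variant_keys(lines) -> set:
    """One hashable key per record: (CHROM, POS, SVTYPE).
    Assumption: every caller reports a comparable SVTYPE INFO field."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        keys.add((cols[0], int(cols[1]), info.get("SVTYPE")))
    return keys


def venn3_counts(a: set, b: set, c: set) -> dict:
    """Sizes of the 7 exclusive Venn regions, keyed by membership string,
    e.g. '101' = in a and c but not in b."""
    counts = {}
    for bits in product("01", repeat=3):
        code = "".join(bits)
        if code == "000":
            continue
        inside = [s for s, bit in zip((a, b, c), bits) if bit == "1"]
        outside = [s for s, bit in zip((a, b, c), bits) if bit == "0"]
        counts[code] = len(set.intersection(*inside) - set().union(*outside))
    return counts
```

Usage would look like `with open_vcf("linear.vcf.gz") as fh: linear = variant_keys(fh)` (file name hypothetical) for each call set, then `venn3_counts(linear, graph, personalized)`; those seven numbers are what any three-set Venn plotting tool needs.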
I think we'd have to see a snippet of this merged.vcf to know how to proceed.
@Jeremy Leipzig Indeed, that was an oversight on my side, sorry. Here is a view of how the merged file looks:
OK, it's not clear to me that this file records which call set each variant came from. Can you generate intersections with the source VCFs using bedtools and then generate a Venn diagram using intervene?
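The bedtools suggestion relies on interval overlap rather than exact coordinates. The sketch below mimics the idea behind `bedtools intersect -u` (report each A record once if it overlaps anything in B), with an optional reciprocal-overlap threshold in the spirit of `-f 0.5 -r`. It is a naive O(n·m) illustration, not bedtools itself: the assumption that an SV spans POS to INFO/END (or the REF allele length when END is absent) is mine, and bedtools' own coordinate handling for symbolic SV alleles may differ.

```python
def sv_interval(chrom: str, pos: int, ref: str, info: dict) -> tuple:
    """(chrom, start, end), 1-based inclusive. Assumption: use INFO END when
    present, otherwise the span of the REF allele."""
    end = int(info["END"]) if "END" in info else pos + len(ref) - 1
    return chrom, pos, max(end, pos)


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two 1-based inclusive intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def reciprocal_fraction(a: tuple, b: tuple) -> float:
    """Overlap length divided by the longer interval's length, so a value
    >= f means the overlap covers at least f of BOTH intervals."""
    if not overlaps(a, b):
        return 0.0
    ov = min(a[2], b[2]) - max(a[1], b[1]) + 1
    return ov / max(a[2] - a[1] + 1, b[2] - b[1] + 1)


def intersect_u(a_intervals: list, b_intervals: list, min_frac: float = 0.0) -> list:
    """Keep each A interval (once) that overlaps at least one B interval,
    optionally requiring a minimum reciprocal overlap."""
    return [
        x for x in a_intervals
        if any(overlaps(x, y) and reciprocal_fraction(x, y) >= min_frac
               for y in b_intervals)
    ]
```

A reciprocal threshold matters for SVs: without it, a 4 bp event falling inside a 300 bp deletion counts as "shared", which may or may not be what you want.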
Hi again @Jeremy Leipzig. I went ahead and did what you recommended; basically, I ran the following:
intervene venn -i Documents/vennDiagram/6103671_diploidSV.vcf.gz Documents/vennDiagram/pap_6103671-biallelic.vcf.gz Documents/vennDiagram/6103671-biallelic.vcf.gz -o Documents/vennDiagram/ --names=linear,graph,personalized --title three-way-vennDiagram --figtype svg --figsize 12 12 --bedtools-options header
I ran it directly on the VCFs mainly because I realized that, for some reason, the file I generated with bcftools intersect was not processed correctly by intervene. Maybe you could point me to the exact command, as I might have done something wrong. The VCFs are processed correctly as long as I add --bedtools-options header.
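On the bcftools side, the intersection subcommand is `bcftools isec` (the inputs typically need to be bgzipped and indexed). As far as I know, running it as `bcftools isec -p outdir a.vcf.gz b.vcf.gz c.vcf.gz` writes a `sites.txt` whose last column is a presence string such as `101`, with one character per input file in command-line order. Assuming that layout, tallying the strings directly gives the Venn region counts, as in this sketch. Note that isec matches records by position and alleles, which is strict for SVs.

```python
from collections import Counter


def venn_from_isec_sites(lines, n_files: int = 3) -> dict:
    """Tally presence masks (e.g. '101') into Venn region counts.
    Assumes tab-separated lines whose LAST column is the mask."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        mask = line.rstrip("\n").split("\t")[-1]
        if len(mask) != n_files or set(mask) - {"0", "1"}:
            raise ValueError(f"unexpected presence mask: {mask!r}")
        counts[mask] += 1
    return dict(counts)


def describe(mask: str, names: list) -> str:
    """Human-readable label for a mask, e.g. '011' -> 'graph & personalized'."""
    return " & ".join(name for name, bit in zip(names, mask) if bit == "1")
```

With the three files in the same order as in the intervene command, `describe` can map each mask back to the linear, graph and personalized labels.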
Other than that, below is the Venn diagram showing the intersection of the SVs detected for sample 6103671 using the linear reference (6103671_diploidSV.vcf.gz), a pangenome graph (pap_6103671-biallelic.vcf.gz), and another version of the pangenome (6103671-biallelic.vcf.gz). Let me know whether this makes sense, thanks again!
OK, sure, that looks believable. You might like the UpSet plot better simply because it keeps the intersection sizes proportional.
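The proportionality point can be seen in a toy text rendering: an UpSet-style plot lists each exclusive intersection as its own row with a bar whose length scales with its count, whereas Venn region areas usually are not drawn to scale. This is a hypothetical helper for illustration only; it takes region counts keyed by membership strings (for example, output shaped like the `venn3_counts` idea above) and is not how `intervene upset` draws its plots.

```python
def upset_text(counts: dict, names: list, width: int = 40) -> str:
    """Render exclusive intersection sizes as text bars, largest first.
    Each row: membership dots (X = in set, . = not), a bar, the count."""
    biggest = max(counts.values(), default=0)
    header = "sets: " + ", ".join(f"{i + 1}={name}" for i, name in enumerate(names))
    rows = [header]
    for mask, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        dots = " ".join("X" if bit == "1" else "." for bit in mask)
        bar = "#" * round(width * n / biggest) if biggest else ""
        rows.append(f"{dots}  {bar} {n}")
    return "\n".join(rows)


print(upset_text({"111": 20, "100": 10, "011": 5},
                 ["linear", "graph", "personalized"], width=20))
```

The counts in the call above are made up; the first output line reads `sets: 1=linear, 2=graph, 3=personalized` and the bars for 20, 10 and 5 have 20, 10 and 5 `#` characters.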
Thanks again @Jeremy Leipzig, this saved me a lot of trouble, mainly in figuring out what is what (and what is needed) for Venn diagrams in R starting from the intersection of VCF files. I will also try intervene upset; it seems a pretty useful tool overall!