Hello everybody,
I started working with Nanopore data, and the pipeline produces VCF outputs for SNV/SV/CNV/STR and bedMethyl files for modified nucleotides. I wanted to know if there's any reason (other than perhaps practicality) why these VCFs aren't grouped together, and whether there's a tool that converts bedMethyl to VCF?
These are two questions, which I'll answer separately:
Question 1: Why not have a big unified VCF containing all of SNV/SV/CNV/STR?
Almost all of the downstream applications of these disparate data types are different enough that they require separate dedicated software. Even variant annotation differs a lot. It's great that the VCF spec provides an interchange format for representing these (much better than every SV or CNV caller having its own special format), but in practice it's better to keep the files separate.
(IGV might be able to load a VCF with different variant types in it; I haven't tried. My intuition is that it might also have separate parsers and visualization code under the hood for each different type.)
Question 2: Is there a tool that converts bedMethyl to VCF?
I'm fairly certain there is not -- I'm not aware of any existing or planned support for base modifications in the VCF spec. You could probably hack something together with awk or similar, but no existing tool would know how to interpret the result.
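For illustration only, here is roughly what such a hack could look like, in Python rather than awk. This is a sketch under several assumptions, not an established conversion: it assumes the first 11 whitespace-separated columns follow the usual bedMethyl layout (column 4 = modification name/code, column 6 = strand, column 10 = coverage, column 11 = percent modified), and the function name, the `<MOD>` symbolic allele, and the INFO tags `MODCODE`, `STRAND`, `COV` and `PCT` are all made up here. None of them are defined by the VCF spec. The REF base is written as `N` because bedMethyl does not carry the reference sequence.

```python
def bedmethyl_line_to_vcf(line):
    """Turn one bedMethyl record into a VCF-like line (non-standard sketch)."""
    fields = line.split()  # split on any whitespace (tabs or spaces)
    if len(fields) < 11:
        raise ValueError("expected at least 11 bedMethyl columns")

    chrom = fields[0]
    start = int(fields[1])       # BED coordinates are 0-based
    mod_code = fields[3]         # e.g. "m" for 5mC (assumed column meaning)
    strand = fields[5]
    coverage = int(fields[9])    # reads covering the position
    percent = float(fields[10])  # percent of reads called modified

    # Hypothetical INFO tags; no VCF reader will recognise these.
    info = f"MODCODE={mod_code};STRAND={strand};COV={coverage};PCT={percent:.2f}"

    # VCF positions are 1-based, hence start + 1.
    # ID, QUAL and FILTER are left as missing (".").
    return "\t".join([chrom, str(start + 1), ".", "N", "<MOD>", ".", ".", info])


if __name__ == "__main__":
    example = "chr1\t10468\t10469\tm\t12\t+\t10468\t10469\t255,0,0\t12\t83.33"
    print(bedmethyl_line_to_vcf(example))
```

A real file would also need a VCF header with `##INFO` and `##ALT` lines declaring the invented tags, and even then the caveat above still applies: downstream tools would parse the file but not understand what the records mean.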
It is frustrating that there still isn't a standard for representing base modification counts, but bedMethyl isn't a bad format as these things go.