First post so apologies for any etiquette breaches.
I have a large cohort of over 1,000 patients who have had WGS done, with the Canvas and Manta structural variant callers applied to their data. I have then merged the calls into a single VCF file.
In the ID field (3rd column) I have the type of call made by one of the two algorithms, e.g. MantaDEL, MantaINV, CanvasGAIN, etc. After each piece of text there is a string of numbers representing the graph plot associated with how that call was generated, e.g. MantaDel0:1:3:00, which is unique to that call.
I am trying to create individual VCFs of each unique type of call. However, if I run the following code to see what the unique SV types are:
bcftools query -f '%ID\n' SV.vcf.gz
I unsurprisingly get a massive list of every ID with its unique number string attached, when all I want is the first part, i.e. the list of SV call types, so that I can then use bcftools to create an individual file for each type.
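To make the goal concrete, here is a rough sketch of the kind of post-processing I am after. It assumes every ID starts with a run of letters (the caller plus SV type) followed by the numeric part, which matches the examples above but may not hold for every record. The function name sv_types is made up for illustration, and the demo IDs are invented rather than taken from my data.

```shell
# Hypothetical helper: reads one ID per line on stdin and prints each
# distinct leading run of letters (e.g. MantaDEL, CanvasGAIN) once.
# Assumption: the type prefix is purely alphabetic; IDs that do not
# start with a letter (such as ".") pass through unchanged.
sv_types() {
    sed -E 's/^([A-Za-z]+).*/\1/' | sort -u
}

# Demo on invented IDs
printf '%s\n' 'MantaDEL:0:1:3:0:0' 'MantaDEL:7:0:1:0:0' 'MantaINV:2:0:1:0:0' 'CanvasGAIN:5:1' | sv_types
```

On the real file this would presumably be run as bcftools query -f '%ID\n' SV.vcf.gz | sv_types. For the splitting step, I believe a filter expression along the lines of bcftools view -i 'ID~"^MantaDEL"' SV.vcf.gz -Oz -o MantaDEL.vcf.gz should work for each type, but I have not confirmed this on my data.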
I hope this makes sense and apologies if this has been covered elsewhere.