Hello,
I've been working with long-read (ONT) and short-read (Illumina) whole-genome datasets for horses, with the aim of training a DeepVariant model for hybrid sequencing. To train an accurate model, I understand that I need a custom truth VCF file built from the available data. Could anyone provide insights into the best approach to achieve this? I read the article about training models for non-human organisms: https://google.github.io/deepvariant/posts/2018-12-05-improved-non-human-variant-calling-using-species-specific-deepvariant-models/ However, I don't have any trio data available, and I couldn't find a gold-standard dataset for horses. I would greatly appreciate guidance on how to generate a custom truth VCF.
Thank you in advance!