I was asked to determine the ploidy level and to perform CNV calling on a yeast sample (reference sequence: S. cerevisiae S288C).
To perform CNV calling with the GATK pipeline "(How to) Call common and rare germline copy number variants", the third step uses the tool "DetermineGermlineContigPloidy [BETA]", which requires a "contig ploidy priors" table file for the option --contig-ploidy-priors. However, after searching for answers (the answer may be obvious; this is the first time I am doing this), I still do not know how to create or generate this file.
Here is the kind of table I am expected to provide:
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
1 0.01 0.01 0.97 0.01
2 0.01 0.01 0.97 0.01
X 0.01 0.49 0.49 0.01
Y 0.50 0.50 0.00 0.00
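A table with the same layout for the 16 nuclear chromosomes of S288C can be written with a short script such as the sketch below. It rests on several assumptions: the UCSC-style contig names (chrI to chrXVI) must be replaced by the exact names in your reference FASTA/.dict (Ensembl R64 uses I to XVI), the file is assumed to be tab-separated like the default priors table from the GATK tutorial, and the prior values are placeholders, not recommended values. chrM is left out here; it would need its own row if your intervals include it.

```python
import csv

# Hypothetical contig names (UCSC sacCer3 style). Replace them with the
# exact contig names used in your reference FASTA/.dict.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
CONTIGS = [f"chr{r}" for r in ROMAN]

# Placeholder priors over ploidy states 0..3: mostly haploid or diploid,
# with a little mass on 0 and 3 so those states are not ruled out.
DEFAULT_PRIOR = [0.01, 0.49, 0.49, 0.01]


def write_priors(path, contigs, prior):
    """Write a tab-separated contig ploidy priors table, one row per contig."""
    if abs(sum(prior) - 1.0) > 1e-9:
        raise ValueError("each row of priors must sum to 1")
    header = ["CONTIG_NAME"] + [f"PLOIDY_PRIOR_{k}" for k in range(len(prior))]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for contig in contigs:
            writer.writerow([contig] + list(prior))
```

For example, `write_priors("yeast_contig_ploidy_priors.tsv", CONTIGS, DEFAULT_PRIOR)` writes a header line plus 16 rows (the file name is hypothetical). Giving every contig the same row is the simplest choice; per-contig rows only make sense once there is some evidence about individual chromosomes.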
Here are the GATK forum topics about --contig-ploidy-priors that I have already consulted:
- "How do you generate the file required for the --contig-ploidy-priors parameter": partially answered, but only for human studies.
- "Germline CNV, ploidy and best practices": same situation. It says the file can be made "easily" by running CollectFragmentCounts beforehand (which now seems to be CollectReadCounts), but since that discussion concerns human data, the suggestion is to use the default file provided by GATK with some minor changes. I still don't know how to build it myself for my yeast sample.
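As a rough check of per-chromosome copy number before choosing priors, the interval counts from CollectReadCounts can be summarised per contig. The sketch assumes the TSV output rather than the default HDF5, laid out as '@' header lines, a CONTIG/START/END/COUNT column line, then one row per interval; the function name is hypothetical. It returns each contig's median count divided by the median of those per-contig medians, so a value near 1 means the contig has the same copy number as most of the genome, while a value near 1.5 or 2 suggests a gain. These ratios are relative: coverage alone cannot tell a haploid strain from a diploid one.

```python
import statistics
from collections import defaultdict


def relative_contig_coverage(lines):
    """Hypothetical helper: median read count per contig divided by the
    median of all per-contig medians.

    Assumes the CollectReadCounts TSV layout: '@' header lines, a
    'CONTIG START END COUNT' column line, then one row per interval.
    """
    counts = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM-style header lines
        fields = line.split("\t")
        if fields[0] == "CONTIG":
            continue  # skip the column header line
        counts[fields[0]].append(int(fields[3]))
    medians = {contig: statistics.median(v) for contig, v in counts.items()}
    genome_median = statistics.median(medians.values())
    if genome_median == 0:
        raise ValueError("genome-wide median count is zero")
    return {contig: m / genome_median for contig, m in medians.items()}
```

It accepts any iterable of lines, so an open file handle on the counts TSV can be passed directly. The median is used instead of the mean so that a few repetitive or very high-coverage intervals do not dominate a contig's value.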
Does anyone know how to generate this contig ploidy priors table? Do I have to write the table by hand and fill in the values I think are right, based on a ploidy estimate I would perform beforehand? Or should I instead use CNVnator, CNVkit or another tool designed for single-sample ("tumor-only") CNV calling?
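If a ploidy estimate is available for each contig (for example from relative read depth combined with a baseline ploidy measured independently, such as by flow cytometry), one illustrative way to fill in the table, not a GATK recommendation, is to put most of the prior mass on that estimate and split the remainder evenly over the other states, so the model is not forced to agree with the guess. peaked_prior and priors_from_estimates are hypothetical helpers, and the 0.9 confidence and the 0 to 3 state range are arbitrary defaults.

```python
def peaked_prior(expected_ploidy, max_ploidy=3, confidence=0.9):
    """Hypothetical helper: prior over ploidy states 0..max_ploidy that puts
    `confidence` on `expected_ploidy` and splits the rest evenly."""
    if not 0 <= expected_ploidy <= max_ploidy:
        raise ValueError("expected_ploidy must lie within 0..max_ploidy")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    n_states = max_ploidy + 1
    rest = (1.0 - confidence) / (n_states - 1)
    return [confidence if k == expected_ploidy else rest
            for k in range(n_states)]


def priors_from_estimates(estimates, max_ploidy=3, confidence=0.9):
    """Map {contig: estimated integer ploidy} to {contig: prior row}."""
    return {contig: peaked_prior(ploidy, max_ploidy, confidence)
            for contig, ploidy in estimates.items()}
```

For example, `priors_from_estimates({"chrI": 2, "chrIII": 3})` gives chrI a row with 0.9 on state 2 and chrIII a row with 0.9 on state 3, each with about 0.033 on the remaining states (contig names hypothetical). Lowering `confidence` makes the prior flatter and lets the read counts weigh more.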
Thank you in advance; any help would be appreciated.