How to generate the contig ploidy priors table (yeast) for the GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?
22 months ago
mikizu • 0

Hi!

I was asked to determine the ploidy level and to do CNV calling on a yeast sample (reference sequence: S. cerevisiae S288C).

To perform CNV calling with the GATK tutorial "(How to) Call common and rare germline copy number variants", I need to run "DetermineGermlineContigPloidy [BETA]" in the third step, and this tool requires a "contig ploidy priors" table file for the --contig-ploidy-priors option. I have searched for answers (the answer may be obvious; this is the first time I am doing this), but I still do not know how to create or generate this file.

Here is the kind of table that I should use:

CONTIG_NAME	PLOIDY_PRIOR_0	PLOIDY_PRIOR_1	PLOIDY_PRIOR_2	PLOIDY_PRIOR_3
1	0.01	0.01	0.97	0.01
2	0.01	0.01	0.97	0.01
X	0.01	0.49	0.49	0.01
Y	0.50	0.50	0.00	0.00
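If you just want a starting table for a yeast genome, a small script can write one row per contig. This is a sketch under assumptions, not an official GATK utility: it reads contig names from a samtools `.fai` index, assumes a haploid strain by default (`expected_ploidy=1`; use 2 for a diploid), leaves 3% of the prior mass for the other copy numbers, and skips `chrM`. The contig names presumably have to match those in your read-count files, so check whether your reference uses `chrI`-style or `I`-style names.

```python
"""Write a contig ploidy priors table for DetermineGermlineContigPloidy.

Sketch only: contig names, the expected ploidy and the amount of prior
mass left for aneuploidy are assumptions you should adapt.
"""
import csv
import io

MAX_PLOIDY = 3  # table columns PLOIDY_PRIOR_0 .. PLOIDY_PRIOR_3


def prior_row(expected_ploidy: int, off_mass: float = 0.03) -> list:
    """Give 1 - off_mass to expected_ploidy; share off_mass among the other states."""
    n_other = MAX_PLOIDY  # states 0..MAX_PLOIDY minus the expected one
    row = [off_mass / n_other] * (MAX_PLOIDY + 1)
    row[expected_ploidy] = 1.0 - off_mass
    return row


def contigs_from_fai(fai_path: str, skip=("chrM",)) -> list:
    """Read contig names from column 1 of a samtools .fai index.

    Skipping chrM assumes the mitochondrial contig is also left out of
    your intervals/read counts.
    """
    names = []
    with open(fai_path) as fh:
        for line in fh:
            name = line.split("\t")[0].strip()
            if name and name not in skip:
                names.append(name)
    return names


def write_priors(contigs, expected_ploidy: int, out) -> None:
    """Write a tab-separated priors table with a header row to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = ["CONTIG_NAME"] + [f"PLOIDY_PRIOR_{k}" for k in range(MAX_PLOIDY + 1)]
    writer.writerow(header)
    for name in contigs:
        writer.writerow([name] + [f"{p:.2f}" for p in prior_row(expected_ploidy)])


if __name__ == "__main__":
    # Hypothetical subset of sacCer3-style names; in practice you might call
    # contigs_from_fai("sacCer3.fa.fai") (the file name is illustrative).
    buf = io.StringIO()
    write_priors(["chrI", "chrII", "chrIII"], expected_ploidy=1, out=buf)
    print(buf.getvalue(), end="")
```

If you genuinely have no idea whether the strain is haploid or diploid, a flatter row (e.g. similar mass on 1 and 2, as in the X row of the example) lets the read counts decide.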

Here are some GATK topics about --contig-ploidy-priors that I have already consulted:

Does anyone know how to generate or create this contig ploidy priors table? Do I have to write the table by hand, filling in the values I think are right based on a ploidy estimate that I would have to perform beforehand? Or should I instead use CNVnator, CNVkit, or another tool that supports tumor-only/single-sample CNV calling?

Thank you in advance; any help would be appreciated.


Did you find a solution? How do you make this table?

6 months ago
Dr N Ch • 0

Does anyone know how to generate or create this contig ploidy priors table?

6 months ago
kanika.151 ▴ 120

The probabilities in this file should reflect your prior belief about the copy-number state of each contig, given the prevalence of aneuploidies and sex genotypes in the population. For example, the table used in the tutorial indicates that we believe there is a small chance that the copy number of chr20 is either 1 or 3, but that it is most likely 2.

We use these priors in conjunction with the likelihood of our observed data (i.e., the total read count per contig) to determine the posterior probability of the per-contig copy number in the usual Bayesian manner. As always, high-quality data (which is well explained by the likelihood model) will weaken the influence of the prior on the final result. However, if your data quality is low, you may want to impose stronger priors to regularize away the possibility of getting spurious results (e.g., unrealistic sex genotypes).
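To make the Bayesian argument concrete, here is a toy single-contig example. It assumes, purely for illustration, that the total read count of a contig with copy number k is Poisson-distributed with mean k × `depth_per_copy`; GATK's actual likelihood model is more elaborate, so treat this as a sketch of the idea, not of the tool. With few reads, the strong prior on copy number 2 dominates; with ten times the depth, the same relative excess of reads overrides it.

```python
"""Toy illustration of prior x likelihood -> posterior for one contig.

NOT GATK's model: a plain Poisson read-count likelihood is assumed here
purely to show how data strength changes the prior's influence.
"""
import math


def poisson_logpmf(n: int, rate: float) -> float:
    """log P(N = n) for N ~ Poisson(rate)."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)


def ploidy_posterior(prior, count: int, depth_per_copy: float, eps: float = 1e-3):
    """Posterior over copy numbers 0..len(prior)-1 given a total read count.

    Copy number k is assumed to produce Poisson(k * depth_per_copy) reads;
    copy number 0 uses a tiny rate (eps * depth_per_copy) to stay finite.
    """
    log_post = []
    for k, p in enumerate(prior):
        if p == 0:
            log_post.append(float("-inf"))  # a zero prior can never be overturned
            continue
        rate = max(k, eps) * depth_per_copy
        log_post.append(math.log(p) + poisson_logpmf(count, rate))
    top = max(log_post)  # subtract the max before exp() to avoid underflow
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    prior = [0.01, 0.01, 0.97, 0.01]  # strong belief in copy number 2
    # Weak data: 15 reads at 5 reads/copy looks like 3 copies, but the prior wins.
    print([round(p, 3) for p in ploidy_posterior(prior, 15, 5.0)])
    # Strong data: 150 reads at 50 reads/copy overrides the prior in favour of 3.
    print([round(p, 3) for p in ploidy_posterior(prior, 150, 50.0)])
```

Note the zero-prior branch: a 0.00 entry (like the Y row in the example table) rules that copy number out completely, whatever the data say.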

Ideally, you would run the tool on a “training” set of samples where the truth is known, tuning the priors or other parameters to recover the correct result if necessary. Once this tuning procedure is complete, you can use the same priors and parameters on subsequent samples. However, if pseudoautosomal regions (PARs) and other problematic regions are appropriately masked (as mentioned in the tutorial), the results of this tool are usually reasonable without any tuning.
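If you do have a training set with known copy numbers (for yeast, perhaps strains whose karyotype was confirmed independently), one simple way to get a first guess before tuning is to use smoothed per-contig frequencies. This is an assumption about how to seed the tuning, not something the quoted answer or GATK prescribes. The pseudocount keeps every copy number possible, which matters because a zero prior can never be overturned by the data.

```python
"""Turn known copy numbers from training samples into smoothed priors.

One possible starting point (an assumption, not a GATK procedure): count
how often each copy number occurs per contig, add a pseudocount so no
state gets probability zero, and normalise.
"""
from collections import Counter


def empirical_priors(truth: dict, max_ploidy: int = 3, pseudocount: float = 1.0) -> dict:
    """truth maps contig name -> list of known copy numbers, one per sample."""
    priors = {}
    for contig, calls in truth.items():
        if any(not 0 <= c <= max_ploidy for c in calls):
            raise ValueError(f"{contig}: copy number outside 0..{max_ploidy}")
        counts = Counter(calls)
        total = len(calls) + pseudocount * (max_ploidy + 1)
        priors[contig] = [
            (counts.get(k, 0) + pseudocount) / total for k in range(max_ploidy + 1)
        ]
    return priors


if __name__ == "__main__":
    # Hypothetical training truth: 20 haploid samples, two of them disomic for chrI.
    truth = {"chrI": [1] * 18 + [2] * 2, "chrII": [1] * 20}
    for contig, row in empirical_priors(truth).items():
        print(contig, [round(p, 3) for p in row])
```

A larger pseudocount gives flatter, more conservative priors; a smaller one trusts the training set more.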

Source: I found this answer by slee here.

