I am trying to use ABSOLUTE to estimate the purity and ploidy of a whole-genome sequenced sample. ABSOLUTE takes a segmentation file as input, and one of its columns is Num_Probes. What does Num_Probes refer to? Is it the number of heterozygous SNPs within a segment, or the number of read-depth count probes (bins) within a segment?
Thanks for your answer! I wondered whether it was the number of heterozygous SNPs because TCGA also uses CBS (DNAcopy) to segment SNP array data, and those segmentation files also contain a Num_Probes field. So, is it safe to conclude that the meaning of Num_Probes depends on the type of experiment (aCGH, SNP array, WGS)?
Yes. I would say that Num_Probes is a "residual" of the time when aCGH was the main method for DOC analysis. It is basically the number of minimal units in the segment for which the log2 ratio of coverage has been computed (a SNP, an array probe, or a small window).
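To make that concrete for WGS data, here is a minimal sketch of how Num_Probes and a segment mean could be derived once the genome has been split into bins (SNPs, array probes, or fixed windows), each carrying a log2 coverage ratio. The `Bin`/`Segment` classes and `summarize_segments` are hypothetical names for illustration, not part of ABSOLUTE or DNAcopy, and the output tuple simply mirrors the Chromosome, Start, End, Num_Probes, Segment_Mean layout I am assuming for the seg file.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Bin:
    """One measurement unit: a SNP, an array probe, or a coverage window."""
    chrom: str
    pos: int          # SNP/probe position or window midpoint (hypothetical convention)
    log2ratio: float  # log2(tumor coverage / normal coverage) for this unit


@dataclass
class Segment:
    chrom: str
    start: int
    end: int


def summarize_segments(segments, bins):
    """Return (Chromosome, Start, End, Num_Probes, Segment_Mean) tuples.

    Num_Probes is simply how many bins fall inside [start, end];
    Segment_Mean is the mean log2 ratio of those bins (NaN if there are none).
    """
    rows = []
    for seg in segments:
        values = [b.log2ratio for b in bins
                  if b.chrom == seg.chrom and seg.start <= b.pos <= seg.end]
        seg_mean = mean(values) if values else float("nan")
        rows.append((seg.chrom, seg.start, seg.end, len(values), seg_mean))
    return rows


# Toy example: three 1 kb windows on chr1 and one on chr2.
bins = [Bin("1", 500, 0.25), Bin("1", 1500, 0.75),
        Bin("1", 2500, -0.5), Bin("2", 500, 0.0)]
segments = [Segment("1", 1, 2000), Segment("1", 2001, 3000), Segment("2", 1, 1000)]
for row in summarize_segments(segments, bins):
    print(row)
# ('1', 1, 2000, 2, 0.5)
# ('1', 2001, 3000, 1, -0.5)
# ('2', 1, 1000, 1, 0.0)
```

The same count is what DNAcopy reports as `num.mark` in its segment output; only the unit being counted changes between aCGH, SNP arrays, and WGS windows.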
That makes sense if it is a "legacy" of the aCGH era, which leads me to wonder what the purpose of reporting the number of probes was in the first place. My guess is that this information lets one filter out segments with a low number of probes. Is this true?
Yes, that's the reason.
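A minimal sketch of that filtering step in Python, assuming a tab-delimited segmentation table with a header row that includes a Num_Probes column. The function name `filter_seg_file` and the default cutoff of 10 are illustrative assumptions, not values recommended by ABSOLUTE; a sensible cutoff depends on bin size and coverage.

```python
import csv
import io


def filter_seg_file(in_handle, out_handle, min_probes=10):
    """Copy a tab-delimited segmentation table, keeping only segments
    supported by at least `min_probes` probes/bins.

    Assumes a header row containing a Num_Probes column; all other
    columns are passed through unchanged. Returns the number of rows kept.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if int(row["Num_Probes"]) >= min_probes:
            writer.writerow(row)
            kept += 1
    return kept


# Toy input: one well-supported segment and one short segment with 3 bins.
seg_text = (
    "Chromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "1\t1\t2000000\t250\t0.05\n"
    "1\t2000001\t2010000\t3\t-1.2\n"
)
out = io.StringIO()
kept = filter_seg_file(io.StringIO(seg_text), out, min_probes=10)
print(kept)  # 1
print(out.getvalue())  # header plus the 250-probe segment only
```

Filtering this way trades sensitivity to small focal events for robustness against segments whose mean rests on only a handful of noisy bins.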