What does Num_Probes in a segmentation file refer to?
6.5 years ago
Dataman ▴ 350

Hi,

I am trying to use ABSOLUTE to estimate the purity and ploidy of a whole-genome-sequenced sample. ABSOLUTE takes a segmentation file as input, and one of its columns is Num_Probes. I am wondering what Num_Probes refers to. Is it the number of heterozygous SNPs within a segment, or the number of read-depth count probes within a segment?

sequencing • 3.3k views
6.5 years ago
Fabio Marroni ★ 2.9k

Neither. I guess it is the number of CGH probes that make up the segmented window. Remember that ABSOLUTE is for CGH and not for sequencing (although it can easily be used to analyse the depth-of-coverage (DOC) signal in sequencing experiments). My guess is:

"Chromosome": Chromosome number

"Start": Start position of the segmented window

"End": End position of the segmented window

"Num_Probes": Number of probes composing the segmented window.

If you use DNA-seq data, you should define windows of fixed size (or, better, windows of fixed mappability), and each of them is treated as a probe.

"Segment_Mean": Average log2 ratio of signal intensity across the segmented window.
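To make the columns concrete for sequencing data, here is a minimal Python sketch that treats each fixed-size window as one "probe" and summarises a set of segments. It is only an illustration under assumptions: the function name `summarize_segments`, the toy windows, and the rule that a window must lie fully inside a segment to be counted are all hypothetical, and the segment boundaries would normally come from a segmentation tool rather than being typed in by hand.

```python
from statistics import mean


def summarize_segments(windows, segments):
    """Summarise segments from fixed-size windows.

    windows:  list of (chrom, start, end, log2_ratio), one per window ("probe").
    segments: list of (chrom, start, end), e.g. produced by a segmentation tool.
    Returns one dict per segment with the columns discussed above.
    """
    rows = []
    for chrom, seg_start, seg_end in segments:
        # Assumption: a window counts as a probe of the segment only if it
        # lies entirely inside the segment.
        ratios = [
            ratio
            for c, start, end, ratio in windows
            if c == chrom and start >= seg_start and end <= seg_end
        ]
        rows.append(
            {
                "Chromosome": chrom,
                "Start": seg_start,
                "End": seg_end,
                "Num_Probes": len(ratios),
                "Segment_Mean": mean(ratios) if ratios else float("nan"),
            }
        )
    return rows


# Toy data: five 1 kb windows on chromosome 1 (log2 ratios are made up).
windows = [
    ("1", 1, 1000, 0.0),
    ("1", 1001, 2000, 0.1),
    ("1", 2001, 3000, -0.1),
    ("1", 3001, 4000, 1.0),
    ("1", 4001, 5000, 1.2),
]
segments = [("1", 1, 3000), ("1", 3001, 5000)]

seg_table = summarize_segments(windows, segments)
for row in seg_table:
    print(row)
```

With this toy input, the first segment is built from three windows and the second from two, so Num_Probes here is simply the number of windows behind each Segment_Mean.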

I suggest you use the HAPSEG package, which the ABSOLUTE authors suggest for segmentation. Otherwise, I suggest the widely used DNAcopy, another R package.

Thanks for your answer! What made me wonder whether it was the number of heterozygous SNPs is that TCGA also uses CBS (DNAcopy) to segment SNP array data, and its segmentation files contain a field called Num_Probes. So, is it safe to conclude that the meaning of Num_Probes changes with the type of experiment (aCGH, SNP array, WGS)?

Yes. I would say that Num_Probes is a "residual" of the time when aCGH was the main method for DOC analysis. It basically counts the minimal units for which the log2 ratio of coverage has been computed (a SNP, a probe, or a small window).

It makes sense now that it is a "legacy" of the aCGH era, which leads me to wonder what the purpose of reporting the number of probes was in the first place. My guess is that one could use this information to filter out segments with a low number of probes. Is this true?

Yes, that's the reason.
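
For what it's worth, a filter of that kind could be sketched in Python as below. The cutoff of 10 probes, the function name `filter_segments`, and the toy table are assumptions made up for illustration. They are not ABSOLUTE defaults or recommendations.

```python
import csv
import io

MIN_PROBES = 10  # hypothetical cutoff; choose one that suits your data

# Toy tab-separated segmentation table (all values are made up).
seg_text = (
    "Chromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "1\t1\t500000\t250\t0.02\n"
    "1\t500001\t503000\t3\t1.40\n"
    "1\t503001\t900000\t198\t-0.35\n"
)


def filter_segments(handle, min_probes=MIN_PROBES):
    """Keep only segments supported by at least `min_probes` probes."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row["Num_Probes"]) >= min_probes]


kept = filter_segments(io.StringIO(seg_text))
for row in kept:
    print(row["Chromosome"], row["Start"], row["End"], row["Num_Probes"])
```

In this toy table the middle segment rests on only three probes, so it is dropped and the two well-supported segments are kept.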
