Question: what does Num_Probes in a segmentation file refer to?
3.3 years ago by
Dataman260
Finland
Dataman260 wrote:

Hi,

I am trying to use ABSOLUTE to estimate the purity and ploidy of a whole-genome-sequenced sample. ABSOLUTE takes a segmentation file as input, and one of its fields/columns is Num_Probes. I am wondering what Num_Probes refers to. Is it the number of heterozygous SNPs within a segment, or the number of read-depth count probes within a segment?

sequencing • 1.7k views
modified 3.3 years ago by Fabio Marroni2.3k • written 3.3 years ago by Dataman260
3.3 years ago by
Fabio Marroni2.3k
Italy
Fabio Marroni2.3k wrote:

Neither. I guess it is the number of CGH probes that make up the segmented window. Remember that ABSOLUTE is for CGH and not for sequencing (although it can easily be used to analyse the depth-of-coverage (DOC) signature in sequencing experiments). My guess is:

"Chromosome": Chromosome number

"Start": Start position of the segmented window

"End": End position of the segmented window

"Num_Probes": Number of probes composing the segmented window. If you use DNA-seq data you should design windows of fixed size (or, better, windows of fixed mappability), and each of them is considered as a probe.

"Segment_Mean": Average log2 ratio of signal intensity across the segmented window.
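As a rough illustration of how these fields relate for sequencing data, here is a minimal Python sketch in which each fixed-size window is treated as one probe. The window coordinates, log2 ratios, segment boundaries, and the `summarize` helper are all made up for the example; this is not how ABSOLUTE or any segmentation package builds its files internally.

```python
from statistics import mean

# Hypothetical input: one (chromosome, start, end, log2_ratio) tuple per
# fixed-size window. With sequencing data each window plays the role of a probe.
windows = [
    ("1", 1, 1000, 0.02),
    ("1", 1001, 2000, -0.01),
    ("1", 2001, 3000, 0.05),
    ("1", 3001, 4000, 0.61),
    ("1", 4001, 5000, 0.57),
]

# Hypothetical segment boundaries, as a segmentation tool might report them.
segments = [("1", 1, 3000), ("1", 3001, 5000)]


def summarize(segments, windows):
    """Build one row per segment using the column names listed above."""
    rows = []
    for chrom, seg_start, seg_end in segments:
        # Collect the log2 ratios of all windows fully inside the segment.
        ratios = [
            ratio
            for w_chrom, w_start, w_end, ratio in windows
            if w_chrom == chrom and w_start >= seg_start and w_end <= seg_end
        ]
        rows.append({
            "Chromosome": chrom,
            "Start": seg_start,
            "End": seg_end,
            "Num_Probes": len(ratios),  # number of windows ("probes")
            "Segment_Mean": mean(ratios) if ratios else float("nan"),
        })
    return rows


for row in summarize(segments, windows):
    print(row)
```

Under these assumptions, Num_Probes is simply the count of windows that fall inside a segment, and Segment_Mean is the average of their log2 ratios.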

I suggest you use the HAPSEG package, which the ABSOLUTE authors recommend, to perform segmentation. Otherwise, I suggest the widely used DNAcopy, another R package.

modified 3.3 years ago • written 3.3 years ago by Fabio Marroni2.3k

Thanks for your answer! What made me wonder whether it was the number of heterozygous SNPs was that TCGA also uses CBS (DNAcopy) to segment SNP array data, and the segmentation file contains a field called Num_Probes. So, is it safe to conclude that the meaning of Num_Probes changes with the type of experiment (aCGH, SNP array, WGS)?

written 3.3 years ago by Dataman260

Yes. I would say that Num_Probes is a "residual" of the time when the main method for DOC analysis was aCGH; a "probe" is basically the minimal unit for which the log2 ratio of coverage has been computed (a SNP, an array probe, or a small window).

written 3.3 years ago by Fabio Marroni2.3k

It makes sense now if it is a "legacy" from the aCGH era, which leads me to wonder what the purpose of reporting the number of probes was in the first place. My guess is that one could use this information to filter out segments with a low number of probes. Is this true?

written 3.3 years ago by Dataman260

Yes, that's the reason.
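For illustration, such a filter could look like the Python sketch below. It assumes a tab-delimited segmentation file with the column names discussed in this thread; the example rows, the `read_segments` and `filter_segments` helpers, and the cutoff of 10 probes are hypothetical choices, not values recommended by ABSOLUTE or DNAcopy.

```python
import csv
import io

# Hypothetical segmentation file content (tab-delimited is an assumption).
SEG_TEXT = (
    "Chromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "1\t1\t500000\t250\t0.03\n"
    "1\t500001\t504000\t2\t1.40\n"
    "1\t504001\t900000\t198\t-0.45\n"
)

MIN_PROBES = 10  # hypothetical cutoff; choose one suited to your data


def read_segments(handle):
    """Parse segment rows and convert the numeric columns."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        record["Num_Probes"] = int(record["Num_Probes"])
        record["Segment_Mean"] = float(record["Segment_Mean"])
        rows.append(record)
    return rows


def filter_segments(rows, min_probes=MIN_PROBES):
    """Keep only segments supported by at least min_probes probes."""
    return [row for row in rows if row["Num_Probes"] >= min_probes]


all_segments = read_segments(io.StringIO(SEG_TEXT))
kept = filter_segments(all_segments)
print(f"kept {len(kept)} of {len(all_segments)} segments")
```

Here the short two-probe segment with a large Segment_Mean is dropped, which is the kind of poorly supported call the probe count helps to flag.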

written 3.3 years ago by Fabio Marroni2.3k