Question: CNA calculation from Segmentation file
1
gravatar for Meenakshi
4.0 years ago by
Meenakshi10
United States
Meenakshi10 wrote:

Hi

I have a compiled  segmentation file (nocnv) for almost 1000 samples. I want to calculate thecopy number aberrations  for all the samples.

I tried using GISTIC, but I believe it provides only the consensus copy number variations. Is there a better way to calculate copy number aberration  from segmentation file.

Here are few lines from teh segmentation file I have:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
TCGA-E2-A1IH-01A-11D-A13J-01    1    3208470    42629316    20791    -0.2864
TCGA-E2-A1IH-01A-11D-A13J-01    1    42631290    42692913    60    0.0183
TCGA-E2-A1IH-01A-11D-A13J-01    1    42701180    42918495    113    0.3411
TCGA-E2-A1IH-01A-11D-A13J-01    1    42925163    43153569    131    -0.0548
TCGA-E2-A1IH-01A-11D-A13J-01    1    43153652    43599777    288    -0.3653
TCGA-E2-A1IH-01A-11D-A13J-01    1    43600151    44642193    575    -0.0401
 

Thanks

Meenakshi

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Meenakshi10

When you say segmentation file, that sounds like it already contains CNV segments. Can you show us some of the file so we know what you need to do?

ADD REPLYlink written 4.0 years ago by karl.stamm3.5k

I added some sample lines from the segmentation file for  your review.

Thanks !

ADD REPLYlink written 4.0 years ago by Meenakshi10
Do you mean you want to know how much CNAs there are per sample?
ADD REPLYlink written 4.0 years ago by Irsan6.9k

Yes, I am trying to calculate the Genome instability index per sample using the CNA from each sample. So, I want to quantify the CNAs using the segmentation file for each sample.

Thanks

ADD REPLYlink written 4.0 years ago by Meenakshi10
1

This is called data aggregation. Although you can do this in any programming language, I think for you it is easiest to do with Excel pivot tables. Follow some tutorials on how to use pivot tables and then apply to your segmentation results. As Charles warden suggested, you might want to add a column that describes if a segment meets several criteria (length, amplitude) and is therefore considered significant.

ADD REPLYlink modified 3.9 years ago • written 4.0 years ago by Irsan6.9k

Could you explain how you calculated the genome instability using CNA? Thanks

ADD REPLYlink written 2.8 years ago by biostar0
4
gravatar for Charles Warden
4.0 years ago by
Charles Warden7.0k
Duarte, CA
Charles Warden7.0k wrote:

You'll want to define some threshold for |Segment_Mean| > X because I would probably not consider all segments to be true copy number alterations.  Filtering based upon length and number of probes is probably also a good idea.

I don't have a precise recommendation of what threshold to use, but I would try plotting density / histogram values for the Segment_Mean to guide your decisions.

ADD COMMENTlink written 4.0 years ago by Charles Warden7.0k
3

Here is a little R function to plot segments.

plot_segments = function(segments_file){
 
  seg = read.delim(segments_file, sep='\t)
  seg.spl = split(seg,as.factor(as.character(seg$Chromosome)))
 
  pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8)
 
  par(mfrow=c(4,1))
 
  for(i in 1:length(seg.spl)){
    
    x = seg.spl[[i]]
    
    plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep='_'),ylab = 'log2 ratio')

    points(x$End,x$Segment_Mean,pch = '')
    
    segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = 'maroon')

    abline(h = 0, lty = 1,lwd = 0.5)    
  }
  dev.off()
}
ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by poisonAlien2.8k
2

@poisonAlien, Thank you for the great script. There were some typos in it so I fixed it. Here is the functioning script.

plot_segments = function(segments_file){
    seg = read.delim(segments_file, sep="\t")
    seg.spl = split(seg,as.factor(as.character(seg$Chromosome)))
    pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8)
    par(mfrow=c(4,1))
    for(i in 1:length(seg.spl)){
        x = seg.spl[[i]] 
        plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep="_"),ylab = "log2 ratio")
        points(x$End,x$Segment_Mean,pch = "")
        segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = "maroon")
        abline(h = 0, lty = 1,lwd = 0.5)  
    }
    dev.off()
}
ADD REPLYlink modified 16 months ago by RamRS23k • written 3.9 years ago by Les Ander110
1

That's right. It looks like Meenakshi already has segments and their log-ratio copynumber to some kind of control. Thus when the value says 0 at 42.6MB it's a copy-normal segment, negatives are losses and positives are gains. There's a lot of noise on a microarray and you have to choose a cutoff for a strong signal that you really believe. If the Segment Mean says 1, that's a log2 of 200%, or one gain, possibly 4 copies of a naturally diploid organism. 

 

Anyway, histogram and filter the segment means as CharlesWarden suggested and take a look at the sex chromosomes for guidance (known ploidy). 

ADD REPLYlink written 4.0 years ago by karl.stamm3.5k

Hi, a quick question. I am wondering to what the Num_Probes is referring to. Is it the number of hetrozygous SNPs within a segment or number of read depth counts within a segment?

ADD REPLYlink written 3.3 years ago by Dataman270
0
gravatar for Meenakshi
4.0 years ago by
Meenakshi10
United States
Meenakshi10 wrote:

Thank you all for your suggestions and guidance.

Based on what I could gather, I am applying different cutoffs on #Probes and Segment length to filter out the noisy data. Using the chosen data , I am calculating Absolute CN using (2^Segment_mean)*2. if the value is >2 it can be considered as a copy number change, but to get more significant CNs, I am considering 2.7 as cutoff for absolute CNS. This is purely intuitive and I still need to work a way out to get optimal cutoffs.

 

Thanks for all the help.. Any comment/criticism is welcome.

ADD COMMENTlink written 4.0 years ago by Meenakshi10

Actually I think you don't need the cutoff and it might even be better if you don't use them. You can just calculate the number of segments per sample and that will be proportional to the genomic instability. Try both methods and see what the correlation is between them.

ADD REPLYlink written 3.9 years ago by Irsan6.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1336 users visited in the last hour