CNA calculation from Segmentation file
2
1
Entering edit mode
6.5 years ago
Meenakshi ▴ 10

Hi

I have a compiled  segmentation file (nocnv) for almost 1000 samples. I want to calculate thecopy number aberrations  for all the samples.

I tried using GISTIC, but I believe it provides only the consensus copy number variations. Is there a better way to calculate copy number aberration  from segmentation file.

Here are few lines from teh segmentation file I have:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
TCGA-E2-A1IH-01A-11D-A13J-01    1    3208470    42629316    20791    -0.2864
TCGA-E2-A1IH-01A-11D-A13J-01    1    42631290    42692913    60    0.0183
TCGA-E2-A1IH-01A-11D-A13J-01    1    42701180    42918495    113    0.3411
TCGA-E2-A1IH-01A-11D-A13J-01    1    42925163    43153569    131    -0.0548
TCGA-E2-A1IH-01A-11D-A13J-01    1    43153652    43599777    288    -0.3653
TCGA-E2-A1IH-01A-11D-A13J-01    1    43600151    44642193    575    -0.0401

Thanks

Meenakshi

copy number Segmentation File CNA • 4.5k views
0
Entering edit mode

When you say segmentation file, that sounds like it already contains CNV segments. Can you show us some of the file so we know what you need to do?

0
Entering edit mode

I added some sample lines from the segmentation file for  your review.

Thanks !

0
Entering edit mode
Do you mean you want to know how much CNAs there are per sample?
0
Entering edit mode

Yes, I am trying to calculate the Genome instability index per sample using the CNA from each sample. So, I want to quantify the CNAs using the segmentation file for each sample.

Thanks

1
Entering edit mode

This is called data aggregation. Although you can do this in any programming language, I think for you it is easiest to do with Excel pivot tables. Follow some tutorials on how to use pivot tables and then apply to your segmentation results. As Charles warden suggested, you might want to add a column that describes if a segment meets several criteria (length, amplitude) and is therefore considered significant.

0
Entering edit mode

Could you explain how you calculated the genome instability using CNA? Thanks

4
Entering edit mode
6.5 years ago

You'll want to define some threshold for |Segment_Mean| > X because I would probably not consider all segments to be true copy number alterations.  Filtering based upon length and number of probes is probably also a good idea.

I don't have a precise recommendation of what threshold to use, but I would try plotting density / histogram values for the Segment_Mean to guide your decisions.

3
Entering edit mode

Here is a little R function to plot segments.

plot_segments = function(segments_file){

seg.spl = split(seg,as.factor(as.character(seg$Chromosome))) pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8) par(mfrow=c(4,1)) for(i in 1:length(seg.spl)){ x = seg.spl[[i]] plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep='_'),ylab = 'log2 ratio') points(x$End,x$Segment_Mean,pch = '') segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = 'maroon') abline(h = 0, lty = 1,lwd = 0.5) } dev.off() } ADD REPLY 2 Entering edit mode @poisonAlien, Thank you for the great script. There were some typos in it so I fixed it. Here is the functioning script. plot_segments = function(segments_file){ seg = read.delim(segments_file, sep="\t") seg.spl = split(seg,as.factor(as.character(seg$Chromosome)))
pdf(file=paste(segments_file,"pdf",sep="."),paper="special",width=12,onefile=T,pointsize=8)
par(mfrow=c(4,1))
for(i in 1:length(seg.spl)){
x = seg.spl[[i]]
plot(x$Start,x$Segment_Mean,xlim = c(x[1,3],x[nrow(x),4]),pch = "",ylim = c(-3,3),xlab = paste("chr",names(seg.spl[i]),sep="_"),ylab = "log2 ratio")
points(x$End,x$Segment_Mean,pch = "")
segments(x0 = x$Start , y0 = x$Segment_Mean , x1 = x$End, y1 = x$Segment_Mean, lwd = 2, col = "maroon")
abline(h = 0, lty = 1,lwd = 0.5)
}
dev.off()
}

1
Entering edit mode

That's right. It looks like Meenakshi already has segments and their log-ratio copynumber to some kind of control. Thus when the value says 0 at 42.6MB it's a copy-normal segment, negatives are losses and positives are gains. There's a lot of noise on a microarray and you have to choose a cutoff for a strong signal that you really believe. If the Segment Mean says 1, that's a log2 of 200%, or one gain, possibly 4 copies of a naturally diploid organism.

Anyway, histogram and filter the segment means as CharlesWarden suggested and take a look at the sex chromosomes for guidance (known ploidy).

0
Entering edit mode

Hi, a quick question. I am wondering to what the Num_Probes is referring to. Is it the number of hetrozygous SNPs within a segment or number of read depth counts within a segment?

0
Entering edit mode
6.5 years ago
Meenakshi ▴ 10

Thank you all for your suggestions and guidance.

Based on what I could gather, I am applying different cutoffs on #Probes and Segment length to filter out the noisy data. Using the chosen data , I am calculating Absolute CN using (2^Segment_mean)*2. if the value is >2 it can be considered as a copy number change, but to get more significant CNs, I am considering 2.7 as cutoff for absolute CNS. This is purely intuitive and I still need to work a way out to get optimal cutoffs.

Thanks for all the help.. Any comment/criticism is welcome.

0
Entering edit mode

Actually I think you don't need the cutoff and it might even be better if you don't use them. You can just calculate the number of segments per sample and that will be proportional to the genomic instability. Try both methods and see what the correlation is between them.