Question: Problem To Understand Copy-Number Values Per Gene Provided By The Broad Institute
8.8 years ago by
Fred Fleche4.3k
Paris, France
Fred Fleche4.3k wrote:

Dear all,

I wanted to play with and integrate gene copy number data provided by the Broad Institute and available at :

But when I looked at the section DNA Copy Number and downloaded the set CCLEcopynumberbyGene_2010-10-28.txt.gz, I did not get the data I was expecting.

I was expecting integer values like those provided in the file TCNcelllines_120310.xls by the Sanger Institute.

Could someone have a look at it and explain to me why the values provided are not integers? The values also look strange to me because I don't understand their meaning.

I am eager to learn the piece of information I have missed.



Below is the screen-shot of the file (Larry thanks for the suggestion):

[screenshot of the file; image not available]

ADD COMMENTlink modified 8.8 years ago by Chris Miller21k • written 8.8 years ago by Fred Fleche4.3k

Fred, it would be very helpful if you could reproduce a portion of that download here. I'm wondering if the Broad file contains CGH hybridization data, which would then need to be analyzed to give a ratio. That ratio is often near an integer but could be 1.8 to 2.2 for 2 copies of a gene/region.

ADD REPLYlink written 8.8 years ago by Larry_Parnell16k
8.8 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

Those are going to be log2 copy number values for each gene, almost certainly after the data is segmented. If you want to roughly calculate the absolute copy number at that position, you can convert out of log2 and round to the nearest integer:

So for one of your values:

log2 cn: 0.4194

This is the log2 of the ratio between tumor and normal (or between your cell line and a panel of "normal" cells).

To convert to absolute copy number, we do:

(2^0.4194)*2 = 2.674742

The multiplication by two is because we assume a diploid genome in the normal.

Rounding, you'd probably say this gene was duplicated and there are three copies.

This is not the ideal way to calculate absolute copy number, but given the information you have, I think it'll be about as good as you can get.
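The conversion above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything provided by the Broad; the function name `log2_ratio_to_copy_number` and the `reference_ploidy` parameter are made up for the example, and the diploid reference (2 copies) is the same assumption as in the calculation above.

```python
def log2_ratio_to_copy_number(log2_ratio, reference_ploidy=2):
    """Estimate absolute copy number from a log2(sample / reference) ratio.

    Assumes the reference carries `reference_ploidy` copies of the gene
    (2 for a diploid normal). Hypothetical helper for illustration only.
    """
    # Undo the log2, then scale by the number of copies in the reference.
    return reference_ploidy * 2 ** log2_ratio


estimate = log2_ratio_to_copy_number(0.4194)
print(round(estimate, 4))  # unrounded estimate, about 2.6747
print(round(estimate))     # nearest integer copy number: 3
```

A log2 value of 0 gives 2 copies under this assumption, and a value of -1 gives 1 copy (loss of one allele). Rounding hides uncertainty, so values that land between integers are worth treating with caution.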

ADD COMMENTlink written 8.8 years ago by Chris Miller21k

It was the end of a long day, Chris, when I answered this and neglected to first read your more comprehensive answer, +1.

ADD REPLYlink written 8.8 years ago by Larry_Parnell16k

Just to update this: there is now better documentation, here:

ADD REPLYlink written 9 months ago by Kevin Blighe63k
8.8 years ago by
Boston, MA USA
Larry_Parnell16k wrote:


It is difficult to read the table, but what I think I see makes me believe that these are log2 values of the CGH array data. Does that make sense to you?

ADD COMMENTlink written 8.8 years ago by Larry_Parnell16k