Problem To Understand Copy-Number Values Per Gene Provided By The Broad Institute
12.4 years ago

Dear all,

I wanted to explore and integrate the gene copy number data provided by the Broad Institute, available at: http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin

But when I looked at the DNA Copy Number section and downloaded the set CCLEcopynumberbyGene_2010-10-28.txt.gz, I did not get the data I was expecting.

I was expecting integer values, like those provided by the Sanger Institute in the file TCNcelllines_120310.xls.

Could someone have a look at it and explain to me why the values provided are not integers? The values look strange to me, and I don't understand what they mean.

I am eager to learn what piece of information I have missed.

Regards,

Fred

Below is the screen-shot of the file (Larry thanks for the suggestion):


Fred, it would be very helpful if you could reproduce a portion of that download here. I'm wondering if the Broad file contains CGH hybridization data, which would then need to be analyzed to give a ratio. That ratio is often near an integer, but it could be anywhere from 1.8 to 2.2 for 2 copies of a gene/region.

12.4 years ago

Those are going to be log2 copy number values for each gene, almost certainly after the data is segmented. If you want to roughly calculate the absolute copy number at that position, you can convert out of log2 and round to the nearest integer:

So for one of your values:

log2 cn: 0.4194


This is the log2 of the ratio between tumor and normal (or between your cell line and a panel of "normal" cells).

To convert to absolute copy number, we do

(2^0.4194)*2 = 2.674742


The multiplication by two is because we assume a diploid genome in the normal.

Rounding, you'd probably say this gene was duplicated and there are three copies.

This is not the ideal way to calculate absolute copy number, but given the information you have, I think it'll be about as good as you can get.
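
For anyone who wants to apply this to a whole column of values, here is a minimal Python sketch of the same arithmetic. The function names and the `normal_ploidy` parameter are my own, made up for illustration, and it assumes the values really are log2 ratios against a diploid reference, as guessed above.

```python
def log2_ratio_to_copy_number(log2_ratio, normal_ploidy=2):
    """Convert a log2 (sample / normal) ratio to an estimated copy number.

    Assumes the reference ("normal") genome has `normal_ploidy` copies
    of the gene, i.e. diploid by default.
    """
    return (2 ** log2_ratio) * normal_ploidy


def rounded_copy_number(log2_ratio, normal_ploidy=2):
    """Round the estimate to the nearest integer copy number."""
    return int(round(log2_ratio_to_copy_number(log2_ratio, normal_ploidy)))


# The value from the example above.
estimate = log2_ratio_to_copy_number(0.4194)
print(estimate)                      # about 2.67
print(rounded_copy_number(0.4194))   # 3
```

A log2 value of 0 comes out as 2 copies (no change), 1 as 4 copies, and -1 as 1 copy, which is a quick sanity check on a file you are unsure about.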


It was the end of a long day, Chris, when I answered this and neglected to first read your more comprehensive answer, +1.


Just to update this: there is now better documentation here: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/

12.4 years ago

Fred,

It is difficult to read the table, but what I think I see makes me believe that these are log2 values of the CGH array data. Does that make sense to you?