CNV data from TCGA
8.9 years ago
Na Sed ▴ 310

I am analyzing level 3 CNV data downloaded from the TCGA database and want to convert it to a gene-level matrix.

The files look like this:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774    1    3218610    16796721    7253    -0.0198
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774    1    16796742    17763566    312    -0.3615
BAIZE_p_TCGA_b138_SNP_N_GenomeWideSNP_6_A02_808774    1    17764034    221905958    105172    -0.0073

To convert the CNV data to gene-level data, I map genomic regions to genes. In some cases, two regions with different 'Segment_Mean' values map to the same gene. In that case, is it correct to use the average of the 'Segment_Mean' values for that gene?

Any thoughts?

Note that the data were obtained using SNP Array 6.0.

Thanks

8.9 years ago

That may or may not be a reasonable assumption. It depends entirely on what you're trying to infer from the data. For example, if the breakpoint truly falls in the middle of a gene, it means that the amplified copy of the gene is non-functional. That could be equivalent to no amplification, or the truncated (or fused) protein could have unexpected effects. In such a case, interpreting it as an amplification would clearly be wrong.
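One way to act on this is to flag genes that overlap more than one segment, so they can be inspected separately rather than silently averaged. The following is only a minimal sketch: the segment tuples are taken from the example rows in the question, while the gene coordinates and the function names are hypothetical and chosen purely for illustration.

```python
# Segments as (chromosome, start, end, segment_mean), copied from the question.
segments = [
    ("1", 3218610, 16796721, -0.0198),
    ("1", 16796742, 17763566, -0.3615),
    ("1", 17764034, 221905958, -0.0073),
]

def segments_overlapping(gene, segments):
    """Return every segment that shares at least one base with the gene."""
    chrom, g_start, g_end = gene
    return [
        s for s in segments
        if s[0] == chrom and s[1] <= g_end and s[2] >= g_start
    ]

def spans_breakpoint(gene, segments):
    """True if the gene overlaps more than one segment."""
    return len(segments_overlapping(gene, segments)) > 1

# Hypothetical gene coordinates (chromosome, start, end), not real annotations.
hypothetical_gene = ("1", 16790000, 16800000)  # straddles two segments
intact_gene = ("1", 5000000, 5010000)          # lies inside one segment

for gene in (hypothetical_gene, intact_gene):
    print(gene, "spans a breakpoint:", spans_breakpoint(gene, segments))
```

Genes flagged this way could be reported separately or excluded, depending on the question being asked.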

So there's no easy answer. It's not wrong, per se, to do what you're suggesting, but be aware of the caveats, and be explicit about what you did when you write it up.
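To make the averaging concrete, here is a rough sketch of two variants: the plain mean of the overlapping segments' 'Segment_Mean' values, as proposed in the question, and a mean weighted by how many bases of the gene each segment covers. The weighted variant is an alternative suggested here, not something from the question, and the gene coordinates and the function name are hypothetical. Whichever variant is used, it should be stated in the write-up.

```python
# Segments as (chromosome, start, end, segment_mean), copied from the question.
segments = [
    ("1", 3218610, 16796721, -0.0198),
    ("1", 16796742, 17763566, -0.3615),
    ("1", 17764034, 221905958, -0.0073),
]

def gene_level_value(gene, segments, weighted=True):
    """Summarize Segment_Mean over a gene.

    weighted=False: plain mean of all overlapping segments.
    weighted=True:  mean weighted by the number of overlapping bases.
    Returns None if no segment overlaps the gene.
    """
    chrom, g_start, g_end = gene
    values, weights = [], []
    for s_chrom, s_start, s_end, seg_mean in segments:
        if s_chrom != chrom:
            continue
        # Inclusive coordinates, so add 1 to get the overlap length in bases.
        overlap = min(g_end, s_end) - max(g_start, s_start) + 1
        if overlap > 0:
            values.append(seg_mean)
            weights.append(overlap if weighted else 1)
    if not values:
        return None
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical gene (chromosome, start, end) straddling the first two segments.
gene = ("1", 16790000, 16800000)
plain = gene_level_value(gene, segments, weighted=False)
by_overlap = gene_level_value(gene, segments, weighted=True)
print("plain mean:", plain)
print("overlap-weighted mean:", by_overlap)
```

For this hypothetical gene, most of its length lies in the first segment, so the weighted value sits closer to that segment's 'Segment_Mean' than the plain mean does.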


@Chris Miller, can you recommend some references on interpreting CNV data? I am new to this field.
