Question: Genomic Matrix data from TCGA need to be analysed for Differential gene expression
1
4.0 years ago by
David_emir330
India
David_emir330 wrote:

Hello All,

I have retrieved the expression data matrix for TCGA breast invasive carcinoma (BRCA). The data are Level_3 data (file names: *.rsem.genes.normalized_results) downloaded from the TCGA DCC, and they have been log2(x+1) transformed and processed.

In the data matrix, each row represents a feature (gene name) and each column corresponds to a sample. For breast invasive carcinoma (BRCA), TCGA holds 1,215 patient samples, which were RNA-sequenced on the Illumina HiSeq 2000 system. The sequence data were processed by the RNA-seq version 2 pipeline, which uses the MapSplice alignment algorithm and the RSEM algorithm to generate expression values. These values were then log2(x+1) transformed. The data matrix looks as follows:

Genomic Matrix
sample     TCGA-A8-A092-01  TCGA-A7-A0CE-11  TCGA-OL-A5D7-01  TCGA-D8-A1JK-  TCGA-E2-A10C-01
ARHGEF10L  8.8784           11.977           8.8784           11.977         8.8784
HIF3A      11.977           8.8784           11.977           8.8784         11.977

 

The data matrix file can be found at https://drive.google.com/file/d/0B4EniZCsdQJ5cEJZSTBCc1htYk0/view?usp=sharing

Please note: the data matrix is ~20,783 rows * 1,215 columns.

My question is: can data that have been log2(x+1) transformed and processed be used for differential gene expression analysis together with clinical data?

If yes, then please let me know how to proceed and which pipeline/software to use.

Thanks a lot for your kind help

-Ateeq Khaliq.

 

dge genomicmatrix tcga • 2.8k views
modified 2.9 years ago by elizabethR70 • written 4.0 years ago by David_emir330
5
4.0 years ago by
Deepak Tanwar3.9k
ETH Zürich, Switzerland
Deepak Tanwar3.9k wrote:

The BRCA data already contain both patient samples and controls. Did you separate them?

How to proceed depends entirely on what kind of clinical analysis you want to integrate.

Do you want to check differential gene expression between groups defined by patient status?

Please elaborate on what you mean by "differential gene expression analysis along with clinical data".

written 4.0 years ago by Deepak Tanwar3.9k
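
One way to separate tumour samples from controls is to read the sample-type code in the TCGA barcode, which is the fourth dash-separated field (01 = primary solid tumour, 11 = solid tissue normal). The following is a minimal base R sketch, assuming the column names of the matrix are barcodes like those in the header shown in the question; the variable names `barcodes`, `sample_type` and `group` are illustrative and not from the thread.

```r
# Column names as they appear in the matrix header (complete barcodes only).
barcodes <- c("TCGA-A8-A092-01", "TCGA-A7-A0CE-11",
              "TCGA-OL-A5D7-01", "TCGA-E2-A10C-01")

# The fourth dash-separated field is the sample-type code.
sample_type <- sapply(strsplit(barcodes, "-", fixed = TRUE), `[`, 4)
# Keep only the two leading digits in case a vial letter follows (e.g. "01A").
sample_type <- substr(sample_type, 1, 2)

# 01 = primary solid tumour, 11 = solid tissue normal; other codes become NA.
group <- factor(ifelse(sample_type == "01", "tumour",
                       ifelse(sample_type == "11", "normal", NA_character_)),
                levels = c("normal", "tumour"))

# Count samples per group, showing any unclassified samples as NA.
table(group, useNA = "ifany")
```

Samples with other sample-type codes (for example metastatic samples) end up as NA here and would need to be handled explicitly before any comparison.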

Hi Deepak,

Thanks for your reply. Yes, I did separate control vs. diseased (breast cancer) samples, and I also grouped the patients by age.

What I really want to do is find DGE between controls and BRCA patients, and DGE between different age groups.

Since I don't have the infrastructure to download the humongous raw data, my only option is to work with the processed data. I may sound stupid, but this is the only option left for me. Please help. Thanks a lot.

written 4.0 years ago by David_emir330
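
For the age-group comparison, one possible approach is to bin the ages and run a per-gene test across the bins. The sketch below uses simulated data and base R only; the ages, the cut points (50 and 65) and the names `age`, `age_group` and `expr` are assumptions made for illustration, and the Kruskal-Wallis test is just one reasonable choice for more than two groups.

```r
set.seed(2)
# Hypothetical clinical table: one age per sample.
age <- c(34, 41, 47, 52, 55, 58, 61, 66, 70, 78, 45, 63)
# Bin ages into groups; the cut points are illustrative only.
age_group <- cut(age, breaks = c(0, 50, 65, Inf),
                 labels = c("<=50", "51-65", ">65"))

# Simulated log2(x + 1) matrix: 50 genes x 12 samples,
# with columns in the same order as `age`.
expr <- matrix(rnorm(50 * 12, mean = 8), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))

# Per-gene Kruskal-Wallis test across the three age groups.
p_kw <- apply(expr, 1, function(x) kruskal.test(x, age_group)$p.value)
# Benjamini-Hochberg correction for testing many genes at once.
fdr <- p.adjust(p_kw, method = "BH")

head(sort(fdr))
```

With real data, the columns of the expression matrix and the rows of the clinical table have to be matched by patient barcode before a test like this is meaningful.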
5
4.0 years ago by
Deepak Tanwar3.9k
ETH Zürich, Switzerland
Deepak Tanwar3.9k wrote:

Hi Ateeq,

You could find the DEGs between groups by applying a t-test or a Wilcoxon test to each gene. You could also compute the log fold change.

written 4.0 years ago by Deepak Tanwar3.9k
Can you please let me know how to go about it? Are there any packages or tools available for the analysis? Thanks a ton, Deepak.
written 4.0 years ago by David_emir330
1

If you are using R, type the following in R for help:

?t.test

?wilcox.test

written 4.0 years ago by Deepak Tanwar3.9k
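
The per-gene tests suggested above could be applied roughly as follows. This is a minimal sketch on simulated data, assuming a log2(x+1) matrix with genes in rows and a two-level group factor; `expr`, `group` and `res` are illustrative names. Because the values are already on the log2 scale, the log fold change is simply the difference of the group means.

```r
set.seed(1)
# Simulated stand-in for the log2(x + 1) matrix: 100 genes x 20 samples.
n_genes <- 100
group <- factor(rep(c("normal", "tumour"), each = 10),
                levels = c("normal", "tumour"))
expr <- matrix(rnorm(n_genes * 20, mean = 8, sd = 1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Shift the first five genes up by 3 log2 units in the tumour group.
expr[1:5, group == "tumour"] <- expr[1:5, group == "tumour"] + 3

is_tumour <- group == "tumour"
# Difference of means on the log2 scale = log2 fold change (tumour vs normal).
log2fc <- rowMeans(expr[, is_tumour]) - rowMeans(expr[, !is_tumour])
# Welch t-test and Wilcoxon rank-sum test, one gene (row) at a time.
p_t <- apply(expr, 1, function(x) t.test(x[is_tumour], x[!is_tumour])$p.value)
p_w <- apply(expr, 1, function(x) wilcox.test(x[is_tumour], x[!is_tumour])$p.value)

# Collect results and adjust the t-test p-values for multiple testing (BH).
res <- data.frame(log2fc = log2fc, p_t = p_t, p_wilcox = p_w,
                  fdr_t = p.adjust(p_t, method = "BH"))
head(res[order(res$fdr_t), ])
```

With roughly 20,000 genes, some form of multiple-testing correction such as `p.adjust` is needed before calling any gene differentially expressed.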
0
2.9 years ago by
elizabethR70
elizabethR70 wrote:

I've been told by a bioinformatician to use edgeR for differential expression analysis. However, as I understand it, the input cannot be normalised data; it has to be raw counts (i.e. the rsem.genes.results files rather than the normalised files), because edgeR performs normalisation as part of its statistical modelling.

written 2.9 years ago by elizabethR70
1

You always have 2 options:

  1. Normalize the raw counts with the edgeR package itself. You may already have done this.
  2. Normalize the data yourself (upper-quartile normalization) and then just calculate differential expression using edgeR.
written 2.9 years ago by Deepak Tanwar3.9k
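
To make the upper-quartile option concrete, here is a rough base R sketch of what such a normalization does, followed by the log2(x+1) transform mentioned in the question. The toy counts and the rescaling constant of 1000 are assumptions for illustration; this is not edgeR's own implementation.

```r
# Toy raw count matrix: 4 genes x 3 samples (hypothetical values).
counts <- matrix(c(10, 0, 200, 50,
                   20, 5, 400, 90,
                    5, 0, 100, 30),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Upper quartile (75th percentile) of the non-zero counts in each sample.
uq <- apply(counts, 2, function(x) quantile(x[x > 0], 0.75))
# Divide each sample by its upper quartile and rescale to a common value.
norm <- sweep(counts, 2, uq, "/") * 1000
# The transform applied to the matrix described in the question.
log_norm <- log2(norm + 1)
# Back-transforming recovers the normalized values, not the raw counts.
back <- 2^log_norm - 1
```

The last step shows why the concern about edgeR holds: undoing log2(x+1) only returns the normalized values, so the raw counts would still have to come from the rsem.genes.results files.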

You might be better off using limma::voom for a dataset of this size.

written 2.9 years ago by russhh4.4k