I have retrieved the gene expression data matrix for TCGA breast invasive carcinoma (BRCA). The data are Level_3 data (file names: *.rsem.genes.normalized_results) downloaded from the TCGA DCC, which have then been log2(x+1) transformed and processed.
In the data matrix, each row represents a feature (gene name) and each column corresponds to a sample. For breast invasive carcinoma (BRCA), TCGA has 1,215 patient samples, which were RNA-sequenced on the Illumina HiSeq 2000 system. The sequence data were processed by the RNA-seq version 2 pipeline, which uses the MapSplice alignment algorithm and the RSEM algorithm to generate expression values. These values were then log2(x+1) transformed and processed. The data matrix looks as follows (caption: Genomic Matrix):
The data matrix file can be found at https://drive.google.com/file/d/0B4EniZCsdQJ5cEJZSTBCc1htYk0/view?usp=sharing
Please note: the data matrix is ~20,783 rows × 1,215 columns.
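To make the layout and the transform concrete, here is a minimal sketch in Python (NumPy). The values and the 3 × 2 shape are made up for illustration and are not taken from the real file; it only shows what log2(x+1) does to RSEM normalized values and that the transform can be undone with 2^y - 1.

```python
import numpy as np

# Toy stand-in for the real matrix: rows are genes, columns are samples.
# These numbers are invented for illustration, not real TCGA values.
normalized = np.array([
    [0.0,    3.0],    # hypothetical gene 1
    [15.0,   7.0],    # hypothetical gene 2
    [1023.0, 255.0],  # hypothetical gene 3
])

# Forward transform as described for the downloaded matrix: log2(x + 1).
logged = np.log2(normalized + 1)

# Inverse transform: 2**y - 1 recovers the RSEM normalized values.
recovered = np.exp2(logged) - 1

print(logged)
print(np.allclose(recovered, normalized))
```

Note that the recovered values are still RSEM normalized results, not raw read counts.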
My question is: can these log2(x+1)-transformed, processed data be used for differential gene expression analysis together with the clinical data?
If yes, please let me know how to proceed and which pipeline/software to use.
Thanks a lot for your kind help