TCGA normalized count data of 1000 samples for DGE
7.9 years ago
David_emir ▴ 460

Hi all,

I have downloaded the TCGA breast cancer normalized data sets for 1000 samples from RNASeqV2. Each count file has only two columns: gene_id and normalized_count.

gene_id    normalized_count
100130426  11.691
10357      114.6254
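
For anyone starting from the same per-sample files, here is a minimal sketch of combining them into one gene-by-sample matrix. It assumes the files are tab-delimited with the header shown above. The function names, the sample labels and the `sample_b` values are made up for illustration; only the `sample_a` values come from the excerpt.

```python
import csv
import io

def read_normalized_counts(handle):
    """Parse one two-column file into {gene_id: normalized_count}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["normalized_count"]) for row in reader}

def build_matrix(per_sample):
    """Combine {sample: {gene: value}} into (genes, samples, rows).

    Only genes present in every sample are kept, so each row is complete.
    """
    samples = sorted(per_sample)
    shared = set.intersection(*(set(per_sample[s]) for s in samples))
    genes = sorted(shared)
    rows = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, rows

# Hypothetical in-memory files standing in for two downloaded samples.
sample_a = io.StringIO(
    "gene_id\tnormalized_count\n100130426\t11.691\n10357\t114.6254\n"
)
sample_b = io.StringIO(  # values invented for the example
    "gene_id\tnormalized_count\n100130426\t0.0\n10357\t98.5\n"
)

per_sample = {
    "sample_a": read_normalized_counts(sample_a),
    "sample_b": read_normalized_counts(sample_b),
}
genes, samples, rows = build_matrix(per_sample)
```

Each entry of `rows` is one gene, with values in the order given by `samples`. Keeping only shared genes is a design choice for the sketch; filling missing genes with a placeholder would be an alternative.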


My goal is to do differential expression analysis on these data sets against various clinical conditions such as age, treated/untreated, etc.

Please let me know the best way to do this. Is it even possible to do DGE analysis with various clinical parameters?

-Ateeq Khaliq

RNA-Seq TCGA • 3.6k views

If you have the normalized data and the clinical variables, then it will be possible to perform differential expression, yes. Could you clarify what you are asking? Do you have software that you are going to use? Have you ever done differential expression analysis before?

Hi Sean,

Right now I don't have any particular software in mind for DGE. I have done DGE before, starting from BAM files with the Tuxedo protocol (TopHat --> Cufflinks --> Cuffdiff --> CummeRbund), but I can't work out how to proceed with this type of data (TCGA normalized counts). I don't have enough space to store the raw data files, so I thought of continuing with the TCGA matrix files, which are much smaller. Right now I am clueless about how to proceed. Please help.

If you have count data, you could try edgeR.

DESeq2 would also be applicable.

Hi,

I am doing the same type of analysis. I used the TCGA-Assembler R package to get the actual data, then matched the clinical data with my RNA-seq data (I am dealing with only one gene, so it is easier, I guess). I wrote a bit of code to make sure the samples were matched properly, then used SPSS to correlate expression with clinical factors.
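
The matching step could look roughly like the sketch below. It relies on the TCGA barcode layout, where the first three dash-separated fields identify the patient and the first two digits of the fourth field give the sample type (01 = primary tumor, 11 = solid tissue normal). The barcodes, the clinical fields and the function names here are placeholders for illustration, not real records.

```python
def patient_id(barcode):
    """First three dash-separated fields of a TCGA barcode."""
    return "-".join(barcode.split("-")[:3])

def sample_type(barcode):
    """Two-digit sample type code taken from the fourth barcode field."""
    return barcode.split("-")[3][:2]

def match_clinical(expression, clinical):
    """Join expression values to clinical records by patient ID.

    Returns the matched rows and the barcodes with no clinical record,
    so mismatches are visible instead of being silently dropped.
    """
    matched, unmatched = [], []
    for barcode, value in expression.items():
        record = clinical.get(patient_id(barcode))
        if record is None:
            unmatched.append(barcode)
        else:
            matched.append({
                "barcode": barcode,
                "sample_type": sample_type(barcode),
                "value": value,
                **record,
            })
    return matched, unmatched

# Placeholder data: one gene's values keyed by made-up sample barcodes.
expression = {
    "TCGA-XX-0001-01A-11R-0000-07": 114.6,
    "TCGA-XX-0001-11A-11R-0000-07": 20.3,
    "TCGA-XX-0002-01A-11R-0000-07": 87.0,
}
clinical = {"TCGA-XX-0001": {"age": 54}}

matched, unmatched = match_clinical(expression, clinical)
```

Returning the unmatched barcodes separately makes it easy to check how many samples were lost in the join before running any statistics.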

I am also interested in gene expression changes between normal and tumor samples, and this is where I am confused. Should I compare the two groups using normalized_count as it is, or apply a log2 transformation first? Some resources, including cBioPortal, call up- or down-regulation based on a Z-score.

Any ideas?
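
To make the two options concrete, here is a small sketch of a log2 transformation followed by a Z-score for a single gene. The pseudocount of 1 and the choice to score tumor values against the normal group are assumptions made for the example, not a description of how any particular portal computes its scores. The input numbers are invented so the arithmetic is easy to follow.

```python
import math
import statistics

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]

def z_scores(values, reference):
    """Score each value against the mean and sample SD of a reference group."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(v - mean) / sd for v in values]

# Invented normalized counts for one gene.
normal = [3.0, 7.0, 15.0]
tumor = [31.0, 63.0]

log_normal = log2_transform(normal)  # 2, 3, 4 on the log2 scale
log_tumor = log2_transform(tumor)    # 5, 6 on the log2 scale
tumor_z = z_scores(log_tumor, log_normal)
```

This is only a descriptive comparison for one gene. It does not replace a proper differential expression model across all genes.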

DESeq2 and edgeR are great choices. limma voom is another possibility. All of these take counts as input.

Hi, Sean

Thanks for your post. Do these tools take normalized counts or raw counts as input?

The answer depends on which tool you choose for your analysis. Most count-based analysis software, including the packages mentioned above, expects raw counts.

Thanks.

Could you also comment on my previous post? Should I use normalized_count as it is, or apply a log2 transformation?

Thank you

In general, you'll want to read the documentation for the software you are going to apply. They are often pretty clear about what to use. In particular, edgeR, DESeq2, and limma voom() all ask specifically for raw counts.
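
As a quick sanity check before handing a file to one of these tools, a rough heuristic (an assumption of this sketch, not a feature of any of the packages) is to test whether the values are non-negative whole numbers. The function name is made up, and the first pair of values is taken from the excerpt in the question.

```python
def looks_like_raw_counts(values, tolerance=1e-9):
    """True if every value is a non-negative whole number."""
    return all(v >= 0 and abs(v - round(v)) <= tolerance for v in values)

# Values from the question's excerpt: clearly not integer counts.
normalized = [11.691, 114.6254]
# Invented integer values for comparison.
integer_like = [12.0, 115.0, 0.0]

normalized_ok = looks_like_raw_counts(normalized)
integer_ok = looks_like_raw_counts(integer_like)
```

Passing the check does not prove the values are raw, since normalized values may have been rounded. Estimated counts from some quantifiers can also be fractional, so a failed check is mainly a prompt to look at how the file was produced.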