TCGA normalized count data of 1000 samples for DGE
9.0 years ago
David_emir ▴ 490

Hi all,

I have downloaded the TCGA breast cancer normalized data sets for 1000 samples from RNA-Seq V2. Each counts file has only two columns: gene_id and normalized_count.

gene_id    normalized_count
100130426  11.691
10357      114.6254

My goal is to do differential expression analysis across these datasets with respect to various clinical conditions such as age, treated/untreated, etc.

Please let me know the best way to do this. Is it even possible to do DGE analysis against various clinical parameters?

Your suggestions are highly valued. Thanks a lot for your help.

-Ateeq Khaliq

RNA-Seq TCGA • 4.3k views

If you have the normalized data and the clinical variables, then it will be possible to perform differential expression, yes. Could you clarify what you are asking? Do you have software that you are going to use? Have you ever done differential expression analysis before?

Hi Sean,

Right now I don't have any software in mind for DGE. I have done DGE before from samples (BAM files) using the Tuxedo protocol (TopHat --> Cufflinks --> Cuffdiff --> CummeRbund), but I can't work out how to continue with this type of data (TCGA normalized counts). I don't have enough space to store the raw data files, so I thought of continuing with the TCGA matrix files, which are smaller. Right now I am clueless as to how to proceed. Please help.
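For anyone starting from the per-sample two-column files described in the question, a first step is usually to combine them into one gene-by-sample matrix. The following is only a minimal sketch using the Python standard library. It assumes the files are tab-separated with the header shown in the question; the sample names and the second sample's values are made up for illustration.

```python
import csv
import io


def read_counts(handle):
    """Parse one tab-separated two-column file into {gene_id: normalized_count}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["normalized_count"]) for row in reader}


def build_matrix(per_sample):
    """Combine {sample: {gene: value}} into (genes, samples, rows).

    Only genes present in every sample are kept, so each row is complete.
    """
    samples = sorted(per_sample)
    shared = set.intersection(*(set(per_sample[s]) for s in samples))
    genes = sorted(shared)
    rows = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, rows


# Toy inputs: sample_A uses the two values from the question,
# sample_B is hypothetical. Real code would open files on disk instead.
sample_a = "gene_id\tnormalized_count\n100130426\t11.691\n10357\t114.6254\n"
sample_b = "gene_id\tnormalized_count\n100130426\t0.0\n10357\t98.5\n"

per_sample = {
    "sample_A": read_counts(io.StringIO(sample_a)),
    "sample_B": read_counts(io.StringIO(sample_b)),
}
genes, samples, matrix = build_matrix(per_sample)

for gene, row in zip(genes, matrix):
    print(gene, row)
```

Each row of `matrix` holds one gene across all samples, in the order given by `samples`, which is the layout most downstream tools expect.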

If you have count data, you could try edgeR.

DESeq2 would also be applicable.

Hi,

I am doing the same type of analysis. I used the TCGA-Assembler R package to get the actual data, then matched the clinical data with my RNA-seq data (I am dealing with only one gene, so it is easier, I guess). I wrote a bit of code to make sure things are matched properly, then used SPSS to correlate expression with clinical factors.
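The matching step can be sketched roughly as follows. This is not the code used above, only an illustration. It relies on the TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 hold the sample type code (01-09 tumor, 10-19 normal). The barcodes, clinical fields, and values below are all hypothetical.

```python
def patient_id(barcode):
    """First 12 characters of a TCGA barcode identify the patient."""
    return barcode[:12]


def sample_group(barcode):
    """Classify a sample from its two-digit sample type code."""
    code = int(barcode[13:15])
    if code < 10:
        return "tumor"
    if code < 20:
        return "normal"
    return "control"


def match_samples(barcodes, clinical):
    """Attach clinical records to expression samples; report the leftovers."""
    matched, unmatched = [], []
    for barcode in barcodes:
        record = clinical.get(patient_id(barcode))
        if record is None:
            unmatched.append(barcode)
        else:
            matched.append(
                {"barcode": barcode, "group": sample_group(barcode), **record}
            )
    return matched, unmatched


# Hypothetical clinical table keyed by patient ID.
clinical = {
    "TCGA-XX-0001": {"age": 54, "treated": True},
    "TCGA-XX-0002": {"age": 61, "treated": False},
}
# Hypothetical expression sample barcodes.
expression_barcodes = [
    "TCGA-XX-0001-01A-11R-0000-07",
    "TCGA-XX-0001-11A-11R-0000-07",
    "TCGA-XX-0003-01A-11R-0000-07",
]

matched, unmatched = match_samples(expression_barcodes, clinical)
print(matched)
print("no clinical record:", unmatched)
```

Listing the unmatched barcodes explicitly, rather than dropping them silently, makes it easier to confirm that things are matched properly.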

I am also interested in gene expression alterations between normal and tumor. Here is where I am confused: should I use the normalized_count by itself to compare the two groups, or do a log2 transformation first? Some resources, including cBioPortal, call up- or down-regulation based on a Z-score.

Any ideas?
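To make the two options concrete, here is a small sketch of a log2 transformation and a Z-score for a single gene. The expression values are made up, the pseudocount of 1 is an assumption to avoid taking the log of zero, and using the normal samples as the Z-score reference is also an assumption; the thread does not say which reference population any particular resource uses.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def z_scores(values, reference):
    """Z-score of each value relative to the reference group's mean and SD."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(v - mean) / sd for v in values]


# Hypothetical normalized counts for one gene.
normal = [3.0, 7.0, 15.0]
tumor = [31.0, 63.0]

log_normal = log2_transform(normal)
log_tumor = log2_transform(tumor)

# Difference of group means on the log2 scale is the log2 fold change.
log2_fold_change = statistics.mean(log_tumor) - statistics.mean(log_normal)

# Tumor samples expressed in units of normal-group standard deviations.
tumor_z = z_scores(log_tumor, reference=log_normal)

print("log2 fold change:", log2_fold_change)
print("tumor z-scores:", tumor_z)
```

Working on the log2 scale compresses the long right tail of count data, so a handful of very high values dominate the group comparison less than they would on the raw normalized scale.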

DESeq2 and edgeR are great choices. Limma voom is another possibility. All of these take counts as input.

Hi Sean,

Thanks for your post. Do those tools take normalized counts or raw counts as input?

The answer depends on how you decide to move forward with your analysis. Most count-based analysis software, including the tools mentioned above, expects raw counts.

Thanks.

Could you also comment on my previous post? Should I use normalized_count by itself or do a log2 transformation?

Thank you

In general, you'll want to read the documentation for the software you are going to apply. They are often pretty clear about what to use. In particular, edgeR, DESeq2, and limma voom() all ask specifically for raw counts.
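A quick sanity check before handing a matrix to a count-based tool is to see whether the values are whole numbers at all. The sketch below is only a heuristic, not part of any of the packages mentioned: the function name is hypothetical, the first list holds the two values from the question, and the second list is made up. Some quantifiers report fractional expected counts, so a negative result is a prompt to check the file's documentation rather than proof that the data are normalized.

```python
def looks_like_raw_counts(values, tolerance=1e-9):
    """Return True if every value is a non-negative whole number."""
    return all(v >= 0 and abs(v - round(v)) <= tolerance for v in values)


normalized = [11.691, 114.6254]        # values shown in the question
hypothetical_raw = [12.0, 130.0, 0.0]  # made-up integer-valued counts

print(looks_like_raw_counts(normalized))        # False
print(looks_like_raw_counts(hypothetical_raw))  # True
```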
