Question: TCGA normalized count data of 1000 samples for DGE
Asked 4.9 years ago by David_emir (India):

Hi all, 

I have downloaded TCGA breast cancer normalised data sets for 1000 samples from RNASeqV2. The count files have only two columns, gene_id and normalized_count:

gene_id      normalized_count
100130426    11.691
10357        114.6254

My goal is to do differential expression analysis among these data sets, against various clinical conditions such as age, treated/untreated, etc.

Please let me know the best way to do this. Is it possible to do DGE analysis with various clinical parameters?

Your suggestions are highly valuable. Thanks a lot for your help.

-Ateeq Khaliq
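As an illustration of the data layout described in the question, here is a minimal Python sketch that reads per-sample two-column files and combines them into a gene-by-sample matrix. It assumes the files are tab-separated with a `gene_id` / `normalized_count` header row. The sample names, the helper names `read_counts` and `build_matrix`, and the second sample's values are made up for the example.

```python
import csv
import io


def read_counts(handle):
    """Read one two-column (gene_id, normalized_count) file into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["normalized_count"]) for row in reader}


def build_matrix(per_sample):
    """Combine {sample: {gene: value}} into (genes, samples, rows).

    Only genes present in every sample are kept, so each row is complete.
    """
    samples = sorted(per_sample)
    shared = set.intersection(*(set(per_sample[s]) for s in samples))
    genes = sorted(shared)
    rows = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, rows


# Two small in-memory files stand in for the downloaded per-sample files.
# The first uses the values shown in the question; the second is hypothetical.
file_a = io.StringIO("gene_id\tnormalized_count\n100130426\t11.691\n10357\t114.6254\n")
file_b = io.StringIO("gene_id\tnormalized_count\n100130426\t0.0\n10357\t98.5\n")

per_sample = {"sample_A": read_counts(file_a), "sample_B": read_counts(file_b)}
genes, samples, rows = build_matrix(per_sample)

for gene, row in zip(genes, rows):
    print(gene, row)
```

With real downloads, the `io.StringIO` objects would be replaced by opened files, one per sample.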

rna-seq normalised data tcga • 2.2k views

If you have the normalized data and the clinical variables, then it will be possible to perform differential expression, yes.  Could you clarify what you are asking?  Do you have software that you are going to use?  Have you ever done differential expression analysis before?

written 4.9 years ago by Sean Davis

Hi Sean,

Right now I don't have any software in mind for DGE. I have done DGE before from samples (BAM files) using the Tuxedo protocol (TopHat --> Cufflinks --> Cuffdiff --> CummeRbund), but I couldn't work out how to continue with this type of data (TCGA normalised counts). I don't have enough space to store the raw data files, so I thought of continuing with the matrix files from TCGA, which are smaller. Right now I am clueless as to how to proceed. Please help.

written 4.9 years ago by David_emir

If you have count data, you could try edgeR.

written 4.9 years ago by Jordan

DESeq2 would also be applicable.

written 4.9 years ago by Sean Davis

Hi,

I am doing the same type of analysis. I used the TCGA-Assembler R package to get the actual data, then matched the clinical data with my RNA-seq data (I am dealing with only one gene, so it is easier, I guess). I wrote a bit of code to make sure things are matched properly, then used SPSS to correlate expression with clinical factors.
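The matching step described above could look something like the following Python sketch. It relies on the TCGA barcode layout, where the first three dash-separated fields identify the patient and the fourth field begins with a two-digit sample type code (01 for primary tumor, 11 for solid tissue normal). All barcodes, expression values, and clinical fields below are invented, and the helper names are hypothetical.

```python
def patient_id(barcode):
    """Return the patient part of a TCGA barcode (first three fields)."""
    return "-".join(barcode.split("-")[:3])


def sample_type(barcode):
    """Return the two-digit sample type code (01 = primary tumor, 11 = normal)."""
    return barcode.split("-")[3][:2]


# Hypothetical single-gene expression values keyed by sample barcode.
expression = {
    "TCGA-AA-0001-01A-11R-0000-07": 250.0,
    "TCGA-AA-0001-11A-11R-0000-07": 40.0,
    "TCGA-BB-0002-01A-11R-0000-07": 310.0,
    "TCGA-CC-0003-01A-11R-0000-07": 120.0,
}

# Hypothetical clinical table keyed by patient barcode.
clinical = {
    "TCGA-AA-0001": {"age": 54, "treated": True},
    "TCGA-BB-0002": {"age": 61, "treated": False},
}

matched, unmatched = [], []
for barcode, value in expression.items():
    pid = patient_id(barcode)
    if pid in clinical:
        record = {"barcode": barcode, "patient": pid,
                  "type": sample_type(barcode), "value": value}
        record.update(clinical[pid])
        matched.append(record)
    else:
        # Keep track of samples without clinical data instead of dropping them silently.
        unmatched.append(barcode)

print(len(matched), "matched;", "unmatched:", unmatched)
```

Collecting the unmatched barcodes explicitly makes it easy to check how many samples were lost in the join.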

I am also interested in gene expression alterations between normal and tumor. Here is where I am confused. Should I use the normalized_count by itself and compare the two groups, or do a log2 transformation? Some resources, including cBioPortal, call up- or down-regulation based on a Z-score.

Any ideas?
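To make the two options in this question concrete, here is a small Python sketch of a log2 transformation and a z-score for a single gene. It is not a recommendation of either approach. The pseudocount of 1, the choice of normal samples as the z-score reference, and all the values are assumptions made for the example.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def z_scores(values, reference):
    """Z-score of each value relative to the mean and SD of a reference group."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(v - mean) / sd for v in values]


# Hypothetical normalized counts for one gene.
normal = [7.0, 15.0, 31.0]
tumor = [63.0, 127.0, 255.0]

log_normal = log2_transform(normal)
log_tumor = log2_transform(tumor)

# Difference of mean log2 values, i.e. a log2 fold change (tumor vs normal).
log2_fc = statistics.mean(log_tumor) - statistics.mean(log_normal)

# Z-scores of the tumor samples, using the normal samples as reference.
tumor_z = z_scores(log_tumor, log_normal)

print("log2 fold change:", log2_fc)
print("tumor z-scores:", tumor_z)
```

Neither quantity is a statistical test by itself; both only describe how far the tumor values sit from the normal ones.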

written 4.9 years ago by juara

DESeq2 and edgeR are great choices. limma voom is another possibility. All of these take counts as input.

written 4.9 years ago by Sean Davis

Hi, Sean

Thanks for your post. I am wondering whether those tools take normalized counts or raw counts as input?

written 4.9 years ago by juara

The answer depends on what you decide when moving forward with your analysis. Most count-based analysis software, including the packages mentioned above, expects raw counts.

written 4.9 years ago by Sean Davis

Thanks.

Could you also comment on my previous post? Should I use normalized_count by itself or do a log2 transformation?

Thank you

written 4.9 years ago by juara

In general, you'll want to read the documentation for the software you are going to apply.  They are often pretty clear about what to use.  In particular, edgeR, DESeq2, and limma voom() all ask specifically for raw counts.
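Before handing a table to one of these packages, a quick sanity check can show whether the values could be raw counts at all. The Python sketch below simply tests for non-negative whole numbers. This is only a heuristic and an assumption about what "raw" looks like: passing the check does not prove the values are raw counts. The function name and the second list of values are hypothetical.

```python
def looks_like_raw_counts(values):
    """True if every value is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


normalized = [11.691, 114.6254]  # values shown in the question
raw_like = [12, 0, 115]          # hypothetical integer counts

print(looks_like_raw_counts(normalized))
print(looks_like_raw_counts(raw_like))
```

Fractional values such as 11.691 fail the check, which fits the point that the normalized_count column is not what these tools ask for.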

written 4.9 years ago by Sean Davis