Question: Differential expression analysis for TCGA level 3 RNASeqV2 data?
5.0 years ago by
United States
cafelumiere12 wrote:

Hi! I am looking at TCGA level 3 RNASeqV2 data. My goal is to find the DEGs (tumor vs. normal), and I'm currently working with LUAD.

I am using edgeR at the moment, since the original RSEM paper mentioned that RSEM estimates can be processed by edgeR/DESeq.

I have a couple of questions and was wondering if anyone might have suggestions:

1. Does it make sense to include all the tumor samples available, including those that don't have the matching normal samples from the same participant, and analyze for the DEGs? What kind of normalization method would be recommended if I do so? Or can I just use the default normalization of edgeR?

2. I started out looking at only the matched TN and NT samples. Using the above, I'm getting 5639 DEGs out of 20531 genes (FDR <= 0.05, FC >= 2), which seems like a lot? (Even more if I don't use any FC filter.)

3. There seem to be various discussions regarding which tools to use (!topic/rsem-users/H1cswrvvmPs). I wonder if anyone who has more experience analyzing TCGA datasets has any thoughts on whether it is OK to use edgeR, or whether I should use some other tool like EBSeq for RNASeqV2 data?
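For reference, the filter described in question 2 (FDR <= 0.05 and FC >= 2) can be sketched as below. This is only an illustration in plain Python, not edgeR's own code: the gene names, log2 fold changes, and p-values are made-up placeholders, the function names are hypothetical, and it assumes the FDR is a Benjamini-Hochberg adjusted p-value and that the fold-change cutoff applies in both directions.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, log2fc, pvalues, fdr_cutoff=0.05, fc_cutoff=2.0):
    """Keep genes passing both the FDR cutoff and the fold-change cutoff."""
    fdr = benjamini_hochberg(pvalues)
    min_abs_log2fc = math.log2(fc_cutoff)  # FC >= 2 means |log2FC| >= 1
    return [g for g, lfc, q in zip(genes, log2fc, fdr)
            if q <= fdr_cutoff and abs(lfc) >= min_abs_log2fc]

# Hypothetical toy input, not TCGA data.
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.5, 0.3, -1.8, 1.2]
pvalues = [0.0001, 0.01, 0.02, 0.6]
degs = select_degs(genes, log2fc, pvalues)
print(degs)
```

Here geneB passes the FDR cutoff but not the fold-change cutoff, which is why dropping the FC filter makes the list longer.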

Any suggestions are greatly appreciated. Thanks a lot in advance!

edger rna-seq R • 6.0k views

edgeR and DESeq are okay, from my point of view, for read count datasets.

modified 5.0 years ago • written 5.0 years ago by Manvendra Singh

If the data are in RPKM, edgeR is a terrible choice. The manual makes it very clear why it needs raw observed read counts. 

Once you have counts, so that edgeR (or voom) is a valid method, you will also likely need to run a paired test if you really do have paired data. It's in the manual.
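To make the paired idea concrete: a paired analysis is usually set up as an additive model with one blocking term per patient plus a tumor-vs-normal term, similar in spirit to a formula like ~patient + tissue in R. The sketch below only builds that design matrix in plain Python; it does not fit anything and is not the edgeR API. The patient IDs, labels, and the `paired_design` helper are hypothetical.

```python
def paired_design(patients, tissues, baseline="normal"):
    """Build an additive design: intercept + patient blocks + tumor effect."""
    patient_levels = sorted(set(patients))
    # The first patient is the reference level, so it gets no column
    # (this keeps the matrix full rank alongside the intercept).
    columns = (["intercept"]
               + ["patient_" + p for p in patient_levels[1:]]
               + ["tumor"])
    rows = []
    for p, t in zip(patients, tissues):
        row = [1]
        row += [1 if p == level else 0 for level in patient_levels[1:]]
        row.append(0 if t == baseline else 1)
        rows.append(row)
    return columns, rows

# Hypothetical sample sheet: three patients, each with a normal and a tumor sample.
patients = ["P1", "P1", "P2", "P2", "P3", "P3"]
tissues = ["normal", "tumor", "normal", "tumor", "normal", "tumor"]
columns, design = paired_design(patients, tissues)
print(columns)
for row in design:
    print(row)
```

The last column is the one to test: its coefficient is the tumor effect after baseline differences between patients have been absorbed by the patient columns.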

written 5.0 years ago by ross.lazarus
5.0 years ago by
Berlin, Germany
Manvendra Singh wrote:

1. My personal opinion is that including all the samples would not be a good idea: you would have a lot of variance and so would lose many DEGs. Once your matrix is ready, apply quantile normalization, then run hierarchical clustering of the samples using Spearman correlation. Choose the biggest clusters (each should have many more samples than the other clusters), one for the patient samples and one for the normal samples. Make these two groups and do the DE analysis.

2. Yes, you are getting a lot of DEGs. If you want to narrow them down, I have seen some publications where people use an FDR threshold of 0.01. But about 1/4 of the total genes is okay.
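The normalization and correlation steps from point 1 can be sketched roughly as follows. This is a minimal plain-Python illustration under stated assumptions, not a tested pipeline: the expression values are invented, ties in quantile normalization are broken by position (a simplification), and the function names are hypothetical. The resulting 1 - rho distance matrix is what you would hand to a hierarchical clustering routine, which is not implemented here.

```python
def quantile_normalize(samples):
    """samples: list of per-sample lists of gene values (same gene order).
    Each value is replaced by the mean, across samples, of the values at its rank."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples)
                  for r in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical toy matrix: three samples, four genes each.
samples = [[5.0, 2.0, 3.0, 4.0],
           [4.0, 1.0, 4.5, 2.0],
           [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(samples)
# Pairwise distance between samples, suitable as clustering input.
distance = [[1.0 - spearman(a, b) for b in normalized] for a in normalized]
```

One thing worth noting: quantile normalization keeps each sample's gene ranking, so in the absence of ties the Spearman correlation between samples is the same before and after normalization.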
