Too many RNA-seq samples. What to do?
3.7 years ago

Dear Biostars users!

I want to obtain differentially expressed genes (DEGs) between TCGA and GTEx samples using either DESeq2 or edgeR. However, I cannot use Amazon, Galaxy failed to do the job, and my laptop does not have enough RAM for the computation. I thought that using non-Bayesian techniques might solve the problem. Should I give it a try? Which non-Bayesian R packages are commonly used? Alternatively, is it possible to filter the genes to keep only the top 20% or so most variable ones without prior estimates? I am starting my analysis with normalized TCGA+GTEx count data. Thanks in advance.
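
On the variance-filtering question, here is a minimal sketch in Python with NumPy (not R), assuming the normalized data are loaded as a genes x samples array. It ranks genes by the variance of log2(x + 1) values and keeps the top fraction. The function name `top_variable_genes`, the log2(x + 1) transform and the 20% default are illustrative assumptions, not something DESeq2 or edgeR does for you. Such a filter ignores sample groups, so on its own it does not identify differentially expressed genes.

```python
import numpy as np


def top_variable_genes(norm_counts: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Return row indices of the most variable genes, most variable first.

    norm_counts: genes x samples array of normalized, non-negative values.
    Variance is taken on log2(x + 1) so that highly expressed genes do not
    dominate the ranking purely because of their scale.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    log_expr = np.log2(np.asarray(norm_counts, dtype=float) + 1.0)
    gene_var = log_expr.var(axis=1)  # one variance per gene (row)
    n_keep = max(1, int(round(fraction * gene_var.size)))
    order = np.argsort(gene_var)[::-1]  # descending variance
    return order[:n_keep]


# Toy example: 5 hypothetical genes x 4 samples
expr = np.array([
    [10, 10, 10, 10],   # constant
    [1, 100, 1, 100],   # strongly variable
    [5, 6, 5, 6],       # slightly variable
    [0, 0, 0, 0],       # not expressed
    [20, 40, 20, 40],   # moderately variable
], dtype=float)
print(top_variable_genes(expr, fraction=0.2))  # [1]
print(top_variable_genes(expr, fraction=0.4))  # [1 4]
```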

RNA-Seq deseq2 edger

I don't think these methods use much more RAM than the genes x samples count matrix itself, so I'm not sure the choice of method will make much of a difference. 20,000 individuals by 20,000 genes is just a big matrix. Can I ask why you want to do this? Are you aware that comparing TCGA to GTEx is mainly likely to leave you with batch effects rather than biologically meaningful differentially expressed genes?
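
A back-of-the-envelope check of the RAM point, sketched in Python: one dense 20,000 x 20,000 matrix of double-precision numbers already takes about 3 GiB, before any method-specific overhead. The helper name `dense_matrix_gib` and the "number of copies" multiplier are assumptions for illustration; this does not measure how many such matrices DESeq2, edgeR or limma actually hold in memory at once.

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Size of one dense numeric matrix in GiB (2**30 bytes).

    8 bytes per value corresponds to double precision (R's numeric type,
    NumPy's float64); use 4 for 32-bit integers such as R's integer type.
    """
    return n_rows * n_cols * bytes_per_value / 2**30


genes, samples = 20_000, 20_000
one_copy = dense_matrix_gib(genes, samples)
print(f"one copy: {one_copy:.2f} GiB")  # one copy: 2.98 GiB

# Hypothetical number of same-sized working matrices held at once
# (counts, normalized values, fitted means, weights, ...): a guess, not measured.
for copies in (1, 3, 5):
    print(copies, f"{copies * one_copy:.1f} GiB")
```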

Thanks. It would be very helpful if I could estimate how much RAM my analysis needs. I am going to derive two sets of DEGs: within TCGA (cancerous vs normal tissue, both from TCGA) and TCGA vs GTEx (TCGA cancerous tissue vs GTEx tissue). As for batch effects, no, I actually wasn't aware of them! Thanks a lot.
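
Once both DEG lists exist, one simple way to quantify how different they are is their set overlap and Jaccard index, sketched here in Python. The function name and gene IDs are hypothetical; the inputs would be whatever gene identifiers pass your chosen significance cut-off in each comparison.

```python
def deg_overlap(first: set, second: set) -> dict:
    """Summarize how two DEG lists overlap.

    Jaccard index = |shared| / |union|: 1.0 means identical lists,
    0.0 means no genes in common.
    """
    first, second = set(first), set(second)
    union = first | second
    return {
        "shared": len(first & second),
        "only_first": len(first - second),
        "only_second": len(second - first),
        "jaccard": len(first & second) / len(union) if union else 0.0,
    }


# Hypothetical gene IDs, for illustration only
within_tcga = {"GENE_A", "GENE_B", "GENE_C"}
tcga_vs_gtex = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
print(deg_overlap(within_tcga, tcga_vs_gtex))
# {'shared': 2, 'only_first': 1, 'only_second': 2, 'jaccard': 0.4}
```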

My guess is that you will find the within-TCGA DEGs to be very different from the TCGA vs GTEx DEGs, and many of those differences will probably arise because the GTEx samples were prepared by different people, using different protocols, on different days from TCGA. But it will be interesting to find out.
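
A small illustration of the confounding concern, sketched in Python with a hypothetical helper: if every tumor sample comes from TCGA and every normal sample from GTEx, the "tumor" and "data source" columns of a design matrix are redundant, so no linear model can attribute a difference to one rather than the other. Including TCGA normal samples breaks that redundancy, but only under the strong assumption that the source effect is additive and the same for tumor and normal tissue.

```python
import numpy as np


def design_is_estimable(is_tumor, is_gtex) -> bool:
    """Check whether a linear-model design with an intercept, a tumor-vs-normal
    term and a TCGA-vs-GTEx source term has full column rank.

    If it does not, the tumor effect and the source (batch) effect cannot be
    separated by any linear model fitted to this design.
    """
    X = np.column_stack([np.ones(len(is_tumor)), is_tumor, is_gtex])
    return int(np.linalg.matrix_rank(X)) == X.shape[1]


# TCGA tumors vs GTEx normals only: tumor status and data source coincide
print(design_is_estimable(is_tumor=[1, 1, 1, 0, 0, 0],
                          is_gtex=[0, 0, 0, 1, 1, 1]))  # False

# Adding TCGA normals: the source term can now be estimated separately
print(design_is_estimable(is_tumor=[1, 1, 0, 0, 0, 0],
                          is_gtex=[0, 0, 0, 0, 1, 1]))  # True
```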

I wanted to obtain TCGA vs GTEx differentially expressed genes using either DESeq2 or edgeR ... I am starting my analysis with normalized TCGA+GTEx count data

You should be starting with the raw counts for those packages; both expect un-normalized counts as input and apply their own normalization.
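
Because DESeq2 and edgeR model raw counts, a quick heuristic check of the input before running them can save time. This Python sketch (hypothetical helper name) only tests whether the values are non-negative whole numbers. Estimated counts from quantifiers such as RSEM can legitimately be non-integer, so a False result means "check how this matrix was produced", not proof that it was normalized.

```python
import numpy as np


def looks_like_raw_counts(mat) -> bool:
    """Heuristic check that a matrix holds raw read counts.

    Raw counts are finite, non-negative whole numbers. Normalized values
    (CPM, TPM, FPKM, size-factor-scaled or log-transformed data) usually
    contain fractions, and log-scale data may contain negative values.
    """
    mat = np.asarray(mat, dtype=float)
    if not np.all(np.isfinite(mat)):
        return False
    return bool(np.all(mat >= 0) and np.all(mat == np.round(mat)))


raw = np.array([[3, 12], [305, 7]])
cpm = raw / raw.sum(axis=0) * 1e6   # counts per million, per sample (column)
print(looks_like_raw_counts(raw))   # True
print(looks_like_raw_counts(cpm))   # False
```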

If you are concerned about computational resources, the limma-voom or limma-trend pipeline should be less intensive. See this discussion: https://support.bioconductor.org/p/112573/
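
To illustrate why the limma route tends to be lighter, here is a Python sketch under stated assumptions. voom first converts counts to log2-CPM as log2((count + 0.5) / (library size + 1) x 10^6); each gene is then fitted with a (weighted) linear model by least squares, rather than with the negative binomial GLMs used by DESeq2 and edgeR, which generally need iterative fitting. The helper names are hypothetical, and the fit below deliberately omits voom's precision weights and limma's empirical Bayes step, so it is not a substitute for running limma itself.

```python
import numpy as np


def voom_log_cpm(counts) -> np.ndarray:
    """log2 counts-per-million as in the first step of limma's voom().

    counts: genes x samples raw counts; library sizes are the column sums.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # one library size per sample
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def fit_all_genes(design, log_cpm) -> np.ndarray:
    """Ordinary least squares for every gene in one call.

    Returns an (n_coefficients x n_genes) array. Unlike limma, this uses no
    voom precision weights and no empirical Bayes moderation.
    """
    coef, *_ = np.linalg.lstsq(design, log_cpm.T, rcond=None)
    return coef


counts = np.array([[10, 20, 30, 40],
                   [90, 80, 70, 60]])                  # 2 genes x 4 samples
design = np.column_stack([np.ones(4), [0, 0, 1, 1]])  # intercept + group
y = voom_log_cpm(counts)
coef = fit_all_genes(design, y)
print(coef.shape)  # (2, 2): rows = coefficients, columns = genes
```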

Not enough info to answer here. How are you trying to do this? DESeq2, edgeR, limma? How are you generating counts? What exactly is the issue(s) you're running into?

Thanks. I updated the post.
