Question: Too many RNAseq samples. What to do?
4 months ago, english.server (Germany) wrote:

Dear Biostarsers!

I wanted to obtain TCGA vs GTEx differentially expressed genes using either DESeq2 or edgeR; however, I cannot use Amazon, Galaxy failed to do the job, and my laptop doesn't have enough RAM for the computation. I thought using non-Bayesian techniques might solve my problem. Should I give it a try? Which non-Bayesian R packages are commonly used? Alternatively, is it possible to filter the genes and keep only the top 20% or so most variable ones without prior estimates? I am starting my analysis with normalized TCGA+GTEx count data. Thanks in advance.

edger rna-seq deseq2 • 237 views
modified 4 months ago • written 4 months ago by english.server (220)

I don't think these methods use much more RAM than the genes x samples count matrix itself, so I'm not sure the choice of method is going to make much of a difference. 20,000 individuals by 20,000 genes is just a big matrix. Can I ask why you want to do this? Are you aware that comparing TCGA to GTEx is mainly likely to leave you with batch effects rather than biologically differential genes?
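As a rough way to put a number on "just a big matrix", here is a minimal back-of-the-envelope sketch. It assumes a dense matrix and the storage sizes R uses (8 bytes per double, 4 bytes per integer); the function names are made up for illustration, and the 20,000 x 20,000 shape is only the round figure used above, not the real TCGA+GTEx dimensions. Real packages make working copies, so treat the result as a lower bound per copy rather than a peak-memory prediction.

```python
def matrix_bytes(n_genes: int, n_samples: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold ONE dense genes x samples matrix in memory."""
    return n_genes * n_samples * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Round figures from the discussion; substitute your own dimensions.
n_genes, n_samples = 20_000, 20_000

as_double = matrix_bytes(n_genes, n_samples, bytes_per_value=8)   # normalized values
as_integer = matrix_bytes(n_genes, n_samples, bytes_per_value=4)  # raw integer counts

print(f"one copy as doubles:  {to_gib(as_double):.2f} GiB")
print(f"one copy as integers: {to_gib(as_integer):.2f} GiB")
```

Multiplying the per-copy figure by the number of intermediate matrices you expect a pipeline to hold at once (an assumption you have to make yourself) gives a crude idea of whether a laptop can cope.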

written 4 months ago by i.sudbery (9.8k)

Thanks. It would be a great help if I could estimate the amount of RAM my work needs. I am going to compare within-TCGA DEGs with TCGA vs GTEx DEGs; i.e. first compare cancerous vs normal tissue within the TCGA data, and then the TCGA cancerous tissue with GTEx. As for the batch effects, no, actually I wasn't aware! Thanks a lot.

written 4 months ago by english.server (220)

My guess is that you are going to find that the within-TCGA DEGs are very different from the TCGA vs GTEx DEGs, and many of those differences will probably be because GTEx was prepared by different people, using different protocols, on different days from TCGA. But it will be interesting to find out.

written 4 months ago by i.sudbery (9.8k)

"I wanted to obtain TCGA vs GTEx differentially expressed genes using either DESeq2 or edgeR ... I am starting my analysis with normalized TCGA+GTEx count data"

You should be starting with the raw counts for those packages.

If you are concerned about computational resources, the limma-voom or limma-trend pipeline should be less intensive. See this discussion: https://support.bioconductor.org/p/112573/
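On the filtering idea raised in the question: ranking genes by their variance across samples needs no model fit or prior estimates, so it can be done before any differential expression package is loaded. The sketch below is only an illustration of that ranking, not what DESeq2, edgeR or limma do internally. The function name, the `log2(x + 1)` transform, the 20% default and the toy data are all assumptions made for the example.

```python
import math
import statistics


def top_variable_genes(counts, keep_fraction=0.2):
    """Return the names of the most variable genes.

    counts: dict mapping gene name -> list of per-sample values.
    Variance is computed on log2(x + 1) so that highly expressed genes
    do not dominate the ranking purely because of their scale.
    """
    variances = {
        gene: statistics.pvariance([math.log2(v + 1) for v in values])
        for gene, values in counts.items()
    }
    # Keep at least one gene, otherwise the requested fraction (rounded).
    n_keep = max(1, round(len(variances) * keep_fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical toy data: 5 genes x 4 samples.
toy = {
    "geneA": [10, 12, 11, 9],
    "geneB": [0, 200, 5, 150],
    "geneC": [50, 55, 48, 52],
    "geneD": [1, 1, 2, 1],
    "geneE": [300, 20, 310, 15],
}

print(top_variable_genes(toy, keep_fraction=0.2))
```

Whether such a filter is appropriate ahead of a particular test is a separate statistical question; the sketch only shows that the ranking itself is cheap and needs one pass over the matrix.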

modified 4 months ago • written 4 months ago by igor (11k)

Not enough info to answer here. How are you trying to do this? DESeq2, edgeR, limma? How are you generating counts? What exactly is the issue(s) you're running into?

written 4 months ago by jared.andrews07 (8.0k)

Thanks. I updated the post.

written 4 months ago by english.server (220)