Analyzing multiple RNA-seq datasets
3.6 years ago

Hi, I have two main questions about analyzing multiple RNA-seq datasets. I ran a differential expression analysis on two different available datasets. Both were sequenced paired-end on the same platform, but with different sequencing depths and library preparation protocols. I obtained count files for all samples from both datasets and used the DESeq2 package for downstream analysis. My questions are:

  1. Since my samples come from different studies, do I need to perform a meta-analysis using a package like metaSeq, or can DESeq2 handle the analysis by itself?

  2. I have read that tools like DESeq2 run into problems when samples differ by ~10x in depth. So I checked the total number of reads per sample and got the results below. As you can see, some of my samples have a high number of reads, whereas others have very few. What should I do about these samples? Does DESeq2 normalization fix this problem?

SRR7293809 SRR7293810 SRR7293811 SRR7293812 SRR7293813 SRR7293814 SRR7293815 
 24.767506  23.405950  28.945145  26.508370  29.501141  22.468038  20.940488 
SRR7293816 SRR7293817 SRR7293818 SRR7293819 SRR7293820 SRR7293821 SRR7293822 
 32.734192  28.559845  24.178953  26.915176  27.974233  25.095936  22.361696 
SRR7293823 SRR7293824 SRR7293825 SRR7293826 SRR7293839 SRR7293840 SRR7293841 
 25.579205  25.226523  22.447988  23.887697  56.068248  51.443332 129.410829 
SRR7293842 SRR7293843 SRR7293844 SRR7293845 SRR7293846 SRR8418436 SRR8418437 
 76.685881  48.743216  57.768380  50.982525  48.694810   1.228239   3.818818 
SRR8418438 SRR8418439 SRR8418440 SRR8418441 SRR8418442 SRR8418443 SRR8418444 
  2.056540   1.689432   1.609765   2.736703   1.889471   1.276566   1.577202 
SRR8418450 SRR8418454 SRR8418455 SRR8418465 
  2.322081   2.052362   2.174595  10.473356
  

Thanks.
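To put a number on the depth spread in the table above, here is a small Python sketch. The totals are copied from the question; their unit is not stated there (presumably millions of reads). The 10x cutoff is only the rule of thumb quoted in the question, measured here against the deepest sample, which is an assumption of this sketch and not a DESeq2 setting.

```python
# Read totals per sample, copied from the question.
# Units are not stated there (presumably millions of reads).
depths = {
    "SRR7293809": 24.767506, "SRR7293810": 23.405950, "SRR7293811": 28.945145,
    "SRR7293812": 26.508370, "SRR7293813": 29.501141, "SRR7293814": 22.468038,
    "SRR7293815": 20.940488, "SRR7293816": 32.734192, "SRR7293817": 28.559845,
    "SRR7293818": 24.178953, "SRR7293819": 26.915176, "SRR7293820": 27.974233,
    "SRR7293821": 25.095936, "SRR7293822": 22.361696, "SRR7293823": 25.579205,
    "SRR7293824": 25.226523, "SRR7293825": 22.447988, "SRR7293826": 23.887697,
    "SRR7293839": 56.068248, "SRR7293840": 51.443332, "SRR7293841": 129.410829,
    "SRR7293842": 76.685881, "SRR7293843": 48.743216, "SRR7293844": 57.768380,
    "SRR7293845": 50.982525, "SRR7293846": 48.694810, "SRR8418436": 1.228239,
    "SRR8418437": 3.818818, "SRR8418438": 2.056540, "SRR8418439": 1.689432,
    "SRR8418440": 1.609765, "SRR8418441": 2.736703, "SRR8418442": 1.889471,
    "SRR8418443": 1.276566, "SRR8418444": 1.577202, "SRR8418450": 2.322081,
    "SRR8418454": 2.052362, "SRR8418455": 2.174595, "SRR8418465": 10.473356,
}

deepest = max(depths.values())
shallowest = min(depths.values())
fold_range = deepest / shallowest

# Rule-of-thumb cutoff quoted in the question (an assumption, not a DESeq2 parameter).
FOLD_LIMIT = 10
# Samples more than FOLD_LIMIT times shallower than the deepest sample.
far_below = sorted(s for s, d in depths.items() if d * FOLD_LIMIT < deepest)

print(f"deepest / shallowest = {fold_range:.1f}x")
print(f"{len(far_below)} of {len(depths)} samples are >{FOLD_LIMIT}x below the deepest sample")
```

Comparing against the median depth instead of the maximum would be an equally reasonable choice; the single very deep sample (SRR7293841) drives much of the range here.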

3.6 years ago

You can tell DESeq2 in the design that your samples came from different batches, for example with a design formula such as `~ batch + condition`, provided batch is distinct from your other experimental conditions. DESeq2 will then account for the batch each sample came from in its model, so you don't need to modify or correct the data in any other way.

You might try running the analysis two ways: once with the counts as they are, and once with the counts of those last several samples downsampled to match the others.
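One way to do that kind of downsampling is to draw reads at random, without replacement, from a sample's count vector until a target total is reached. The sketch below is a minimal stdlib-only illustration in Python, not part of DESeq2; the function name `downsample_counts` and the toy counts are made up for this example.

```python
import random

def downsample_counts(counts, target_total, seed=0):
    """Randomly subsample reads (without replacement) so the counts sum to target_total.

    counts: per-gene integer read counts for one sample.
    Returns the counts unchanged if the sample is already at or below the target.
    """
    total = sum(counts)
    if target_total >= total:
        return list(counts)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Expand to one entry per read, labelled with the index of its gene.
    pool = [gene for gene, n in enumerate(counts) for _ in range(n)]
    kept = rng.sample(pool, target_total)
    thinned = [0] * len(counts)
    for gene in kept:
        thinned[gene] += 1
    return thinned

# Hypothetical toy data: a "deep" sample reduced to a shallower target depth.
deep_sample = [500, 120, 0, 30, 350]
thinned = downsample_counts(deep_sample, target_total=100)
print(thinned, sum(thinned))
```

Expanding every read into a list is fine for a toy vector but far too memory-hungry for real libraries with millions of reads; there, a binomial or hypergeometric draw per gene would be the more practical route.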
