Question: RNASeq datasets with different total counts
12 days ago by
felipead6680
felipead6680 wrote:

I have two RNA-Seq datasets from mouse, but the total counts per sample differ between the two datasets. The first dataset averages ~400 million PF reads, the second ~550 million PF reads.

Can these datasets be combined? What kind of normalization has to be done in order to compare the datasets?

rna seq normalization • 93 views
modified 12 days ago by agata88810 • written 12 days ago by felipead6680

You need to give a little more description. Are these datasets from different experiments? If so, do they have the same experimental structure? What are PF reads? Do you want to combine them or compare them?

written 12 days ago by seidel7.4k

Thank you for your time. The datasets have the same experimental structure, and I want to make a "master dataset" out of the two. My concern is that the two datasets have different numbers of reads, so I wonder whether a special normalization is required.

written 11 days ago by felipead6680
12 days ago by
agata88810
Poland
agata88810 wrote:

Assuming these are the same experiments, I would check how many reads each sample has in both sets and find the sample with the lowest value. Next, I would subsample the reads of the remaining samples down to this value. Before that, you may need to check whether the minimum read depth is sufficient, because some samples might have to be removed first.
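As a rough illustration of the subsampling idea above, here is a minimal Python sketch that draws reads without replacement from each sample until every sample matches the smallest library. The sample names, gene names, and counts are made up, and the helper functions `downsample_counts` and `downsample_to_minimum` are hypothetical. Expanding counts into one label per read only works for toy data; with hundreds of millions of reads you would subsample the FASTQ/BAM files with a dedicated tool instead.

```python
import random
from collections import Counter


def downsample_counts(gene_counts, target_depth, rng):
    """Draw target_depth reads without replacement from one sample's counts."""
    total = sum(gene_counts.values())
    if target_depth > total:
        raise ValueError("target depth exceeds the sample's total count")
    # One label per read: simple, but only practical for toy-sized data.
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    kept = Counter(rng.sample(reads, target_depth))
    return {gene: kept.get(gene, 0) for gene in gene_counts}


def downsample_to_minimum(samples, seed=0):
    """Subsample every sample down to the depth of the smallest one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    min_depth = min(sum(counts.values()) for counts in samples.values())
    return {
        name: downsample_counts(counts, min_depth, rng)
        for name, counts in samples.items()
    }


# Hypothetical toy counts (real libraries are far larger).
samples = {
    "set1_ctrl": {"GeneA": 40, "GeneB": 50, "GeneC": 10},  # 100 reads
    "set2_ctrl": {"GeneA": 60, "GeneB": 70, "GeneC": 20},  # 150 reads
}

downsampled = downsample_to_minimum(samples)
depths = {name: sum(counts.values()) for name, counts in downsampled.items()}
print(depths)
```

Note that subsampling throws away reads from the deeper samples, which is the trade-off to weigh against model-based normalization.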

Best, Agata

modified 12 days ago • written 12 days ago by agata88810

But edgeR and DESeq2 have normalization procedures for dealing with different sequencing depth across samples. You might simply remove genes for which you see counts in one experiment but not the other, filter further for some minimum read count across samples, and see how these methods do at identifying DE genes. They also have methods for exploring batch effects, given that you have two different data sets.
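To give a feel for how depth normalization can work without discarding reads, here is a simplified Python sketch of the median-of-ratios idea that underlies DESeq2's size factors. It is an illustration only, not the package's code (DESeq2 and edgeR are R packages, and edgeR uses a different method, TMM). The function name `size_factors` and the toy counts are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps sample name -> list of raw counts, same gene order in each.
    Genes with a zero in any sample are skipped, since their geometric
    mean across samples is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log of the per-gene geometric mean across samples (None if any zero).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy data: the second sample is sequenced twice as deeply.
counts = {
    "set1_ctrl": [100, 200, 50, 0],
    "set2_ctrl": [200, 400, 100, 30],
}

factors = size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so a plain difference in sequencing depth between the two datasets does not by itself look like differential expression.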

written 12 days ago by seidel7.4k

You mean remove genes that have, let's say, 0 counts in dataset1 and non-zero counts in dataset2, and then filter out the genes that have, let's say, fewer than 10 counts across all samples?
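The two-step filter described here could be sketched in Python roughly as follows. This assumes "fewer than 10 counts across all samples" means the total summed over every sample in both datasets, which is one possible reading. The function `filter_genes`, the threshold parameter `min_total`, and the toy counts are hypothetical.

```python
def filter_genes(dataset1, dataset2, min_total=10):
    """Return the genes that pass both filters.

    Each dataset maps gene name -> list of per-sample raw counts.
    """
    kept = []
    for gene in dataset1:
        if gene not in dataset2:
            continue  # gene not measured in both datasets
        total1 = sum(dataset1[gene])
        total2 = sum(dataset2[gene])
        # Filter 1: drop genes seen in one dataset but absent from the other.
        if (total1 == 0) != (total2 == 0):
            continue
        # Filter 2: drop genes with too few counts across all samples.
        if total1 + total2 < min_total:
            continue
        kept.append(gene)
    return kept


# Hypothetical toy counts, two samples per dataset.
dataset1 = {"GeneA": [5, 7], "GeneB": [0, 0], "GeneC": [1, 2], "GeneD": [0, 0]}
dataset2 = {"GeneA": [9, 4], "GeneB": [3, 6], "GeneC": [2, 1], "GeneD": [0, 0]}

kept = filter_genes(dataset1, dataset2)
print(kept)
```

In this toy example GeneB is dropped by the first filter, while GeneC and GeneD are dropped by the minimum-count filter.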

written 11 days ago by felipead6680

Yes, that's what I mean :) By the way, it would be interesting to see how different the experiments are by directly comparing identical samples between them, i.e. exp1Control/exp2Control.
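One simple way to look at the exp1Control/exp2Control comparison is a per-gene log2 ratio after scaling each sample to counts per million (CPM). The Python sketch below is only an assumption about how such a check might be done; the function names, the pseudocount of 1, and the toy counts are all hypothetical.

```python
import math


def cpm(counts):
    """Scale raw counts to counts per million for one sample."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_ratios(exp1_control, exp2_control, pseudo=1.0):
    """Per-gene log2(exp1Control / exp2Control) on the CPM scale.

    A pseudocount avoids division by zero for undetected genes.
    """
    cpm1, cpm2 = cpm(exp1_control), cpm(exp2_control)
    shared = sorted(set(cpm1) & set(cpm2))
    return {
        gene: math.log2((cpm1[gene] + pseudo) / (cpm2[gene] + pseudo))
        for gene in shared
    }


# Hypothetical toy counts: same composition, exp2 sequenced twice as deeply.
exp1_control = {"GeneA": 400, "GeneB": 500, "GeneC": 100}
exp2_control = {"GeneA": 800, "GeneB": 1000, "GeneC": 200}

ratios = log2_ratios(exp1_control, exp2_control)
print(ratios)
```

If the two experiments differ only in depth, the ratios should scatter around zero; a systematic shift for many genes would point to a batch effect between the datasets.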

written 11 days ago by seidel7.4k