Question

RNASeq datasets with different total counts

0

3.2 years ago

felipead66 ▴ 110

I have two RNA-seq datasets from mouse, but the total read counts per sample differ between them. Samples in the first dataset average ~400 million PF reads; samples in the second average ~550 million PF reads.

Can these datasets be combined? What kind of normalization has to be done in order to compare the datasets?

normalization RNA-seq • 950 views

ADD COMMENT • link updated 7 months ago by Ram 43k • written 3.2 years ago by felipead66 ▴ 110

0

You need to give a little more description. Are these data sets different experiments? If so, do they have the same experimental structure? What are PF reads? Do you want to combine them, or compare them?

ADD REPLY • link 3.2 years ago by seidel 11k

0

Thank you for your time. The datasets have the same experimental structure, and I want to make a "master dataset" out of the two. My concern is that the two datasets have different numbers of reads, so I wonder whether a special normalization is required.

ADD REPLY • link 3.2 years ago by felipead66 ▴ 110

score 0 · Answer 1 · 2021-02-23

0

3.2 years ago

agata88 ▴ 870

Assuming these are the same kind of experiment, I would check how many reads each sample has in both sets and find the sample with the lowest number. Next, I would subsample the reads of the remaining samples down to that number. Before that, you may need to check whether the minimum read depth is sufficient, because some samples may need to be removed first.
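As a rough illustration of the idea, here is a minimal Python sketch that downsamples per-gene counts to the depth of the shallowest sample. It assumes you work from a count table rather than from the raw reads (subsampling FASTQ or BAM files would need a dedicated tool), and the sample names, gene names, and the helper `downsample_counts` are made up for the example.

```python
import random
from collections import Counter

def downsample_counts(counts, target, seed=0):
    """Draw `target` reads without replacement from a {gene: count} dict."""
    total = sum(counts.values())
    if target >= total:
        return dict(counts)  # already at or below the target depth
    genes = list(counts)
    rng = random.Random(seed)
    # The `counts=` keyword of random.sample needs Python 3.9 or newer.
    drawn = rng.sample(genes, k=target, counts=[counts[g] for g in genes])
    kept = Counter(drawn)
    return {g: kept.get(g, 0) for g in genes}

# Toy data (made up): per-gene counts for one sample from each dataset.
samples = {
    "set1_s1": {"GeneA": 500, "GeneB": 300, "GeneC": 200},
    "set2_s1": {"GeneA": 900, "GeneB": 100, "GeneC": 1000},
}

# The shallowest sample sets the common depth for everyone else.
target_depth = min(sum(c.values()) for c in samples.values())
downsampled = {name: downsample_counts(c, target_depth) for name, c in samples.items()}
```

This pure-Python version is only meant to show the logic; at hundreds of millions of reads per sample it would be slow and memory-hungry.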

Best, Agata

ADD COMMENT • link 3.2 years ago by agata88 ▴ 870

1

But edgeR and DESeq2 have normalization procedures for dealing with different sequencing depth across samples. You might simply remove genes for which you see counts in one experiment but not the other, filter further for some minimum read count across samples, and see how these methods do at identifying DE genes. They also have methods for exploring batch effects, given that you have two different data sets.
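To give a feel for what depth normalization does, here is a small Python sketch of a median-of-ratios size factor calculation, the general idea behind DESeq2's default normalization. It is a simplified illustration, not the DESeq2 or edgeR code, and the toy matrix and the function name `median_ratio_size_factors` are assumptions made for the example.

```python
import math
from statistics import median

def median_ratio_size_factors(counts):
    """counts: list of per-gene rows, one column per sample."""
    # Only genes with a nonzero count in every sample have a usable geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    n_samples = len(counts[0])
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    # Size factor of a sample = median over genes of (count / geometric mean).
    return [
        math.exp(median(math.log(row[j]) - m for row, m in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]

# Toy matrix (made up): the second sample was sequenced twice as deeply.
toy_counts = [
    [10, 20],
    [100, 200],
    [0, 5],     # contains a zero, so it is skipped when estimating size factors
    [50, 100],
]
size_factors = median_ratio_size_factors(toy_counts)
normalized = [[c / s for c, s in zip(row, size_factors)] for row in toy_counts]
```

After dividing by the size factors, genes that differ only because of sequencing depth end up with matching values in the two samples, which is why subsampling is usually not needed before running these tools.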

ADD REPLY • link 3.2 years ago by seidel 11k

0

You mean remove genes that have, say, 0 counts in dataset1 and non-zero counts in dataset2, and then filter out genes that have, say, fewer than 10 counts across all samples?
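The filter described here could be sketched in Python roughly as follows. This assumes "fewer than 10 counts across all samples" means the total summed over all samples, and "0 in dataset1" means no counts in any sample of that dataset; the gene names, sample names, and the helper `genes_to_keep` are hypothetical.

```python
def genes_to_keep(counts, dataset_of, min_total=10):
    """counts: {gene: {sample: count}}; dataset_of: {sample: dataset label}."""
    keep = []
    for gene, row in counts.items():
        # Sum the gene's counts separately within each dataset.
        per_dataset = {}
        for sample, value in row.items():
            label = dataset_of[sample]
            per_dataset[label] = per_dataset.get(label, 0) + value
        seen_in_every_dataset = all(total > 0 for total in per_dataset.values())
        if seen_in_every_dataset and sum(row.values()) >= min_total:
            keep.append(gene)
    return keep

# Toy data (made up): two samples per dataset.
dataset_of = {"d1_a": "dataset1", "d1_b": "dataset1", "d2_a": "dataset2", "d2_b": "dataset2"}
counts = {
    "GeneA": {"d1_a": 0, "d1_b": 0, "d2_a": 30, "d2_b": 40},    # absent from dataset1
    "GeneB": {"d1_a": 1, "d1_b": 2, "d2_a": 1, "d2_b": 3},      # total below 10
    "GeneC": {"d1_a": 20, "d1_b": 25, "d2_a": 30, "d2_b": 35},
    "GeneD": {"d1_a": 5, "d1_b": 0, "d2_a": 0, "d2_b": 6},
}
kept = genes_to_keep(counts, dataset_of)
```

With this toy table, `GeneA` and `GeneB` are dropped and `GeneC` and `GeneD` are kept.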

ADD REPLY • link 3.2 years ago by felipead66 ▴ 110

0

Yes, that's what I mean :) By the way, it would be interesting to see how different the experiments are by directly comparing identical samples between them, e.g. exp1Control/exp2Control.
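One simple way to look at such a comparison is a per-gene log2 ratio of depth-scaled counts between the two control samples. The Python sketch below uses counts per million with a pseudocount of 1; the counts are made up, and `log2_cpm` is a helper written for this example, not a function from edgeR or DESeq2.

```python
import math

def log2_cpm(counts, pseudo=1.0):
    """Counts per million on a log2 scale, with a pseudocount to avoid log(0)."""
    total = sum(counts.values())
    return {gene: math.log2(c / total * 1e6 + pseudo) for gene, c in counts.items()}

# Toy control samples (made up): same composition, exp2 sequenced 1.5x deeper.
exp1_control = {"GeneA": 400, "GeneB": 100, "GeneC": 500}
exp2_control = {"GeneA": 600, "GeneB": 150, "GeneC": 750}

cpm1 = log2_cpm(exp1_control)
cpm2 = log2_cpm(exp2_control)

# log2(exp1Control / exp2Control) per gene after scaling for depth.
log_ratio = {gene: cpm1[gene] - cpm2[gene] for gene in cpm1}
```

If the two experiments differ only in sequencing depth, the ratios sit near 0 for every gene; many genes far from 0 would point to a batch effect between the datasets.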

ADD REPLY • link 3.2 years ago by seidel 11k