Question

Subsampling vs normalization

8.6 years ago

TiPi • 0

Hi,

I was wondering what the best way is to compare two RNA-seq samples with drastically uneven numbers of reads, let's say 5 M for library A vs 25 M for library B; I couldn't find sufficient information about this. Should one normalize by library size at the level of mapped reads / counts, or subsample library B prior to mapping? What would be the pros and cons of each? I am leaning towards the first option, as I feel that subsampling can create a bias, but I am unsure whether DGE tools like DESeq or HTSeq can "handle" the differences in library size appropriately.

Thanks

RNA-Seq • 3.3k views

updated 19 months ago by Ram 43k • written 8.6 years ago by TiPi • 0

Subsampling would not be a good idea. The library size normalisation should be fine.

8.6 years ago by GouthamAtla 12k
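
For readers who want to see what library size normalisation amounts to, here is a minimal sketch in Python with NumPy. It shows two common schemes: counts per million (CPM) and size factors in the style of the median-of-ratios method used by DESeq. The count matrix is made up for illustration, and the function names `cpm` and `median_of_ratios_size_factors` are hypothetical helpers, not part of any package.

```python
import numpy as np

# Made-up toy counts: rows are genes, columns are libraries A and B.
# Library B is sequenced roughly 5x deeper than library A.
counts = np.array([
    [10, 50],
    [200, 1000],
    [35, 175],
    [0, 5],
    [500, 2500],
], dtype=float)

def cpm(counts):
    """Scale each library (column) to counts per million."""
    return counts / counts.sum(axis=0) * 1e6

def median_of_ratios_size_factors(counts):
    """Size factors in the style of the median-of-ratios method."""
    # Use only genes with non-zero counts in every library.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene log geometric mean across libraries acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # A library's size factor is its median ratio to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # divide each column by its size factor

print("size factors:", size_factors)
print("normalized counts:\n", normalized)
```

In this toy case the size factor of B is five times that of A, so the genes detected in both libraries end up with equal normalized counts. Real data will not be this clean, and in practice one would rely on the DGE package's own implementation rather than a hand-rolled version.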

I think subsampling is indeed not the right option, but you should not assume that normalization will solve everything. When you have a lot of variation in library size and a few samples with very low library size (<3M reads), you have to check the post-normalization expression estimates of housekeeping genes or, better, spike-in controls if you have them. But I do think you will still be OK with 5 million reads in most scenarios.

8.6 years ago by Irsan ★ 7.8k
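
The post-normalization check on housekeeping genes suggested in the reply above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the expression values are invented, the helper `control_log2_ratios` is hypothetical, and the 0.5 log2 cutoff is arbitrary. The idea is simply that control genes should sit close to their across-sample median after normalization.

```python
import numpy as np

def control_log2_ratios(normalized, gene_names, control_genes, pseudocount=1.0):
    """log2 ratio of each sample to the per-gene median, for control genes only."""
    idx = [gene_names.index(g) for g in control_genes]
    log_expr = np.log2(normalized[idx] + pseudocount)
    return log_expr - np.median(log_expr, axis=1, keepdims=True)

# Invented normalized expression (e.g. CPM): rows are genes, columns are samples.
gene_names = ["GAPDH", "ACTB", "TBP", "GENE_X"]
normalized = np.array([
    [1000.0, 1040.0, 980.0, 2100.0],
    [800.0, 790.0, 820.0, 1650.0],
    [50.0, 52.0, 49.0, 110.0],
    [5.0, 300.0, 7.0, 6.0],
])

ratios = control_log2_ratios(normalized, gene_names, ["GAPDH", "ACTB", "TBP"])

# Flag samples in which every control gene deviates by more than the cutoff.
cutoff = 0.5  # assumed threshold in log2 units, not a recommended value
suspect = (np.abs(ratios) > cutoff).all(axis=0)
print("suspect samples:", suspect)
```

Here only the last sample is flagged, because all three control genes are about twofold above their median in it. The same function would work for spike-in controls by passing their identifiers as `control_genes`.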

In my experience, the typical normalization methods break down at around a 10x difference in read number between the lowest and the median library. You can typically see this in some of the diagnostic plots, which will start looking really strange.

8.6 years ago by Devon Ryan 104k
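
The rule of thumb in the reply above is easy to check before running any DGE tool. The Python sketch below computes the ratio of the median library size to the smallest one. The helper name `median_to_min_ratio` is hypothetical, and the second set of library sizes is invented for contrast; only the 5 M and 25 M figures come from the question.

```python
import numpy as np

def median_to_min_ratio(library_sizes):
    """Ratio of the median library size to the smallest library size."""
    sizes = np.asarray(library_sizes, dtype=float)
    return np.median(sizes) / sizes.min()

# The two libraries from the question: 5 M and 25 M reads.
# With only two libraries the median is their midpoint, 15 M.
ratio_question = median_to_min_ratio([5e6, 25e6])

# An invented experiment with one very shallow library.
ratio_shallow = median_to_min_ratio([2e6, 24e6, 25e6, 30e6])

print(ratio_question, ratio_shallow)
```

For the question's libraries the ratio is 3, well under the roughly 10x mark, while the invented experiment with a 2 M library comes out at 12.25 and would deserve a closer look at the diagnostic plots.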