Question

Is there any way to normalize a sample with high number of reads assigned to it?


22 months ago

salman_96 ▴ 70

Hi,

I have fastp results in a MultiQC report for almost a hundred paired-end samples.

When analyzing the filtered read results (mouse samples exposed to a drug), I can see that one sample has a far higher number of reads (sequencing depth) than the others. I was not involved in the library prep, but I learned that this sample had a pooling issue when the library was prepared. The sample is jBO3---

Can anyone please recommend or suggest a way to keep this sample by normalizing it? If so, how can I do it? Or should I just discard it?

I have added the pictures below for the fastp filtered reads as well as for the RSEM mapped reads and STAR alignment scores.

fastp filter reads

rsem mapped reads

star alignment scores

fastp sequencing depth RSEM STAR • 672 views

updated 22 months ago by cpad0112 21k • written 22 months ago by salman_96 ▴ 70


You can try bbnorm (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbnorm-guide/) on that sample.

22 months ago by cpad0112 21k

score 3 · Answer 1 · 2022-06-10

Just use the standard normalization from DESeq2 or edgeR. A high read count is not much of a problem: at some point you probably reach saturation in the number of genes detected, and beyond that the counts just scale linearly with depth. It is more of a problem if a sample is undersequenced. I suggest running e.g. the vst function from DESeq2, inspecting the data by PCA (see the DESeq2 manual), and checking whether the outlier sample looks problematic in the PCA. If not, just go on with the analysis.
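To illustrate why extra depth mostly scales out, here is a rough pure-Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is a simplification for intuition only, not DESeq2's actual code: the function names `size_factors` and `normalize` are hypothetical, and the counts are made up, with `deep_sample` built as exactly 10x `sample_a`.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene; genes with a zero in any sample are skipped.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)

    if not log_geo_means:
        raise ValueError("no gene has nonzero counts in every sample")

    # Size factor = median over genes of (sample count / gene geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm
                      for g, lgm in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts, factors):
    """Divide each sample's raw counts by its size factor."""
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Made-up counts for illustration: deep_sample is sample_a sequenced 10x deeper.
counts = {
    "sample_a": [100, 200, 50, 400, 0],
    "sample_b": [110, 190, 55, 380, 3],
    "deep_sample": [1000, 2000, 500, 4000, 10],
}

factors = size_factors(counts)
normalized = normalize(counts, factors)

for name in counts:
    print(name, round(factors[name], 3),
          [round(x, 1) for x in normalized[name]])
```

In this toy case the deep sample gets a size factor ten times that of `sample_a`, so after dividing by the size factors the two samples have matching values for the genes used in the estimate. Real data will not scale this cleanly, which is why checking the sample on a PCA of vst-transformed counts is still worthwhile.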