Question: How to know which norm method should be used for RNA-seq read counts?
3.6 years ago by
statfa540 wrote:


I'm applying EBSeq-HMM to my time-course read counts (4 time points). There are two normalization methods available in the package: "Median" and "Quantile".

This model clusters genes into their most likely expression path. What I see is that when I use the Median normalization method on my data, Gene X's most likely path is "Up-EE-Up" (EE stands for equally expressed). When I use the Quantile normalization method, this gene's most likely path is "Up-Up-Up".

When I plotted the Median- and Quantile-normalized expression for this gene, I saw that the slope between time points 2 and 3 is smaller under Median normalization than under Quantile normalization. That is probably why EBSeq-HMM didn't find the difference big enough to call an "Up" step for the gene. Now I don't know which normalization method to trust, or how to tell which one works better with EBSeq-HMM.
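
To make the effect concrete, here is a minimal Python sketch of how the choice of size factors alone can change the apparent slope. The counts and both sets of size factors are made up for illustration; they are not taken from the data in this post and are not produced by EBSeq-HMM.

```python
# Hypothetical raw counts for one gene at four time points (illustrative only).
raw_counts = [100.0, 150.0, 180.0, 260.0]

# Hypothetical per-sample size factors from two normalization methods.
size_factors_a = [1.0, 1.0, 1.15, 1.0]  # stands in for "Median"
size_factors_b = [1.0, 1.0, 0.95, 1.0]  # stands in for "Quantile"


def normalize(counts, size_factors):
    """Divide each sample's count by that sample's size factor."""
    return [c / s for c, s in zip(counts, size_factors)]


def consecutive_fold_changes(values):
    """Fold change between each pair of adjacent time points."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


fc_a = consecutive_fold_changes(normalize(raw_counts, size_factors_a))
fc_b = consecutive_fold_changes(normalize(raw_counts, size_factors_b))

for step, (a, b) in enumerate(zip(fc_a, fc_b), start=1):
    print(f"t{step} -> t{step + 1}: A = {a:.2f}, B = {b:.2f}")
```

With these made-up numbers, the fold change between time points 2 and 3 is about 1.04 under the first set of factors and about 1.26 under the second, so a modest difference in one sample's size factor could be enough to move a gene between "EE" and "Up".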

How can I upload the photos?

rna-seq normalization • 1.6k views
modified 3.6 years ago • written 3.6 years ago by statfa540

You can use either median or quantile, but what is your hypothesis statement? Choose the method that best helps to test your hypothesis.

written 3.6 years ago by theobroma221.1k

There are no units on the Y-axis, my friend. Big no-no.

RNA-seq expression is usually normalised by FPKM or by TPM (better). Why are you using median or quantile? Unless I'm mistaken, it sounds like these only take the raw expression counts. Median just means the middle count, while quantile normalising just means grouping the counts into bins. In that case you are not normalising for fragment length or library size, which would make your comparison meaningless. Do you have a way of finding out the exact equation being used in median and quantile?
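
On the question of which equations are involved, the Python sketch below shows one common definition of each: a median-of-ratios size factor (the DESeq-style estimator) and a quantile-based size factor (each sample's chosen quantile, scaled by the geometric mean of those quantiles). That the package's "Median" and "Quantile" options correspond to these definitions is an assumption here; the package source and manual are the authority. The function names and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """counts: list of genes, each a list of per-sample raw counts.

    For each sample, take the median over genes of
    count / (geometric mean of that gene across samples).
    Genes with any zero count are skipped (a simplification).
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        geo_mean = math.exp(sum(math.log(c) for c in gene) / n_samples)
        for j, c in enumerate(gene):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def quantile_size_factors(counts, q=0.75):
    """Per-sample q-th quantile of counts, divided by the geometric mean
    of those quantiles so that the factors multiply to 1.
    Assumes every sample's q-th quantile is positive."""
    n_samples = len(counts[0])
    per_sample = [quantile([gene[j] for gene in counts], q) for j in range(n_samples)]
    geo_mean = math.exp(sum(math.log(x) for x in per_sample) / n_samples)
    return [x / geo_mean for x in per_sample]


# Toy matrix: 5 genes x 3 samples (made-up numbers).
toy = [
    [10, 20, 15],
    [100, 210, 140],
    [50, 95, 80],
    [5, 12, 300],  # one gene that is very high in sample 3 only
    [200, 390, 310],
]

print("median-of-ratios:", [round(s, 3) for s in median_of_ratios_size_factors(toy)])
print("quantile (0.75): ", [round(s, 3) for s in quantile_size_factors(toy)])
```

Both functions return one scaling factor per sample, so both do correct for library size; they differ in which summary of the count distribution they use, which is why a single sample can end up with a slightly different factor under each.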

modified 3.6 years ago • written 3.6 years ago by YaGalbi1.5k

I normalize the data to find the DE genes. Read counts are normalized for library size using methods such as Median, Quantile, TMM, and Total. The Median and Quantile normalization methods are available in the EBSeq-HMM package. I normalized my raw read counts using those two methods and compared the results. FPKM and RPKM additionally normalize the read counts for gene length, which is not needed for DE analysis as far as I know. Is that correct?

Read this paper please:

"It is widely known that raw counts are not directly comparable between genes due to differential gene lengths and sequencing depths, and reads per kilobase per million reads (RPKM) can be used to correct the resultant technical bias [11]. In DE analysis between multiple conditions, the gene length does not affect the analysis result since such DE analysis focuses on the same gene. However, the condition comparison could greatly suffer from sample specific effects such as sequencing depth and sample specific GC-content effect. The sample specific GC-content effect could arise if two or more samples are sequenced in the same lane. Several within-lane normalization methods (i.e., regression normalization, global-scaling normalization, and full-quantile normalization) can be used to correct the resultant technical bias [12]. On the other hand, such effect can be absorbed into sample specific sequencing depth if only a single sample is sequenced in each lane, and the following four between-lane normalization methods are designed for correcting the technical bias due to sequencing depth: median normalization, total count normalization, quantile normalization, TMM normalization"
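
The point in the quoted passage that gene length drops out when the same gene is compared across conditions can be checked with a few lines of arithmetic. The Python sketch below uses the standard RPKM formula; the count, gene length, and library size values are invented for illustration.

```python
def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of gene per million mapped reads."""
    return count / (gene_length_bp / 1e3) / (library_size / 1e6)


def cpm(count, library_size):
    """Counts per million: depth-normalized, with no gene-length term."""
    return count / (library_size / 1e6)


# Made-up values for one gene measured in two conditions.
gene_length_bp = 2500
count_a, library_a = 400, 20_000_000
count_b, library_b = 900, 30_000_000

fold_change_rpkm = (
    rpkm(count_b, gene_length_bp, library_b) / rpkm(count_a, gene_length_bp, library_a)
)
fold_change_cpm = cpm(count_b, library_b) / cpm(count_a, library_a)

print(f"RPKM fold change: {fold_change_rpkm:.3f}")
print(f"CPM fold change:  {fold_change_cpm:.3f}")
```

The two fold changes agree because the length term appears in both numerator and denominator and cancels. What still differs between samples is the depth term, which is what the between-lane methods listed in the quote estimate.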

modified 3.6 years ago • written 3.6 years ago by statfa540

Units? Read counts don't have a unit, do they?

written 3.6 years ago by statfa540