Question: RNAseq analysis: what comes first, filtering or normalization?
13 months ago by
Herbert0 wrote:

Hi there

Please excuse my very basic question, but I was not able to find appropriate answers using search engines.

I am trying to analyze a small RNA-seq dataset (3 vs. 3 samples) to identify differentially expressed genes and do some multivariate statistics. Because of the small sample size I chose edgeR, but I am a bit confused. In the package description all steps are nicely explained, but the order seems odd to me: they first describe filtering out genes with low read counts, which in my samples removes quite a bit from the respective libraries, and then describe TMM normalization to account for the RNA composition effect.

Is this really the right order to do it, or am I confusing things?

So first:

library(edgeR)

data_edgeR <- DGEList(counts = data_matrix[2:46079, 3:10], group = group)  # create DGEList for further analyses

data_edgeR$samples  # look at library sizes before filtering

keep <- rowSums(cpm(data_edgeR) > 1) >= 3  # keep genes with CPM > 1 in at least 3 samples
data_edgeR_filtered <- data_edgeR[keep, , keep.lib.sizes=FALSE]

and then

data_TMM_normalized <- calcNormFactors(data_edgeR_filtered)

Is this correct, or is it the other way around?

Many thanks!

rna-seq edger R • 746 views
13 months ago by
h.mon29k wrote:

Yes, that is the correct order. In general, filtering removes quite a lot of genes but only a very small percentage of the total counts, usually less than 1%. Did you compare the total read count per sample pre- and post-filtering?
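To make the "many genes, few counts" point concrete, here is a minimal base-R sketch on simulated data. Everything in it is an assumption for illustration: the counts are made up (not the poster's data), and `cpm_manual` is a hand-rolled counts-per-million that only mimics what edgeR's `cpm()` does with raw library sizes. TMM itself is not implemented here; in edgeR it would follow via `calcNormFactors()` on the filtered object.

```r
# Simulated counts (hypothetical): 2000 genes x 6 samples.
set.seed(1)
n_genes   <- 2000
n_samples <- 6

# Assumption: 1200 genes are barely expressed, 800 genes carry almost all reads.
mu <- c(rep(0.5, 1200), rep(5000, 800))
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
  nrow = n_genes, ncol = n_samples
)

# Hand-rolled counts-per-million based on the raw library sizes.
lib_sizes  <- colSums(counts)
cpm_manual <- t(t(counts) / lib_sizes) * 1e6

# Same rule as in the question: CPM > 1 in at least 3 samples.
keep <- rowSums(cpm_manual > 1) >= 3

# Fraction of genes kept vs. fraction of reads kept per sample.
frac_genes_kept <- mean(keep)
frac_reads_kept <- colSums(counts[keep, , drop = FALSE]) / lib_sizes

print(frac_genes_kept)
print(round(frac_reads_kept, 4))
```

In this toy setup well over half of the genes are dropped while each library keeps more than 99% of its reads, which is why recomputing library sizes after filtering and then normalizing changes very little.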


Thanks! Yes, I checked, but found it difficult to judge what counts as "much": from 7849976 to 7814960, for example.

written 13 months ago by Herbert0

In the example you gave, you are keeping more than 99.5% of the original reads. That isn't "much" filtering by any means, and it is just as expected.
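For reference, the arithmetic behind that percentage, using the two library sizes quoted above (the variable names are just illustrative):

```r
# Library size of one sample before and after filtering (numbers from the thread).
before <- 7849976
after  <- 7814960

removed_reads <- before - after          # reads dropped by the filter
retained_pct  <- 100 * after / before    # percentage of reads kept

print(removed_reads)
print(round(retained_pct, 2))            # about 99.55
```

So roughly 35 thousand of about 7.85 million reads are removed, well under the 1% mentioned earlier.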

written 13 months ago by h.mon29k

Perfect, thank you very much!

written 13 months ago by Herbert0