RNAseq analysis: what comes first, filtering or normalization
5.3 years ago
Herbert ▴ 10

Hi there

Please excuse my very basic questions, but I was not able to find appropriate answers using search engines.

I am trying to analyze a small RNA-seq dataset (3 vs 3 samples) to identify differentially expressed genes and do some multivariate statistics. Because of the small sample size I chose edgeR, but I am a bit confused. The edgeR User's Guide (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) explains all steps nicely, but the order seems odd to me: it first describes filtering out genes with low read counts, which in my samples removes quite a bit from the respective libraries, and only then describes TMM normalization to account for the RNA composition effect.

Is this really the right order to do it, or am I confusing things?

So first:

data_edgeR <- DGEList(counts=data_matrix[2:46079,3:10], group=group) #create DGEList for further analyses

data_edgeR$samples #looking at library sizes before filtering

keep <- rowSums(cpm(data_edgeR)>1) >= 3
data_edgeR_filtered <- data_edgeR[keep, , keep.lib.sizes=FALSE]

and then

data_TMM_normalized <- calcNormFactors(data_edgeR_filtered)

Is this correct, or is it the other way round?

Many thanks!

R edgeR RNA-Seq • 3.7k views
5.3 years ago
h.mon 35k

Yes, that is the correct order. In general, filtering removes quite a lot of genes but only a very small percentage of the total counts, usually less than 1%. Did you compare the total read count per sample pre- and post-filtering?
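One way to make that pre/post comparison concrete is sketched below in base R. It is only an illustration under stated assumptions: the counts are simulated (they are not the poster's data), the names `counts`, `lib_before` and `lib_after` are made up for this example, and the CPM is computed by hand as counts divided by library size times one million, which is meant to mirror what `cpm()` returns for a DGEList before any normalization factors are set.

```r
# Simulated count matrix: 20000 genes x 6 samples (hypothetical data).
set.seed(1)
n_genes   <- 20000
n_samples <- 6

# Skewed gene means: many lowly expressed genes, a few highly expressed ones.
gene_means <- rgamma(n_genes, shape = 0.3, scale = 200) + 0.01
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_means, n_samples), size = 2),
  nrow = n_genes, ncol = n_samples
)

# Library sizes before filtering.
lib_before <- colSums(counts)

# Counts per million, computed by hand (each column divided by its library size).
cpm_mat <- t(t(counts) / lib_before) * 1e6

# Same rule as in the question: CPM > 1 in at least 3 samples.
keep <- rowSums(cpm_mat > 1) >= 3

# Library sizes after filtering, recomputed from the retained genes only.
lib_after <- colSums(counts[keep, , drop = FALSE])

# Per-sample share of reads retained, and overall share of genes retained.
retained <- lib_after / lib_before
print(data.frame(lib_before, lib_after, pct_reads_kept = round(100 * retained, 2)))
cat("Share of genes kept:", round(100 * mean(keep), 1), "%\n")
```

With real data the same comparison can be read off `data_edgeR$samples` and `data_edgeR_filtered$samples`, since `keep.lib.sizes=FALSE` makes edgeR recompute the library sizes after subsetting.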


Thanks! Yes, I checked, but found it difficult to judge what counts as "much": from 7849976 to 7814960, for example.


In the example you gave you are keeping more than 99.5% of the original reads - this isn't "much" filtering by any means, and it is just as expected.
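For reference, the arithmetic behind that figure can be checked in a couple of lines of R. The two library sizes are the ones quoted above; the variable names are just for illustration.

```r
# Library size of the example sample before and after filtering (from the thread).
before <- 7849976
after  <- 7814960

# Reads removed by the filter, and the share of reads retained.
removed      <- before - after
pct_retained <- 100 * after / before

cat("Reads removed:", removed, "\n")
cat("Reads retained:", round(pct_retained, 2), "%\n")
```

So the filter dropped about 35 thousand of roughly 7.85 million reads in that sample, i.e. well under 1% of the library.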


Perfect, thank you very much!
