Question: RNA-seq analysis: what comes first, filtering or normalization?
Herbert0 wrote:

Hi there

Please excuse my very basic question, but I was not able to find an appropriate answer using search engines.

I am trying to analyze a small RNA-seq dataset of 3 vs 3 samples to identify differentially expressed genes and do some multivariate statistics. Because of the small sample size I chose edgeR, but I am a bit confused. The edgeR User's Guide (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) explains all steps nicely, but the order seems odd to me: it first describes filtering out genes with low read counts, which in my samples removes quite a bit from the respective libraries, and only then describes TMM normalization to account for RNA composition effects.

Is this really the right order to do it, or am I confusing things?

So first:

library(edgeR)

# Create the DGEList for further analyses
data_edgeR <- DGEList(counts = data_matrix[2:46079, 3:10], group = group)

# Look at library sizes before filtering
data_edgeR$samples

# Keep genes with CPM > 1 in at least 3 samples; recompute library sizes afterwards
keep <- rowSums(cpm(data_edgeR) > 1) >= 3
data_edgeR_filtered <- data_edgeR[keep, , keep.lib.sizes = FALSE]

and then

data_TMM_normalized <- calcNormFactors(data_edgeR_filtered)

Is this correct, or is it the other way around?

Many thanks!

rna-seq edger R • 483 views
modified 7 months ago by h.mon27k • written 7 months ago by Herbert0
h.mon27k wrote:

Yes, that is the correct order. In general, filtering removes quite a lot of genes but only a very small percentage of the total counts, usually less than 1%. Did you compare the total read count per sample pre- and post-filtering?
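One way to make that comparison is to sum the counts per sample before and after applying the filter. The sketch below is only an illustration, under stated assumptions: it uses a simulated count matrix (`counts` is hypothetical, not the poster's data) and computes CPM by hand in base R instead of calling edgeR's `cpm()`, so it runs without any packages.

```r
# Simulated counts: 2000 genes x 6 samples (hypothetical data, not the poster's)
set.seed(1)
n_genes <- 2000
n_samples <- 6

# Skewed per-gene means: many genes near zero, a few highly expressed
gene_means <- rgamma(n_genes, shape = 0.3, scale = 500)
counts <- matrix(rpois(n_genes * n_samples, lambda = rep(gene_means, n_samples)),
                 nrow = n_genes, ncol = n_samples)
colnames(counts) <- paste0("S", seq_len(n_samples))

# Counts per million by hand: divide each column by its library size
lib_before <- colSums(counts)
cpm_manual <- t(t(counts) / lib_before) * 1e6

# Same rule as in the question: CPM > 1 in at least 3 samples
keep <- rowSums(cpm_manual > 1) >= 3
lib_after <- colSums(counts[keep, , drop = FALSE])

# Fraction of genes kept versus fraction of reads kept per sample
genes_kept <- mean(keep)
reads_kept <- lib_after / lib_before
print(round(genes_kept, 3))
print(round(reads_kept, 4))
```

With real data, the same two ratios can be read off `data_edgeR$samples$lib.size` and `data_edgeR_filtered$samples$lib.size`, since `keep.lib.sizes = FALSE` recomputes the library sizes after filtering.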

written 7 months ago by h.mon27k

Thanks! Yes, I checked, but found it difficult to judge what counts as "much": from 7849976 to 7814960, for example.

written 7 months ago by Herbert0

In the example you gave you are keeping more than 99.5% of the original reads - this isn't "much" filtering by any means, and it is just as expected.
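The arithmetic behind that figure can be checked directly. This is a small worked example using only the two library sizes quoted above; the variable names are chosen here for illustration.

```r
# Library sizes quoted in the thread for one sample
before <- 7849976
after <- 7814960

removed <- before - after              # reads dropped by the filter
pct_removed <- 100 * removed / before  # percentage of reads lost
pct_kept <- 100 * after / before       # percentage of reads retained

print(removed)
print(round(pct_removed, 2))
print(round(pct_kept, 2))
```

Here 35016 reads are removed, which is roughly 0.45% of the library, so about 99.55% of the reads are retained.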

written 7 months ago by h.mon27k

Perfect, thank you very much!

written 7 months ago by Herbert0