I have been working with edgeR and its documentation to perform analyses on RNA-seq data. However, I find it difficult to understand what 'input' should go into several of the edgeR functions.
Starting on page 10 of the documentation, the example shows making a DGEList object (y) and then using this object in subsequent analyses. See below:
group <- c(1,1,2,2)
y <- DGEList(counts=x, group=group)
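To make the starting object concrete, here is a minimal sketch that builds a DGEList from simulated counts, assuming edgeR is installed from Bioconductor. The count matrix x is hypothetical (negative binomial draws standing in for real data); the point is only to show what the object holds before any filtering or normalization.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated counts: 1000 genes x 4 samples (replace with real data)
x <- matrix(rnbinom(4000, mu = 100, size = 10), ncol = 4)
group <- c(1, 1, 2, 2)
y <- DGEList(counts = x, group = group)

# One row per sample: group (stored as a factor), lib.size, norm.factors
y$samples
```

At this stage lib.size is simply the column sums of the counts and every norm.factor is 1.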
The documentation then goes into filtering (section 2.7) via the following commands:
keep <- filterByExpr(y, group=group)
y <- y[keep, , keep.lib.sizes=FALSE]
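A small sketch of the filtering step on simulated data, assuming edgeR is installed. The first 100 rows of the hypothetical count matrix are set to zero so that filterByExpr has something to remove; keep.lib.sizes = FALSE makes the subset recompute each library size from the genes that remain.

```r
library(edgeR)

set.seed(1)
# Hypothetical counts; the first 100 genes are made unexpressed on purpose
x <- matrix(rnbinom(4000, mu = 100, size = 10), ncol = 4)
x[1:100, ] <- 0
group <- c(1, 1, 2, 2)
y <- DGEList(counts = x, group = group)

# Logical vector: TRUE for genes with enough counts to be worth testing
keep <- filterByExpr(y, group = group)
# Drop filtered genes and recompute lib.size from the retained genes only
y <- y[keep, , keep.lib.sizes = FALSE]

sum(!keep)  # number of genes removed
```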
Section 2.8.3 (p. 15) also shows normalization of the library sizes with the following:
y <- calcNormFactors(y)
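A minimal sketch of the normalization step, again on hypothetical simulated counts and assuming edgeR is installed. calcNormFactors (TMM by default) fills in y$samples$norm.factors and leaves the counts themselves unchanged; the factors are scaled so that their product is 1.

```r
library(edgeR)

set.seed(1)
# Hypothetical counts: 1000 genes x 4 samples
x <- matrix(rnbinom(4000, mu = 100, size = 10), ncol = 4)
y <- DGEList(counts = x, group = c(1, 1, 2, 2))

y <- calcNormFactors(y)  # TMM normalization by default
y$samples$norm.factors

# Effective library sizes used by downstream functions
y$samples$lib.size * y$samples$norm.factors
```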
Question 1: Should I always filter and calculate the normalization factors before using the DGEList object in any analysis? In other words, there is no need for me to keep a y_copy object of the pre-filtered data, correct?
These are the functions I am interested in:
estimateDisp: estimate dispersion (Section 2.11.2, p. 21)
exactTest: test for differentially expressed genes/tags between a pair of groups (Section 2.10.2, p. 20)
glmQLFit: fit the model for quasi-likelihood F-tests (Section 2.11.3, p. 22)
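For reference, here is a sketch of how these three functions are commonly chained on a single object, assuming edgeR is installed, a simple two-group design and simulated counts (x, group and design are illustrative). In this sketch, estimateDisp returns an updated y with dispersion estimates attached, and exactTest reads those estimates from the object it is given.

```r
library(edgeR)

set.seed(1)
# Hypothetical counts and a two-group design
x <- matrix(rnbinom(4000, mu = 100, size = 10), ncol = 4)
group <- factor(c(1, 1, 2, 2))
y <- DGEList(counts = x, group = group)
keep <- filterByExpr(y, group = group)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~group)

# Returns y with common, trended and tagwise dispersions added
y <- estimateDisp(y, design)

# Classic exact test: uses y$samples$group and the dispersions stored in y
et <- exactTest(y)

# Quasi-likelihood pipeline on the same y; depending on the edgeR version,
# glmQLFit may use the trended dispersion from estimateDisp or estimate its own
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

topTags(et)
topTags(qlf)
```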
Question 2: Should the above functions all receive the same input object (i.e., the same copy of the object), or should each receive a distinct but identical copy? For example, after calcNormFactors I would run:
y_two <- y
y_three <- y
and then use a different y object for each of the above functions?
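On the copying question itself: R objects have copy-on-modify semantics, so y_two <- y followed by a change to y_two leaves y untouched, and a function that modifies its argument only changes its own local copy. A minimal base-R sketch, using a plain list as a stand-in for a DGEList (which is also a list-like object); set_factors is a hypothetical helper.

```r
# Plain list standing in for a DGEList
y <- list(counts = matrix(1:8, ncol = 2), norm.factors = c(1, 1))

y_two <- y                          # "copy" of y
y_two$norm.factors <- c(0.9, 1.1)   # modify only the copy

# Hypothetical function that changes its argument and returns it
set_factors <- function(obj) {
  obj$norm.factors <- c(2, 2)  # changes the local copy only
  obj
}
z <- set_factors(y)

y$norm.factors  # still c(1, 1): y is unchanged by both operations
```

The practical consequence is that y only changes when a function's result is assigned back to it, as in y <- calcNormFactors(y).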
I originally tried the former approach (the same copy of the object), but after all of the genes came back as positive (sufficiently high log2 fold change and sufficiently low p-values), I figured that perhaps the analyses should be run independently. However, I wanted to double-check that this deduction was correct, in case any of the three analyses above are somehow connected to or dependent on one another.