Hi everyone, I have two questions regarding normalisation of RNA-seq data.
1) My first question is a general one. When comparing two conditions (say, wild type and knockout), do we need to normalize the counts by feature/gene length in addition to correcting for sequencing-depth bias (dividing by the total number of reads mapped for that particular sample), i.e. RPKM normalisation, or is counts per million (CPM) sufficient?
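To make the distinction concrete, here is a small illustrative R snippet (toy numbers, not real data) showing the two definitions side by side: CPM corrects only for library size, while RPKM additionally divides by feature length in kilobases.

```r
# Toy counts for two genes in one sample (illustrative numbers only)
counts   <- c(geneA = 500, geneB = 1500)
lib.size <- 1e6                            # total mapped reads in the sample
lengths  <- c(geneA = 1000, geneB = 3000)  # feature lengths in bp

# CPM: scale by library size only
cpm <- counts / lib.size * 1e6

# RPKM: scale by library size (in millions) AND feature length (in kb)
rpkm.manual <- counts / (lib.size / 1e6) / (lengths / 1000)
```

Here geneB has three times the counts of geneA (CPM 500 vs 1500) but also three times the length, so both end up with an RPKM of 500; that is exactly the within-sample length effect RPKM is meant to remove.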
2) In the NOISeq R package, I tried normalizing the data both by CPM and by RPKM (length correction using feature length). The CPM normalisation by NOISeq was correct when I checked the output manually, but the RPKM normalisation gives different values when I recompute them by hand. I wonder if anyone can help me with this or suggest something.
The R script I used is as follows:
#Import counts
mycounts <- read.table("mycounts.txt", header = TRUE, stringsAsFactors = FALSE)

#Import factor table
myfactors <- read.table("myfactors.txt", header = TRUE)

#Import feature lengths
mylength <- read.table("mylength_sort.txt", header = TRUE, stringsAsFactors = FALSE)

#Create NOISeq object
mydata1 <- NOISeq::readData(data = mycounts, factors = myfactors, length = mylength)

#Normalize (RPKM, lc = 1)
#Note: rpkm() expects 'long' to be a numeric vector of feature lengths in the
#same order as the rows of the count matrix; here mylength is a data frame
#read from file, which may be the source of the mismatch.
myRPKM <- rpkm(assayData(mydata1)$exprs, long = mylength, k = 0, lc = 1)
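For reference, this is the kind of manual recheck I mean: a minimal, self-contained sketch (toy count matrix and lengths, not my real data) that recomputes RPKM by hand, the result I would expect the NOISeq output to match. One common cause of a mismatch is the length vector being a data frame or ordered differently from the rows of the count matrix.

```r
# Toy count matrix: genes in rows, samples in columns (illustrative only)
counts <- matrix(c(100, 400, 250, 900), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("WT", "KO")))
lengths <- c(geneA = 2000, geneB = 500)  # bp, same order as rownames(counts)

manual.rpkm <- function(counts, lengths) {
  lib.sizes <- colSums(counts)  # total mapped reads per sample
  # divide each column by its library size in millions,
  # then each row by its feature length in kilobases
  t(t(counts) / (lib.sizes / 1e6)) / (lengths / 1000)
}

rpkm.out <- manual.rpkm(counts, lengths)
```

Comparing a matrix like `rpkm.out` against the values NOISeq returns (for the same counts and lengths) is how I found the discrepancy.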