5.0 years ago
Rajinder Gupta
I want expression data that are normalized within each sample, like FPKM, and also normalized across samples, as obtained with DESeq2 or similar tools.
What I am currently doing is first normalizing the data across samples (using DESeq) and then calculating FPKM from the resulting expression values. Does this make sense, or am I missing something?
The data have already been scaled for depth after the DESeq2 normalization, so FPKM is not suitable. Why do you want to do that?
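For reference, here is a minimal Python sketch of the median-of-ratios idea behind DESeq/DESeq2 size factors. It assumes a genes × samples NumPy matrix of raw counts. The function name `size_factors_median_of_ratios` is hypothetical. This is not DESeq2's own code, and it skips DESeq2 options such as control genes or user-supplied geometric means.

```python
import numpy as np

def size_factors_median_of_ratios(counts):
    """Median-of-ratios size factors in the style of DESeq/DESeq2 (sketch).

    counts: array of shape (n_genes, n_samples) with raw read counts.
    Genes with a zero in any sample are skipped, since their geometric
    mean across samples is zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)            # genes seen in every sample
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene pseudo-reference
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs reference
    return np.exp(np.median(log_ratios, axis=0))      # one factor per sample

# Toy data: sample 2 is sample 1 sequenced twice as deep.
counts = np.array([[10, 20], [100, 200], [50, 100]])
sf = size_factors_median_of_ratios(counts)
normalized = counts / sf   # depth-scaled counts, comparable across samples
```

After dividing by `sf`, both columns of `normalized` are identical. That is the sense in which the counts are "already scaled for depth."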
With DESeq the data are normalized across samples, but I also want to compare different transcripts within the same sample. That is not possible with DESeq normalization alone, because the transcripts have different lengths. FPKM achieves this, but then how do I account for differences in sequencing depth across samples?
I think these are two different analysis goals: use TPM to compare transcripts within a sample, and use DESeq2 to compare samples. They are not easily interchangeable. Is there a reason on your side not to do both separately?
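A minimal TPM sketch for the within-sample side. It assumes raw counts and plain feature lengths in base pairs. Quantifiers such as Salmon or RSEM use effective lengths instead. The `tpm` helper name is hypothetical.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million for a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    rate = counts / lengths_kb[:, None]    # reads per kilobase of feature
    return rate / rate.sum(axis=0) * 1e6   # rescale so each sample sums to 1e6

counts = np.array([[100, 50], [100, 150]])  # toy genes x samples
lengths = np.array([1000, 2000])            # toy lengths in bp
t = tpm(counts, lengths)
```

Within a column, ratios of `t` compare transcripts after length correction. Every column sums to one million regardless of depth, so `t` says nothing reliable about between-sample differences.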
Unfortunately, yes. I am developing an analysis pipeline in which I have to use the results of the within-sample comparison for the across-sample comparison.
That is not a good idea. Please search Google and this forum for why FPKM/TPM perform poorly for inter-sample comparison. There is a lot of literature on this, and the authors of the established differential analysis tools explicitly recommend against doing it. If you browse the Bioconductor support site a bit, you'll see why.
I understand the limitation of using FPKM for inter-sample comparison. That is why I am thinking of calculating FPKM from the normalized read counts. What I am proposing is to first normalize the samples using DESeq, edgeR or similar, and then calculate FPKM from these normalized counts. So this is not the direct FPKM but an inter-sample-normalized FPKM.
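One caveat with this plan is easiest to see in a small Python sketch. The helper names `fpkm` and `fpkm_size_factor` are hypothetical. Suppose you divide counts by size factors and then compute FPKM the usual way, dividing by each sample's own total. The size factors cancel, and you are back to plain FPKM. To keep the between-sample scaling, the per-sample library size has to come from the size factors. The second function uses size factor × geometric mean of the column sums as a "robust" library size. I believe this mirrors DESeq2's `fpkm(..., robust = TRUE)`, but check its documentation before relying on that.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """Classic FPKM: each sample is scaled by its own total count."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    millions = counts.sum(axis=0) / 1e6   # per-sample library size in millions
    return counts / lengths_kb[:, None] / millions

def fpkm_size_factor(counts, lengths_bp, size_factors):
    """FPKM-like values whose depth scaling comes from size factors.

    Library size per sample = size factor * geometric mean of column sums.
    """
    counts = np.asarray(counts, dtype=float)
    sf = np.asarray(size_factors, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    geo_mean_lib = np.exp(np.mean(np.log(counts.sum(axis=0))))
    millions = sf * geo_mean_lib / 1e6
    return counts / lengths_kb[:, None] / millions

counts = np.array([[10, 30], [100, 200], [890, 770]])  # toy genes x samples
lengths = np.array([1000, 2000, 500])                  # toy lengths in bp
sf = np.array([0.8, 1.25])                             # toy size factors

plain = fpkm(counts, lengths)
naive = fpkm(counts / sf, lengths)   # normalize first, then FPKM as usual
robust = fpkm_size_factor(counts, lengths, sf)
```

Here `naive` comes out identical to `plain`: the DESeq-style scaling is undone by the second division by library size. `robust` keeps it.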
Hi, a very late reply, but I only just came across this as well. NOISeq lets you do TMM normalization (as in edgeR) and also account for gene lengths, so I believe it gives you both between-sample and within-sample normalization.
Please correct me if wrong, as I am not an expert.
You could simply divide the counts by gene length. I don't think that is the main issue, but it decreases the counts, so you lose power. I don't see the advantage.
Ok, so bottom line, keep the within- and between-sample normalization separate. Thanks!
You can also just normalize TPMs using DESeq size factors. I don't see anything wrong with doing so.
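One way to read this suggestion, as a sketch: compute TPMs, estimate median-of-ratios size factors on the TPM matrix itself, and divide each sample by its factor. This interpretation and the helper names `tpm` and `median_of_ratios` are assumptions. The resulting values no longer sum to one million per sample.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million for a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / (np.asarray(lengths_bp, dtype=float)[:, None] / 1e3)
    return rate / rate.sum(axis=0) * 1e6

def median_of_ratios(mat):
    """Median-of-ratios scaling factors computed on any positive matrix."""
    mat = np.asarray(mat, dtype=float)
    logs = np.log(mat[np.all(mat > 0, axis=1)])  # rows positive in every sample
    return np.exp(np.median(logs - logs.mean(axis=1, keepdims=True), axis=0))

# Toy data: genes 1-3 are unchanged, gene 4 dominates sample 2.
counts = np.array([[100, 100], [100, 100], [100, 100], [100, 700]])
lengths = np.array([1000, 1000, 1000, 1000])

t = tpm(counts, lengths)
scaled = t / median_of_ratios(t)
```

In this toy example genes 1-3 have equal counts in both samples, but their TPMs differ (250,000 vs 100,000) because gene 4 takes up most of sample 2. After dividing by the median-of-ratios factors, genes 1-3 match again across samples.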