From cuffnorm to DESeq
6.6 years ago
bharata1803

Hello,

I have created a gene count table with the cufflinks workflow, and I thought I could use that result with the DESeq package from Bioconductor. The table format looks quite similar, so I tried it, but I got an error: it seems that DESeq does not accept decimal values, only integers. Do you have any suggestions for how to process the gene count table produced by cuffnorm? Or did I make a mistake while loading the data into DESeq? Thank you.

Bioconductor cufflinks RNA-Seq
6.6 years ago

Because cuffnorm generates normalized expression values for the various tracked features (gene/transcript/TSS/CDS), it does not make sense to use the cuffnorm output as input for further analysis in DESeq. The cuffnorm manual states specifically:

Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input.
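
The point about library size matters because DESeq normally estimates its own per-sample normalization from the raw counts (its estimateSizeFactors step, using the median-of-ratios idea described for DESeq). Below is a minimal pure-Python sketch of that idea, not DESeq's actual code; the function names are hypothetical. It shows that normalized values are raw counts divided by a non-integer factor, so they stop being integers, and feeding them back into DESeq would normalize a second time.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, sketching the normalization described for DESeq.

    counts: one row per gene, one column per sample, raw integer counts.
    Returns one scale factor per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample are skipped: log(0) is undefined and
    # their geometric mean would be 0.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")
    # Log of each gene's geometric mean across samples (a pseudo-reference sample).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / reference), back-transformed.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor (results are no longer integers)."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

For example, with counts `[[10, 20], [20, 40], [30, 60]]` the second sample is simply sequenced twice as deeply, so the factors come out near 0.707 and 1.414 and the normalized first row is about `[14.14, 14.14]`: comparable across samples, but no longer raw counts.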

The reason is that cufflinks/cuffnorm use their own models for estimating transcript abundance, and these produce values that are not compatible with count-based analyses such as those in DESeq and edgeR. Both DESeq and edgeR require tables of raw counts in order to estimate the parameters of their models (in particular the dispersion).
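
Why the input must be actual counts becomes clearer from the negative binomial model that both packages build on, where variance = mean + dispersion × mean². The Poisson part of that variance only makes sense on the scale of real counts. The sketch below is a deliberately simplified method-of-moments estimate for a single gene, assuming equal library sizes (a hypothetical simplification; DESeq and edgeR use more elaborate estimators that also share information across genes). The function name is hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments negative binomial dispersion for one gene.

    counts: replicate counts for one gene in one condition. Assumes equal
    library sizes (hypothetical simplification), so Var = mu + alpha * mu**2
    and therefore alpha = (Var - mu) / mu**2.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("dispersion is undefined for an all-zero gene")
    var = variance(counts)  # sample variance, n - 1 denominator
    # Variance below the Poisson level gives a negative estimate; clamp at 0.
    return max((var - mu) / mu ** 2, 0.0)


raw = [10, 20, 30]
print(moment_dispersion(raw))                    # 0.2
print(moment_dispersion([c / 10 for c in raw]))  # 0.0
print(moment_dispersion([c * 10 for c in raw]))  # 0.245
```

The same three replicates give three different dispersion estimates depending only on their scale, which is why values that have already been rescaled (as cuffnorm's are) distort the model fit even if they are rounded to integers.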

To run DESeq, you would need to create raw count tables for each sample, e.g. with htseq-count or with functions from the GenomicRanges/IRanges packages such as findOverlaps.
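
If you take the htseq-count route, each sample produces its own tab-separated file of gene ID and count, followed by summary rows such as __no_feature and __ambiguous. Here is a minimal Python sketch for combining those files into one gene × sample table of integer counts; the script name, file names, and helper functions are hypothetical, and it assumes one single-sample htseq-count output per file, with the sample name taken from the file name.

```python
import csv
import sys
from pathlib import Path


def parse_htseq_counts(lines):
    """Parse htseq-count output (gene_id<TAB>count) into {gene_id: count}.

    htseq-count appends summary rows such as __no_feature and __ambiguous;
    they start with "__" and are not genes, so they are skipped.
    """
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])  # raw counts must stay integers
    return counts


def merge_count_tables(per_sample):
    """Build a gene x sample matrix from {sample_name: {gene_id: count}}."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    header = ["gene_id"] + samples
    # A gene missing from one file (unexpected with a shared GTF) gets 0.
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


def main(paths, out_path):
    per_sample = {}
    for p in paths:
        with open(p, newline="") as fh:
            per_sample[Path(p).stem] = parse_htseq_counts(fh)
    header, rows = merge_count_tables(per_sample)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python merge_counts.py counts.tsv ctrl1.txt treat1.txt
    main(sys.argv[2:], sys.argv[1])
```

The resulting table can then be read into R as an integer count matrix. DESeq2 also offers DESeqDataSetFromHTSeqCount for reading such per-sample files directly.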

A good overview of the methods was recently published here: http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html (a bit silly, though, to put a paper on open-source software behind a paywall...)

Hope this helps!

Thank you. I was already considering redoing all of the work with htseq-count, but I read in another post a suggestion to use the limma/voom method. Are you familiar with limma/voom?

I have not used the voom/limma method, but I believe you again need raw count tables before the transformation. Have a look at the limma user guide; I think there is a whole chapter on it.
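
To illustrate why voom also starts from raw counts: its first step converts them to log2 counts per million, using each sample's total raw count as the library size. Below is a minimal Python sketch of only that transformation, using the offsets found in limma's voom implementation (0.5 added to each count, 1 to each library size). It omits voom's mean-variance precision weights and any extra normalization factors, and the function name is hypothetical.

```python
import math


def voom_log_cpm(counts):
    """log2 counts per million with the offsets used by limma's voom.

    counts: one row per gene, one column per sample, raw counts.
    Library size is each sample's total raw count (no normalization
    factors in this sketch).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n_samples)]
        for row in counts
    ]
```

Because the library sizes are computed from the input itself, passing values that have already been normalized (such as cuffnorm's) would make the counts-per-million scale, and the precision weights built on it, meaningless.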