I created a gene count table with the Cufflinks workflow and thought I could use it as input for the DESeq package from Bioconductor. The table format looks quite similar, but when I tried it I got an error. It seems that DESeq cannot handle decimal values and only accepts integers. Do you have any suggestions for how to process the gene count table produced by cuffnorm? Or did I make a mistake while loading the data into DESeq? Thank you.
Because cuffnorm generates normalized expression values for the tracked features (gene/transcript/TSS/CDS), it does not make sense to use cuffnorm output as input for further analysis in DESeq. The cuffnorm manual entry specifically states:
Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input.
The reason is that cufflinks/cuffnorm use specific models for estimating transcript abundance, and the resulting values are not compatible with count-based analyses such as those used in DESeq and edgeR. Both DESeq and edgeR require tables of raw counts in order to estimate the parameters of their models (dispersion).
The only way to run DESeq would be to create raw count tables for each sample, e.g. with htseq-count or with methods from the GenomicRanges/IRanges packages, such as findOverlaps.
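
If you go the htseq-count route, you end up with one two-column file per sample that has to be combined into a single table. Here is a minimal Python sketch of that step. It assumes each file is tab-delimited (feature ID, then count) and that the summary rows start with `__` (e.g. `__no_feature`), as in recent htseq-count versions. The function names, the sample names and the choice to fill missing features with 0 are my own assumptions for illustration, not part of any of the tools mentioned.

```python
import csv


def read_htseq_counts(path):
    """Read one htseq-count output file into {feature_id: raw_count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            feature_id, value = row[0], row[-1]
            # Skip summary rows such as __no_feature or __ambiguous
            # (assumption: they are prefixed with two underscores).
            if feature_id.startswith("__"):
                continue
            # int() raises ValueError on decimals, which catches
            # normalized values (e.g. cuffnorm output) early.
            counts[feature_id] = int(value)
    return counts


def merge_counts(sample_paths):
    """Merge per-sample counts into (header, rows), one column per sample.

    sample_paths maps a sample name (hypothetical, chosen by you)
    to the path of its htseq-count output file.
    """
    per_sample = {name: read_htseq_counts(p) for name, p in sample_paths.items()}
    samples = list(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["gene_id"] + samples
    # Assumption: a feature missing from one file is treated as count 0.
    rows = [[f] + [per_sample[s].get(f, 0) for s in samples] for f in features]
    return header, rows


def write_table(header, rows, out_path):
    """Write the merged table as a tab-delimited file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `merge_counts({"ctrl_1": "ctrl_1.counts.txt", "treat_1": "treat_1.counts.txt"})` (hypothetical file names) and passing the result to `write_table` gives a tab-delimited table of integer counts that can then be read into R as the count matrix for DESeq.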