Question: How can I edit the output from Cufflinks to do my own normalization?
5.1 years ago
Hey everyone,

I am running an experiment with 4 paired-end samples across 2 conditions (Control vs. Mutation), with 2 replicates of each (C1, C2, MUT1, MUT2).

After mapping with segemehl, I assemble the transcripts with Cufflinks, so in the end I have transcripts.gtf, genes.fpkm_tracking and isoforms.fpkm_tracking for each sample. Now I need to take the expression value (FPKM) of each gene, divide it by a sample-specific value corresponding to the plasmid count inserted in that sample, and then continue with the pipeline (cuffmerge and cuffdiff).
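For context, FPKM is not a count. Under the usual definition it is already normalized for feature length and library size (Cufflinks uses its own estimated fragment assignments and an effective length, so this is a simplification):

```latex
\mathrm{FPKM}_i = \frac{C_i \cdot 10^9}{L_i \cdot N}
```

Here $C_i$ is the number of fragments assigned to feature $i$, $L_i$ its length in bases, and $N$ the total number of mapped fragments. Dividing FPKM by a further per-sample value therefore stacks a second normalization on top of the first.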

These values are given in the table below.

Sample Value
C1 445.188/0.296
C2 137.217/0.196
MUT1 340.072/0.143
MUT2 643.493/0.271

But how can I do that? I already tried editing the Cufflinks output and dividing the values in the 3 files, but when I merge the transcripts, all the values disappear. I can't do it after running cuffmerge, because by then the samples are merged and I can no longer tell them apart.

Is there a way to do it?

The Cufflinks package also outputs estimated raw counts. You could use those to normalize again.
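As a minimal sketch of pulling those estimated counts out, assuming Cuffdiff's per-replicate genes.read_group_tracking file with tab-separated tracking_id, condition, replicate and raw_frags columns (check the header of your own file), the following builds a gene × sample table. The function name and the toy input are illustrative only, not real data.

```python
import csv
import io
from collections import defaultdict


def count_matrix_from_read_groups(handle):
    """Return {gene_id: {sample: estimated_count}} from a read_group_tracking table.

    Assumes a tab-separated header containing tracking_id, condition,
    replicate and raw_frags (an assumption about the file layout).
    """
    matrix = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Label each sample by condition and replicate, e.g. "MUT_1"
        sample = f"{row['condition']}_{row['replicate']}"
        # raw_frags is a model-based estimate, so it is often not an integer
        matrix[row["tracking_id"]][sample] = float(row["raw_frags"])
    return dict(matrix)


# Toy input with made-up numbers, only to show the expected shape
toy = io.StringIO(
    "tracking_id\tcondition\treplicate\traw_frags\tFPKM\n"
    "HOXA1\tC\t0\t120.5\t3.2\n"
    "HOXA1\tMUT\t0\t98.0\t2.7\n"
)
print(count_matrix_from_read_groups(toy))
```

The resulting values are still estimates, so they need a count-aware tool rather than being fed back into Cuffdiff.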

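Yes, but how can I feed this information back into Cufflinks/Cuffdiff? My goal is to find differentially expressed HOX genes.

You don't. Cuffdiff is only designed to be used in a few predefined ways, and what you're trying to do isn't one of them. Cuffmerge only combines the assembled transcript structures into a merged GTF, and Cuffdiff re-estimates expression from the alignments, so edited FPKM values are never read.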

OK, I will try DESeq2 with the raw counts from Cuffdiff, but the values are not integers. Can DESeq2 accept this kind of value?

No. You'll either need to round them (not ideal) or use edgeR or limma/voom instead.
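A minimal Python sketch of rounding, plus turning the per-sample plasmid values into scaling factors, follows under two assumptions: the counts are already in a {gene: {sample: value}} dictionary, and each plasmid value is meant to act as a multiplicative per-sample factor. Rather than dividing the counts yourself, the usual route in DESeq2 is to keep integer counts and supply the factors with `sizeFactors(dds) <- ...`, so the model still sees the raw count scale. The helper names and example numbers are hypothetical.

```python
import math


def round_counts(matrix):
    """Round estimated counts to integers (Python's round() sends exact halves to the even neighbour)."""
    return {gene: {s: round(v) for s, v in row.items()} for gene, row in matrix.items()}


def size_factors_from_values(values):
    """Rescale per-sample factors so their geometric mean is 1.

    Only the relative sizes matter; centring keeps normalized counts
    on roughly the same scale as the raw counts.
    """
    log_mean = sum(math.log(v) for v in values.values()) / len(values)
    return {s: v / math.exp(log_mean) for s, v in values.items()}


# Hypothetical inputs for illustration only
estimated = {"HOXA1": {"C1": 120.5, "MUT1": 97.6}}
plasmid = {"C1": 2.0, "MUT1": 0.5}
print(round_counts(estimated))
print(size_factors_from_values(plasmid))
```

Counts divided by these factors are fine for inspection, but the test itself should run on the integer counts with the factors passed to the tool.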

Either that, or use something like htseq-count to get integer counts directly from the alignments.
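htseq-count writes one two-column (gene ID, count) table per sample and appends summary rows whose names start with `__` (for example `__no_feature`). A minimal sketch for merging those tables into one gene × sample matrix, assuming one output file per sample, could look like this; the function name and toy inputs are illustrative.

```python
import csv
import io


def merge_htseq_counts(named_handles):
    """Merge per-sample htseq-count tables into {gene_id: {sample: count}}.

    Summary rows (names starting with "__") are skipped because they
    are not genes.
    """
    matrix = {}
    for sample, handle in named_handles.items():
        for gene_id, count in csv.reader(handle, delimiter="\t"):
            if gene_id.startswith("__"):
                continue
            matrix.setdefault(gene_id, {})[sample] = int(count)
    return matrix


# Toy inputs standing in for real htseq-count output files
toy = {
    "C1": io.StringIO("HOXA1\t10\n__no_feature\t5\n"),
    "MUT1": io.StringIO("HOXA1\t7\n__ambiguous\t1\n"),
}
print(merge_htseq_counts(toy))
```

Because these are true integer counts, the matrix can go straight into DESeq2 or edgeR.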

5.0 years ago by Istvan Albert (University Park, USA)
Mucking around with data produced by one suite and feeding it into another, unrelated one is a favorite pastime of those who, as they say, "just want to use the tool everyone is using". A hair-raising example: someone told me how they took FPKM values produced by Cuffdiff and wanted to use DESeq with them, but because the values were too small and not integers, they just ended up multiplying everything by 1000, and then "DESeq worked"... (bioinformatics, man: everything is possible, and probably published as well)
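The arithmetic behind why that trick fails can be shown with a small sketch. Using the standard FPKM definition and a hypothetical 2 kb gene sequenced at two depths, both libraries give the same FPKM, so "FPKM × 1000" hands the count model 1000 in both cases even though 20 and 200 fragments were actually observed. Relative noise is approximated here by the Poisson coefficient of variation 1/sqrt(n), a simplification of the negative binomial models DESeq and edgeR use.

```python
import math


def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def poisson_cv(n):
    """Coefficient of variation of a Poisson count with mean n (1/sqrt(n))."""
    return 1 / math.sqrt(n)


# Hypothetical 2 kb gene: 20 fragments out of 10 M vs 200 out of 100 M
shallow = fpkm(fragments=20, length_bp=2000, total_fragments=10_000_000)
deep = fpkm(fragments=200, length_bp=2000, total_fragments=100_000_000)

for label, observed, value in [("shallow", 20, shallow), ("deep", 200, deep)]:
    pseudo_count = value * 1000  # the "multiply by 1000" trick
    print(f"{label}: FPKM={value:.2f}, observed={observed}, pseudo-count={pseudo_count:.0f}, "
          f"real CV={poisson_cv(observed):.3f}, implied CV={poisson_cv(pseudo_count):.3f}")
```

The pseudo-counts claim far more precision than 20 fragments can provide, and any test run on them inherits that error.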

My advice: if you can't use the Cuffdiff pipeline, use something else that takes your specifics into account, and don't try to make it work by rescaling after the fact. Your rescaling will very likely be all wrong.

