Question: How can I edit the output from Cufflinks to do my own normalization?
5.1 years ago by
rodolpho.gheleri50 wrote:

Hey everyone,

I am running an experiment with 4 paired-end samples across 2 conditions (Control vs. Mutation), with 2 replicates of each (C1, C2, MUT1, MUT2).

After mapping with segemehl, I built the transcripts with Cufflinks, so I now have transcripts.gtf, genes.fpkm_tracking and isoforms.fpkm_tracking. Next, I have to take the count (FPKM) of each gene and divide it by a value corresponding to the count of the plasmid that was inserted in each sample, and then proceed with the pipeline (cuffmerge and cuffdiff).

These values can be found in the table below.

Sample Value
C1 445.188/0.296
C2 137.217/0.196
MUT1 340.072/0.143
MUT2 643.493/0.271

But how can I do that? I already tried editing the Cufflinks output and dividing the counts in the 3 files, but when I merge the transcripts, all the values disappear. I can't do it after running cuffmerge either, because the samples are merged and I can no longer tell them apart.

Is there a way to do it?

modified 3.5 years ago by Biostar ♦♦ 20 • written 5.1 years ago by rodolpho.gheleri50

The Cufflinks package also outputs estimated raw counts. You could use them to normalise again.

written 5.1 years ago by geek_y11k
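
To make the renormalization idea concrete, here is a minimal Python sketch that divides each sample's per-gene values by a sample-specific factor. The gene names, counts, and factors below are made up for illustration; the real values would have to come from the estimated count tables and the plasmid measurements, and this says nothing about whether a downstream tool will accept the rescaled numbers.

```python
def scale_by_sample_factor(counts, factors):
    """Divide every gene's value in each sample by that sample's factor.

    counts:  {sample_name: {gene_name: value}}
    factors: {sample_name: positive scaling factor}
    """
    scaled = {}
    for sample, gene_counts in counts.items():
        factor = factors[sample]
        if factor <= 0:
            raise ValueError(f"factor for {sample} must be positive")
        scaled[sample] = {
            gene: value / factor for gene, value in gene_counts.items()
        }
    return scaled


# Hypothetical inputs, not taken from the thread.
counts = {
    "C1": {"HOXA1": 120.0, "HOXB4": 30.0},
    "MUT1": {"HOXA1": 60.0, "HOXB4": 90.0},
}
factors = {"C1": 2.0, "MUT1": 1.5}

scaled = scale_by_sample_factor(counts, factors)
print(scaled["C1"]["HOXA1"])  # 60.0
```

Keeping the samples in separate per-sample dictionaries avoids the problem of losing track of which value belongs to which sample.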
Yes, but how can I feed this information back into Cufflinks/Cuffdiff? My goal is to find differentially expressed HOX genes.
written 5.1 years ago by rodolpho.gheleri50

You don't. Cuffdiff is only designed to be used in a few predefined ways, of which what you're trying to do isn't one.

written 5.1 years ago by Devon Ryan96k

OK, I will try to use DESeq2 with the raw counts from Cuffdiff, but the values are not integers. Can DESeq2 accept this kind of value?

written 5.1 years ago by rodolpho.gheleri50

No, you'll either need to round them (not ideal) or instead use either edgeR or limma/voom.

written 5.1 years ago by Devon Ryan96k
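
For the rounding option mentioned above, a minimal Python sketch might look like the following. The gene name and values are hypothetical, and explicit half-up rounding is used here as an assumption, because Python's built-in `round()` rounds halves to the nearest even integer. As noted in the reply, rounding is not ideal, since it discards the fractional part of the estimates.

```python
import math


def round_half_up(value):
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(math.floor(value + 0.5))


def to_integer_counts(estimated):
    """Convert {gene: [per-sample estimated counts]} to integer counts."""
    return {
        gene: [round_half_up(v) for v in values]
        for gene, values in estimated.items()
    }


# Hypothetical estimated counts for one gene across four samples.
estimated = {"HOXA1": [10.4, 12.5, 3.49, 0.2]}
integer_counts = to_integer_counts(estimated)
print(integer_counts)  # {'HOXA1': [10, 13, 3, 0]}
```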

Either that, or use something like htseq-count.

written 5.0 years ago by andrew.j.skelton736.0k
5.0 years ago, Istvan Albert ♦♦ 85k (University Park, USA) wrote:

Mucking around with data produced by one suite and feeding it into another, unrelated one is a favorite pastime of those who, as they say, "just want to use the tool everyone is using". A hair-raising example: someone told me how they took FPKM values produced by Cuffdiff and wanted to use DESeq with them, but because the values were too small and non-integer, they just ended up multiplying everything by 1000, and then "DESeq worked"... (bioinformatics, man: everything is possible, and probably published as well)

My advice: if you can't use the Cuffdiff pipeline, use something else that takes your specifics into account, and don't try to make it work by rescaling after the fact. Your rescaling will very likely be all wrong.

modified 5.0 years ago • written 5.0 years ago by Istvan Albert ♦♦ 85k