Cuffnorm: which output file should I use?
4.8 years ago

Hello Biostars,

I'm trying to use cuffnorm to normalize the cuffquant results. I'm getting these output files:

• cds.count_tracking
• genes.fpkm_tracking
• isoforms.count_tracking
• run.info
• tss_groups.fpkm_tracking
• cds.fpkm_tracking
• genes.count_tracking
• isoforms.fpkm_tracking
• tss_groups.count_tracking

I'm mostly interested in genes, so which file should I use? Is it genes.count_tracking? Does that contain the normalized gene counts?

Thanks

cufflinks cuffnorm next-gen RNA-Seq

What is your question, i.e., what will you do in the next step? Is it differential expression of the genes, or something else?

Not DE analysis. My goal is to predict a clinical end point, such as survival, using the normalized expression levels.

4.8 years ago
Prakash ★ 2.1k

genes.fpkm_tracking is the file for genes; it contains their normalized FPKM values (fragments per kilobase of transcript per million mapped reads).
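
Since FPKM is plain arithmetic, a minimal Python sketch of the classic definition may help. It assumes the raw gene length and the total number of mapped fragments as the library size; the function name `fpkm` is hypothetical. Cufflinks' own calculation differs in detail (for example, it uses effective transcript lengths and a configurable library-size normalization), so this will not reproduce cuffnorm's numbers exactly.

```python
def fpkm(count: float, length_bp: float, total_mapped: float) -> float:
    """Classic FPKM: fragments per kilobase of transcript per million mapped fragments.

    count        -- fragments assigned to the gene
    length_bp    -- gene length in base pairs (raw length, not an effective length)
    total_mapped -- total mapped fragments in the library (the library size)
    """
    # Divide by length in kb (length_bp / 1e3) and by library size in millions
    # (total_mapped / 1e6); combined, that is count * 1e9 / (length * total).
    return count * 1e9 / (length_bp * total_mapped)


# 500 fragments on a 2 kb gene in a library of 20 million mapped fragments
print(fpkm(500, 2000, 20_000_000))  # 12.5
```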

You mean these are not the values normalized by cuffnorm? I'm confused.

4.8 years ago
aka001 ▴ 190

Although I presume you ran cuffnorm because you wanted to normalise your data, I would still say that it is important to consider what your biological question is. "Mostly interested in genes" can mean many things. If you intend to do differential expression of the genes in the next step (with mainstream packages such as DESeq or edgeR), then you would actually want the genes.count_tracking file, as it is not normalised and hence suits the DE purpose.

I'm not doing DE analysis. My goal is to predict a clinical end point, such as survival, using the normalized expression levels. If genes.count_tracking does not contain the normalized values, which file has the expression levels normalized by cuffnorm?

The normalised expression values should be in genes.fpkm_tracking.

Thanks, but my understanding was that FPKM is very simple to compute, because it is just a few multiplications and divisions, which should not take cuffnorm hours to complete. What am I missing? Thanks again.

Well, going back to what you wanted in the first place, which is normalisation, it depends on what kind of normalisation you want. By definition, FPKM is already a normalised value, but it is normalised by library size and gene length (i.e., within each sample). For the next step, you would probably want to normalise across samples (with methods like RLE, TMM, etc.; I am not sure whether cuffnorm does that), starting from the FPKM values (or TPM, if you prefer) in the cuffnorm output files. As for the speed, I would guess that cuffnorm is calculating much more than that (as reflected by the number of files it generates).
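
To make the within-sample vs. across-sample distinction concrete, here is a minimal Python sketch of two of the operations mentioned: converting one sample's FPKM values to TPM, and computing RLE (median-of-ratios, the approach used by DESeq) size factors across samples. It is an illustration under simplifying assumptions, not what cuffnorm does internally: the function names are hypothetical, the RLE function expects a genes-by-samples matrix of counts and skips genes with a zero in any sample, and TMM is not shown. For real analyses, the DESeq2 or edgeR implementations are the safer choice.

```python
import math
import statistics


def fpkm_to_tpm(fpkms: list[float]) -> list[float]:
    """Rescale one sample's FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkms)
    return [f / total * 1e6 for f in fpkms]


def rle_size_factors(counts: list[list[float]]) -> list[float]:
    """Median-of-ratios (RLE) size factor for each sample.

    counts[g][s] is the count for gene g in sample s. Genes with a zero
    count in any sample are skipped, because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        # Log of this gene's geometric mean across all samples
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of the sample's counts to the per-gene geometric means
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced twice as deeply as sample 1;
# the last gene has a zero and is ignored when estimating size factors.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
size_factors = rle_size_factors(counts)
normalized = [[c / f for c, f in zip(row, size_factors)] for row in counts]
print(size_factors)
print(fpkm_to_tpm([1.0, 3.0]))  # [250000.0, 750000.0]
```

In the toy example the second sample gets a size factor twice as large as the first, so dividing each sample's counts by its size factor makes the first three genes agree across samples. TPM, by contrast, only rescales within one sample.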

Thank you so much, this was helpful. My understanding from what you said is that genes.count_tracking is not normalized (with respect to library size), but when I compare the raw gene counts with genes.count_tracking, the latter have been scaled down, which suggests some normalization is being done as well. Am I missing something?