Question: cufflinks and average FPKM
aleka90 (United Kingdom) wrote, 3.6 years ago:

Hi. I have three samples. I used the new Cufflinks pipeline for the analysis, and at the end I get normalized FPKM values per gene from cuffnorm. However, since the three samples are from the same strain, I would like a single average value for each gene. Is it fine to just average across the 3, or is there a way within Cufflinks to get the average FPKM value?

 

Tags: sequencing, rna-seq, alignment
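A minimal Python sketch of the plain per-gene average being asked about, for illustration only. It assumes a tab-separated table shaped like cuffnorm's genes.fpkm_table, with the gene ID in the first column and replicate columns named `<condition>_<replicate>` (e.g. `q1_0`, `q1_1`); check the header of your own output, since labels can differ. The function name `average_by_condition` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_condition(table_text):
    """Return {gene_id: {condition: mean FPKM}} from a cuffnorm-style table.

    Assumes tab-separated text whose first column is the gene ID and whose
    other headers look like '<condition>_<replicate>' (e.g. 'q1_0', 'q1_1').
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    # Group column indices by condition label (the text before the last '_').
    columns = defaultdict(list)
    for index, name in enumerate(header[1:], start=1):
        columns[name.rsplit("_", 1)[0]].append(index)
    averages = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        averages[row[0]] = {
            condition: mean(float(row[i]) for i in indices)
            for condition, indices in columns.items()
        }
    return averages


# Toy example: two conditions with three replicates each.
table = (
    "tracking_id\tq1_0\tq1_1\tq1_2\tq2_0\tq2_1\tq2_2\n"
    "geneA\t10\t12\t14\t1\t2\t3\n"
)
print(average_by_condition(table))  # {'geneA': {'q1': 12.0, 'q2': 2.0}}
```

As the replies in this thread point out, a per-strain mean like this is a summary for description or plotting; it discards the replicate spread that a differential expression test needs.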
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 3.6 years ago:

If you really believe that these are all identical, then you might simply pool the FASTQ files and map everything together.

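Pooling, as suggested here, can be as simple as concatenating the replicate FASTQ files before mapping. The sketch assumes all inputs use the same compression and that each file ends with a newline (normal for FASTQ). Concatenated gzip files form a valid multi-member gzip stream that Python's gzip module reads as one file; whether a particular aligner does too is worth checking. For paired-end data, pool the R1 files and the R2 files separately, in the same sample order, so mates stay in sync. The helper names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def pool_fastq(inputs, output):
    """Concatenate FASTQ files byte-for-byte into one file.

    Works for all-plain or all-gzipped inputs; mixing the two would corrupt
    the output.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_records(path):
    """Count reads (4 lines per record), reading .gz files transparently."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


# Usage (hypothetical file names):
# pool_fastq([Path("rep1.fastq.gz"), Path("rep2.fastq.gz"), Path("rep3.fastq.gz")],
#            Path("strainA_pooled.fastq.gz"))
```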

According to the FPKM values I get for each of the 3 samples, each gene has very similar values across the 3 samples.

I was thinking of either averaging the FPKM values of the 3 samples for each gene, or running cuffquant on the merged GTF file and the merged BAM files (of the 3 samples) so that I get a single FPKM value per gene. Which of the two do you think is better?

The goal is to compare genes across different strains, so I would like an average value for each gene in each strain.

(Reply by aleka90, 3.6 years ago)
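One way to quantify "very similar values for each gene" is the per-gene coefficient of variation (SD / mean) across the replicates. This sketch assumes the FPKMs are already loaded into a dict mapping gene ID to its replicate values; the 0.2 cutoff is an arbitrary example, not a Cufflinks default, and the function names are hypothetical. A low replicate CV says the replicates agree; it does not by itself make averaging suitable for differential expression testing.

```python
from statistics import mean, stdev


def replicate_cv(fpkm_by_gene):
    """Coefficient of variation (SD / mean) of each gene's replicate FPKMs.

    Genes with a mean of 0 (zero in every replicate) are reported as 0.0.
    """
    cvs = {}
    for gene, values in fpkm_by_gene.items():
        m = mean(values)
        cvs[gene] = stdev(values) / m if m > 0 else 0.0
    return cvs


def unstable_genes(fpkm_by_gene, max_cv=0.2):
    """Genes whose replicate CV exceeds max_cv (0.2 is an arbitrary example cutoff)."""
    return sorted(g for g, cv in replicate_cv(fpkm_by_gene).items() if cv > max_cv)


# Toy replicate FPKMs for three genes.
example = {"geneA": [10, 11, 9], "geneB": [1, 5, 9], "geneC": [0, 0, 0]}
print(unstable_genes(example))  # ['geneB']
```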

If you want to compare strains, then averaging is not what you want to do.  You'll want to keep your replicates and run cuffdiff on the replicates.  

(Reply by Sean Davis, 3.6 years ago)

cuffdiff gives pairwise comparisons among the samples, and I want to compare 4 different strains at the same time (not 2 at a time).

What if I merge the samples of each strain, run cuffquant with the merged.gtf, and, once I have the abundances.cxb files for all the strains, run cuffnorm? In that case I could merge the transcripts.gtf files of the different strains. Does that sound reasonable?

Or I could average the FPKM values (from cuffnorm) of the 3 replicates within each strain. I think averaging should be fine, as the 3 replicates are the same experiment, with the same library size, performed 3 times. Also, cuffnorm states that its FPKM values are comparable between samples. What do you think?

To compare among strains, I will do an extra normalization. I want an approximate value per gene per strain. Which of the two approaches I mentioned fits best? Or do you have a suggestion that would let me compare the 4 strains all at once?

(Reply by aleka90, 3.6 years ago)
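For scale: testing every pair among 4 strains means 4 × 3 / 2 = 6 pairwise comparisons, while comparing each strain against all the others (as suggested later in this thread) means 4. A small sketch that enumerates both designs; the strain labels and function names are placeholders.

```python
from itertools import combinations


def all_pairs(strains):
    """Every unordered pair of strains: n * (n - 1) / 2 comparisons."""
    return list(combinations(strains, 2))


def one_vs_rest(strains):
    """Each strain paired with the list of all the other strains."""
    return [(s, [other for other in strains if other != s]) for s in strains]


strains = ["strain_A", "strain_B", "strain_C", "strain_D"]  # placeholder labels
print(len(all_pairs(strains)))    # 6
print(len(one_vs_rest(strains)))  # 4
```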

What do you mean by "compare", statistically significant genes, or just a graphical representation via PCA or clustering?

(Reply by Sean Davis, 3.6 years ago)

Hi,

I don't think that averaging FPKM values makes much sense, as raw FPKM values shouldn't even be compared between libraries (FPKM is a "within-sample normalized" expression value). Merging the BAM files makes more sense to me, although I'm not sure what the FPKM values you'll get from the merged BAM file would mean...

(Reply by Carlo Yague, 3.6 years ago)
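On what an FPKM from merged BAM files would mean: with the simplified definition FPKM = 10^9 × reads / (feature length × total mapped reads), and ignoring what Cufflinks adds on top (effective length and bias correction, assigning reads among isoforms), pooling replicates gives a library-size-weighted mean of the per-replicate FPKMs:

pooled FPKM = Σ_i (N_i / Σ_j N_j) × FPKM_i, where N_i is replicate i's total mapped reads.

With equal library sizes this reduces to the plain mean, so under these simplifying assumptions pooling and averaging raw FPKMs land close together when depths are similar. The sketch checks the identity numerically on toy numbers; the function names are hypothetical.

```python
import math


def fpkm(reads, length_bp, total_mapped):
    """Simplified FPKM: reads per kilobase of feature per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def pooled_fpkm(reads, totals, length_bp):
    """FPKM after pooling all replicates' reads, as if their BAMs were merged."""
    return fpkm(sum(reads), length_bp, sum(totals))


def weighted_mean_fpkm(reads, totals, length_bp):
    """Per-replicate FPKMs averaged with weights proportional to library size."""
    grand_total = sum(totals)
    return sum(fpkm(r, length_bp, n) * n / grand_total for r, n in zip(reads, totals))


# Toy numbers for one 2 kb gene in three replicates of unequal depth.
reads = [100, 150, 90]
totals = [10_000_000, 20_000_000, 15_000_000]
pooled = pooled_fpkm(reads, totals, 2000)             # about 3.78
weighted = weighted_mean_fpkm(reads, totals, 2000)    # same value
plain = sum(fpkm(r, 2000, n) for r, n in zip(reads, totals)) / 3  # about 3.92
assert math.isclose(pooled, weighted)
print(pooled, weighted, plain)
```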

To compare across the different strains, I will perform an extra normalization.

My question is whether averaging the FPKM values from 3 different samples of the same strain, after they have been normalized with cuffnorm, is standard and reasonable.

What I want is an average value per gene per strain. Is there any other way to get it?

(Reply by aleka90, 3.6 years ago)

If you really want an "average FPKM value", I don't have arguments to say "it is wrong!", but I don't think it is normal practice. Anyway, I guess you can try it... But don't use this "average FPKM value" for differential expression analysis. If you have replicates, it's better to use all the information (not just the mean) to call DEGs.

(Reply by Carlo Yague, 3.6 years ago)
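Why the mean alone is not enough: two genes can show the same difference in strain means but very different replicate spread. The sketch uses Welch's t statistic purely as a simple stand-in to show this (it is not the test cuffdiff performs), on toy replicate values; `welch_t` is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic comparing two groups of replicate values."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


# Toy replicate FPKMs: both genes have strain means of 10 vs 20.
tight = welch_t([10, 10.5, 9.5], [20, 20.5, 19.5])
noisy = welch_t([2, 10, 18], [5, 20, 35])
# Same means, but the tight gene's |t| is more than 20 times larger.
print(round(tight, 1), round(noisy, 1))
```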

OK, I see that it is not the best way.

But is there any other way to compare the expression of multiple samples all together (not pairwise comparisons)?

(Reply by aleka90, 3.6 years ago)

Comparison is always pairwise in DE analysis, you compare the expression in one sample (or one group) against one other, usually a wild type or control of some sort.

Perhaps in your case you don't have a real control strain, and this is why you want to compare all the strains together? Then you could compare each strain against all the others (this is still a pairwise comparison).

Now, if you want to compare the samples more globally (not at the gene level), you could try PCA, hierarchical clustering or simple correlation. These types of analysis allow you to compare multiple samples at the same time; however, they won't tell you which genes are differentially expressed.

(Reply by Carlo Yague, 3.6 years ago)
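The "simple correlation" option can be sketched with the standard library alone: correlate every pair of samples on log2(FPKM + 1) and see which samples resemble each other. Assumptions: each sample's FPKMs are lists in the same gene order, and the log2(x + 1) transform is a common choice, not one prescribed by Cufflinks; the function names are hypothetical. PCA or hierarchical clustering would usually be run on the same transformed matrix with a statistics library. None of this identifies which genes are differentially expressed.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples):
    """Pairwise correlations of log2(FPKM + 1), keyed by (sample, sample)."""
    logged = {name: [math.log2(v + 1) for v in values] for name, values in samples.items()}
    names = sorted(logged)
    matrix = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        matrix[(a, b)] = matrix[(b, a)] = pearson(logged[a], logged[b])
    return names, matrix


def most_similar(names, matrix):
    """For each sample, the other sample it correlates with most strongly."""
    return {a: max((b for b in names if b != a), key=lambda b: matrix[(a, b)]) for a in names}


# Toy FPKMs for four genes in three samples.
samples = {"A_rep1": [0, 1, 3, 7], "A_rep2": [0, 1, 3, 8], "B_rep1": [7, 3, 1, 0]}
names, matrix = correlation_matrix(samples)
print(most_similar(names, matrix))
```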

You need to define "compare". Are you talking about finding differentially expressed genes (p-values), or simply making a plot?

(Reply by Sean Davis, 3.6 years ago)