Cufflinks and average FPKM
Asked 8.4 years ago by aleka

Hi. I have three samples. I used the new Cufflinks pipeline to run the analysis, and at the end I get normalized FPKM values per gene from cuffnorm. However, because the three samples come from the same strain, I would like a single average value for each gene. Is it fine to simply average across the three, or is there a way within Cufflinks to get an average FPKM value?

Tags: RNA-Seq, sequencing, alignment
Answer (8.4 years ago):

If you really believe that these are all identical, then you might simply pool the FASTQ files and map everything together.
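A minimal sketch of the pooling step, assuming the replicates are gzip-compressed FASTQ files (the file names in the usage note are hypothetical). A gzip file may consist of several concatenated members, so the replicate files can be appended byte for byte into one pooled file that standard gzip readers decompress as a single FASTQ.

```python
import shutil
from pathlib import Path


def pool_fastq_gz(inputs, output):
    """Concatenate gzip-compressed FASTQ files into one pooled file.

    Appending the raw bytes of each input produces a multi-member gzip
    file whose decompressed content is the reads of every input, in order.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Copy the compressed bytes unchanged; no decompression needed.
                shutil.copyfileobj(src, out)
    return Path(output)
```

Usage would look like `pool_fastq_gz(["strainA_rep1.fastq.gz", "strainA_rep2.fastq.gz", "strainA_rep3.fastq.gz"], "strainA_pooled.fastq.gz")`. For paired-end data, pool the R1 files and the R2 files separately, in the same replicate order, so that mates stay in sync.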

Reply:

According to the FPKM values I get for each of the three samples, every gene has very similar values across the three samples.

I was thinking of either averaging the FPKM values of the three samples for each gene, or running cuffquant with the merged GTF file and the merged BAM file of the three samples, so that I get a single FPKM value per gene. Which of the two do you think is better?

The goal is to compare genes across different strains, so I would like an average value for each gene in each strain.

Reply:

If you want to compare strains, then averaging is not what you want to do. You'll want to keep your replicates and run cuffdiff on the replicates.

Reply:

cuffdiff gives pairwise comparisons between samples, and I want to compare four different strains at the same time (not two at a time).

What if I merge the samples of each strain, run cuffquant on each merged BAM with merged.gtf, and then run cuffnorm on the resulting abundances.cxb files of all the strains? In that case I could merge the transcripts.gtf files of the different strains. Does that sound reasonable?
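For comparison, cuffnorm can also take the replicates of each strain without merging: its usage accepts the replicates of one condition as a comma-separated list, so the per-replicate abundances.cxb files stay separate while cuffnorm still groups them by strain. The sketch below only builds the argument list and runs nothing; the strain labels, paths, output directory and thread count are hypothetical, and the options used (`-o`, `-L`, `-p`) should be checked against `cuffnorm --help` for your installed version.

```python
def build_cuffnorm_cmd(gtf, replicates_by_strain, out_dir="cuffnorm_out", threads=4):
    """Return a cuffnorm argument list with one comma-separated group per strain.

    replicates_by_strain maps a strain label to the list of per-replicate
    abundances.cxb files that cuffquant produced for that strain.
    """
    labels = list(replicates_by_strain)
    cmd = ["cuffnorm", "-o", out_dir, "-p", str(threads), "-L", ",".join(labels), gtf]
    for strain in labels:
        # Replicates of one condition are passed together, separated by commas.
        cmd.append(",".join(replicates_by_strain[strain]))
    return cmd
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)`. Keeping replicates as separate inputs leaves the per-replicate values in cuffnorm's output, which merging the BAM files would discard.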

Alternatively, I could average the FPKM values (from cuffnorm) of the three replicates of each strain. I think averaging should be fine, because the three replicates are the same experiment, prepared three times with the same library size. Also, cuffnorm states that its FPKM values are comparable between samples. What do you think?
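If a per-strain summary is wanted anyway, the averaging itself is simple. A sketch, assuming a tab-separated gene table with the gene ID in the first column and replicate columns named `<strain>_<replicate>` (for example `A_0`, `A_1`); that naming is an assumption here, so check the header of your own cuffnorm gene table and adjust the hypothetical `strain_of` helper. It also keeps the standard deviation so that the replicate spread is not discarded.

```python
import csv
import io
import statistics
from collections import defaultdict


def strain_of(column):
    # Hypothetical naming rule: "<strain>_<replicate index>", e.g. "A_0" -> "A".
    return column.rsplit("_", 1)[0]


def mean_fpkm_per_strain(table_text):
    """Return {gene: {strain: (mean, sd)}} from a tab-separated FPKM table.

    The first column is the gene identifier; every other column is one
    replicate. The SD is 0.0 when a strain has a single replicate.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    id_col, *sample_cols = reader.fieldnames
    result = {}
    for row in reader:
        groups = defaultdict(list)
        for col in sample_cols:
            groups[strain_of(col)].append(float(row[col]))
        result[row[id_col]] = {
            strain: (statistics.mean(vals), statistics.stdev(vals) if len(vals) > 1 else 0.0)
            for strain, vals in groups.items()
        }
    return result
```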

To compare among strains, I will do an additional normalization. I want an approximate value per gene per strain. Which of the two approaches I described fits best? Or do you have a suggestion that would let me compare all four strains at once?

Reply:

What do you mean by "compare": finding statistically significant genes, or just a graphical representation via PCA or clustering?

Reply:

Hi,

I don't think averaging FPKM values makes much sense, since raw FPKM values shouldn't even be compared between libraries (FPKM is a "within-sample normalized" expression value). Merging the BAM files makes more sense to me, although I'm not sure what the FPKM values you get from the merged BAM file would mean...
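On what a merged-BAM FPKM means: under the textbook definition, FPKM = fragments × 10^9 / (feature length in bp × total mapped fragments), pooling the reads before computing FPKM gives a library-size-weighted mean of the per-replicate FPKMs, not their plain mean. This ignores Cufflinks' effective-length and bias corrections, so it only approximates what Cufflinks reports. A small sketch with made-up numbers:

```python
def fpkm(count, length_bp, total_mapped):
    """Textbook FPKM: fragments per kilobase of feature per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)


def pooled_vs_averaged(counts, totals, length_bp):
    """Return (pooled FPKM, plain mean FPKM, library-size-weighted mean FPKM)."""
    per_rep = [fpkm(c, length_bp, n) for c, n in zip(counts, totals)]
    pooled = fpkm(sum(counts), length_bp, sum(totals))
    plain_mean = sum(per_rep) / len(per_rep)
    # Pooling is equivalent to weighting each replicate by its share of all mapped fragments.
    weighted_mean = sum(f * n for f, n in zip(per_rep, totals)) / sum(totals)
    return pooled, plain_mean, weighted_mean


# Made-up example: 2 kb gene, replicates with 10M and 20M mapped fragments.
pooled, plain_mean, weighted_mean = pooled_vs_averaged([100, 300], [10_000_000, 20_000_000], 2000)
```

Here the per-replicate FPKMs are 5.0 and 7.5; the pooled value (about 6.67) equals the weighted mean, while the plain mean is 6.25. When all libraries have the same size, the two means coincide.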

Reply:

To compare across the different strains, I will perform an additional normalization.

My question is whether averaging the FPKM values of three different samples from the same strain, after they have been normalized with cuffnorm, is normal and reasonable.

What I want is an average value per gene per strain. Is there any other way to do it?

Reply:

If you really want an "average FPKM value", I don't have arguments to say "it is wrong!", but I don't think it is standard practice. Anyway, I guess you can try it... But don't use this "average FPKM value" for differential expression analysis. If you have replicates, it's better to use all the information (not just the mean) to call DEGs.
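To see why the mean alone is not enough: two pairs of strains can have identical mean FPKMs yet very different replicate spread. A statistic that uses the replicates separates the two cases, while the means alone cannot. Welch's t statistic is used below purely as an illustration; cuffdiff and count-based DE tools use their own statistical models, not this test. The values are made up.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic for two groups of replicate values."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


# Made-up replicate FPKMs: means are 10 vs 12 in both cases, only the spread differs.
tight_a, tight_b = [9.9, 10.0, 10.1], [11.9, 12.0, 12.1]
noisy_a, noisy_b = [4.0, 10.0, 16.0], [6.0, 12.0, 18.0]

t_tight = welch_t(tight_a, tight_b)  # about -24.5: replicates agree closely
t_noisy = welch_t(noisy_a, noisy_b)  # about -0.41: difference is within replicate noise
```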

Reply:

OK, I see that it is not the best way.

But is there any other way to compare the expression of multiple samples all together (not as pairwise comparisons)?

Reply:

Comparison in DE analysis is typically pairwise: you compare the expression in one sample (or one group) against another, usually a wild type or a control of some sort.

Perhaps in your case you don't have a real control strain, and this is why you want to compare all the strains together? Then you could compare each strain against all the others (this is still a pairwise comparison).
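For four strains, the two designs differ in how many contrasts you run: all strain-vs-strain pairs gives six comparisons, while each strain against the rest gives four. A small sketch enumerating both (the strain names are placeholders); each contrast would then be run in whichever DE tool you use.

```python
from itertools import combinations


def all_pairwise(strains):
    """Every strain-vs-strain contrast: n * (n - 1) / 2 comparisons."""
    return list(combinations(strains, 2))


def one_vs_rest(strains):
    """Each strain against the pool of all the other strains: n comparisons."""
    return [(s, tuple(o for o in strains if o != s)) for s in strains]
```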

Now, if you want to compare the samples more globally (not at the gene level), you could try PCA, hierarchical clustering, or simple correlation. These analyses let you compare multiple samples at the same time; however, they won't tell you which genes are differentially expressed.
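Of these global approaches, correlation is the easiest to sketch without extra libraries. The functions below compute Pearson correlations between samples on log2(FPKM + 1) values; the pseudocount of 1 is a common convention and an assumption here, not something specified in the thread. Replicates of the same strain would be expected to correlate more strongly with each other than with other strains, and the resulting matrix can feed a heatmap or hierarchical clustering.

```python
import math


def log_transform(values, pseudocount=1.0):
    # log2(FPKM + 1) keeps zeros finite and damps the influence of very highly expressed genes.
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(samples):
    """samples maps a sample name to its FPKM values over the same genes, in the same order."""
    logged = {name: log_transform(vals) for name, vals in samples.items()}
    names = list(logged)
    return {(a, b): pearson(logged[a], logged[b]) for a in names for b in names}
```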

Reply:

You need to define "compare". Are you talking about finding differentially-expressed genes (p-values), or simply making a plot?
