Question: dealing with Cuffdiff2 outputs: adding pseudocounts or using cutoff values?
a_j_d_a_n wrote (3.1 years ago):

Dear experts,

Although I have read a lot of manuals and papers on RNA-Seq analysis of differentially expressed genes (or perhaps precisely because of that), I'm unfortunately not sure which pipeline and parameters I should use. I really hope some of you can give me some advice.

My experimental setup in short: 2 WT samples and 2 patient samples (same family, same unknown mutation), 150 bp paired-end reads. I'm looking for differentially expressed genes, and in particular for a gene that is expressed in the WT but not in the patients (perhaps because of an exon deletion or something similar). I merged the WT samples and the patient samples and analyzed the files with TopHat2 and Cuffdiff2. As I learned from some forum discussions, I can't use statistics for filtering because I don't have biological replicates. So I've just calculated the TPMs and ranked the genes by their log2 fold changes.

But I have a problem: many genes have FPKM/TPM = 0 in one condition, and I don't know how to proceed with them. Should I add a pseudocount? Or should I eliminate them? Given what I'm looking for, I don't actually want to cut them off. Can anyone help me with this?
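The TPM calculation and ranking described above could look roughly like the following Python sketch. It assumes per-gene FPKM values for the merged WT and merged patient samples are already available as dictionaries; the gene names and numbers are invented. TPM is obtained by rescaling each sample's FPKMs so they sum to one million, genes that are zero in both conditions are dropped, and a gene that is zero in only one condition gets an infinite log2 fold change, which is exactly the case the question is about.

```python
import math

def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so they sum to one million (TPM)."""
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}

def log2_fold_change(wt, patient):
    """log2(WT / patient); +/-inf when exactly one value is zero."""
    if wt == 0 and patient == 0:
        return math.nan  # no information in either condition
    if patient == 0:
        return math.inf
    if wt == 0:
        return -math.inf
    return math.log2(wt / patient)

# Hypothetical FPKM values for the merged WT and merged patient samples.
wt_fpkm = {"GENE_A": 10.0, "GENE_B": 5.0, "GENE_C": 0.0, "GENE_D": 85.0}
patient_fpkm = {"GENE_A": 2.5, "GENE_B": 0.0, "GENE_C": 0.0, "GENE_D": 97.5}

wt_tpm = fpkm_to_tpm(wt_fpkm)
patient_tpm = fpkm_to_tpm(patient_fpkm)

# Drop genes that are zero in both conditions, then rank by log2 fold change.
lfc = {
    gene: log2_fold_change(wt_tpm[gene], patient_tpm[gene])
    for gene in wt_tpm
    if wt_tpm[gene] > 0 or patient_tpm[gene] > 0
}
ranking = sorted(lfc, key=lfc.get, reverse=True)
print(ranking)  # ['GENE_B', 'GENE_A', 'GENE_D']
```

Python sorts `inf` above every finite value, so WT-only genes end up at the top of such a ranking, but they all tie with each other and cannot be placed on a numeric color scale.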

Thanks a lot in advance!!

rna-seq • 924 views
agata88 (Poland) wrote (3.1 years ago):

In my opinion, FPKM = 0 means that no reads were aligned to that gene. Because this is RNA-seq, I would say this happens when the gene is not expressed, and this information is valuable: if the same gene has a nonzero FPKM in the other sample, you can say that it is expressed in sample 1 and not in sample 2. The log2 fold change for FPKM = 0 is inf/-inf. In the Cuffdiff output you can see that some genes have different FPKMs (which means one may be overexpressed) and some have FPKM = 0. The output also contains the p-value and the test statistic. The status column shows NOTEST when there are not enough alignments to test the gene (for example, zero coverage in both samples) and OK when the test was performed.
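To pull out the genes that are expressed in one condition only, one option is to read Cuffdiff's gene_exp.diff table directly. The sketch below assumes the column names `gene`, `status`, `value_1` and `value_2` as documented for Cuffdiff, with sample 1 as WT and sample 2 as patient; the rows are invented and the table is trimmed to the columns used here (the real file has more, which `csv.DictReader` simply ignores).

```python
import csv
import io

# Hypothetical excerpt of Cuffdiff's gene_exp.diff, trimmed to the columns
# used below. Sample 1 = WT, sample 2 = patient.
DIFF_TEXT = """gene\tstatus\tvalue_1\tvalue_2
GENE_A\tOK\t10.0\t2.5
GENE_B\tOK\t5.0\t0
GENE_C\tNOTEST\t0\t0
GENE_D\tOK\t0\t3.2
"""

def classify(row):
    """Label a gene by whether it has any expression in each condition."""
    wt, patient = float(row["value_1"]), float(row["value_2"])
    if wt == 0 and patient == 0:
        return "not expressed"
    if patient == 0:
        return "WT only"  # log2 fold change would be +/-inf
    if wt == 0:
        return "patient only"
    return "both"

rows = csv.DictReader(io.StringIO(DIFF_TEXT), delimiter="\t")
labels = {row["gene"]: classify(row) for row in rows}
wt_only = [gene for gene, label in labels.items() if label == "WT only"]
print(wt_only)  # ['GENE_B']
```

For a real run, `io.StringIO(DIFF_TEXT)` would be replaced by `open("gene_exp.diff", newline="")` from the Cuffdiff output directory. Without replicates these labels are descriptive only; they say nothing about significance.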

Hope it helps,

Best, Agata

a_j_d_a_n wrote (3.1 years ago):

Thanks for your answer, Agata. I'll eliminate the genes that have FPKM = 0 in both samples anyway, because they don't say anything about differential expression. But the log2fc = +/-inf values are still a problem, because it is not really necessary to show these values within a heatmap. For this reason, I thought about adding pseudocounts. Would that make sense?
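A pseudocount makes every log2 fold change finite. Here is a minimal sketch, assuming TPM values per condition and a pseudocount `c` added to both numerator and denominator; the function name and the numbers are hypothetical. It also shows the main caveat: the choice of `c` changes how on/off genes rank against ordinary changes.

```python
import math

def log2fc_pseudocount(wt, patient, pseudocount=1.0):
    """log2((WT + c) / (patient + c)); finite for any non-negative inputs."""
    return math.log2((wt + pseudocount) / (patient + pseudocount))

# Hypothetical TPM values as (WT, patient).
genes = {
    "GENE_A": (100.0, 10.0),  # expressed in both, ~10x higher in WT
    "GENE_B": (3.0, 0.0),     # low expression, WT only
    "GENE_E": (1000.0, 0.0),  # high expression, WT only
}

rankings = {}
for c in (1.0, 0.1):
    scores = {g: log2fc_pseudocount(wt, pt, c) for g, (wt, pt) in genes.items()}
    rankings[c] = sorted(scores, key=scores.get, reverse=True)
print(rankings)
```

With c = 1, GENE_B (3 vs 0 TPM) scores log2(4) = 2 and falls below GENE_A; with c = 0.1 it scores log2(31), about 4.95, and moves above it. Whatever value is chosen, it is worth reporting it alongside the ranking.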


Did you try to make a heatmap with the cummeRbund library in R? This library is designed for reading the diff output files from Cuffdiff.

Here is a link to the tutorial: http://compbio.mit.edu/cummeRbund/manual_2_0.html

Best, Agata

(Reply by agata88, 3.1 years ago)

I don't think I understand correctly... how can showing the differential expression (log2fc = +/-inf) not be necessary in a heatmap? What would you like to show in this heatmap?

(Reply by agata88, 3.1 years ago)
a_j_d_a_n wrote (3.1 years ago):

Oh sorry, a little typing error: necessary = possible. But I think I'll try to make the heatmap with the ranked TPM/FPKM values instead, so the log2fc = +/-inf will not be a problem.
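For a heatmap of expression values rather than fold changes, a common approach is to plot log2(TPM + 1) per condition, which maps zero expression to 0 instead of -inf. The sketch below assumes the genes are already in ranked order and uses invented TPM values for the merged WT and patient conditions; the pseudocount of 1 and the optional centering on each gene's mean are choices, not requirements, and `heatmap_rows` is a hypothetical helper.

```python
import math

# Hypothetical TPM values for the merged conditions, listed in ranked order.
conditions = ["WT", "patient"]
ranked_tpm = {
    "GENE_B": [62.0, 0.0],
    "GENE_A": [100.0, 20.0],
    "GENE_D": [40.0, 45.0],
}

def heatmap_rows(tpm, pseudocount=1.0, center=False):
    """Return log2(TPM + pseudocount) per condition for each gene.

    With center=True each row is shifted so its mean is 0, which shows the
    relative change between conditions rather than the absolute level.
    """
    rows = {}
    for gene, values in tpm.items():
        logged = [math.log2(v + pseudocount) for v in values]
        if center:
            mean = sum(logged) / len(logged)
            logged = [x - mean for x in logged]
        rows[gene] = logged
    return rows

matrix = heatmap_rows(ranked_tpm)
for gene, row in matrix.items():
    print(gene, ["%.2f" % x for x in row])
```

The resulting rows keep the ranked order and can be stacked into a matrix and passed to a plotting function such as matplotlib's `imshow`.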
