Cuffdiff Output Interpretation
7.2 years ago

I generated diff files following the differential-gene expression workflow, but I don't understand the biological meaning of the output file or what I can do with it. I've installed cummeRbund and read the cummeRbund and cuffdiff manuals, but I want to know the biological meaning of the diff file. For example, does a higher FPKMx mean that there are more transcripts, and does a negative log2 value mean downregulated while a positive one means upregulated? Can anyone explain the biological meaning of columns 7 to 13?

Here is what I found in the cuffdiff manual:

No.  Column name        Example               Description
1    Tested id          XLOC_000001           A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
2    gene               Lypla1                The gene_name(s) or gene_id(s) being tested
3    locus              chr1:4797771-4835363  Genomic coordinates for easy browsing to the genes or transcripts being tested
4    sample 1           Liver                 Label (or number if no labels provided) of the first sample being tested
5    sample 2           Brain                 Label (or number if no labels provided) of the second sample being tested
6    Test status        NOTEST                One of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL (an ill-conditioned covariance matrix or other numerical exception prevented testing)
7    FPKMx              8.01089               FPKM of the gene in sample x
8    FPKMy              8.551545              FPKM of the gene in sample y
9    log2(FPKMy/FPKMx)  0.06531               The (base 2) log of the fold change y/x
10   test stat          0.860902              The value of the test statistic used to compute the significance of the observed change in FPKM
11   p value            0.389292              The uncorrected p-value of the test statistic
12   q value            0.985216              The FDR-adjusted p-value of the test statistic
13   significant        no                    Either "yes" or "no", depending on whether the Benjamini-Hochberg-corrected p-value (the q value) is below the FDR threshold
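To make columns 11 to 13 concrete, here is a minimal Python sketch of the textbook Benjamini-Hochberg step-up adjustment, which turns a list of uncorrected p values into q values, followed by a yes/no call against an FDR threshold of 0.05 (cuffdiff's default, configurable with its --FDR option). This is an illustration of the standard procedure, not cuffdiff's own code: the p values are made up, every gene is assumed to have been tested (status OK), and the `benjamini_hochberg` and `significance_flags` helpers are hypothetical names.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0  # also caps every q value at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order of p
        q = p_values[i] * m / rank
        # A q value may never exceed the q value of a larger p value.
        running_min = min(running_min, q)
        q_values[i] = running_min
    return q_values


def significance_flags(q_values, fdr=0.05):
    """Roughly mimic the yes/no column: 'yes' when q is below the FDR threshold."""
    return ["yes" if q < fdr else "no" for q in q_values]


# Hypothetical uncorrected p values for five tested genes.
p = [0.389292, 0.0001, 0.004, 0.02, 0.9]
q = benjamini_hochberg(p)
for p_i, q_i, flag in zip(p, q, significance_flags(q)):
    print(f"p={p_i:<9} q={q_i:.4f} significant={flag}")
```

The first p value, 0.389292, gets a q value of about 0.49 here but 0.985216 in the manual's example row: a q value depends on how many genes were tested and on all the other p values, which is why column 12 rather than column 11 drives the call in column 13. Roughly, calling every gene with q below 0.05 significant means you expect at most about 5% of those calls to be false positives.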

7.2 years ago
seidel 7.6k

Can anyone explain to me the meaning of columns 7 to 13?

Look at the description field for those columns, and perhaps modify your question to be more specific about what you don't understand. An FPKM value represents the relative abundance (in effect, the concentration) of a transcript in your sample, normalized for sequencing depth (the number of mapped reads) and transcript length. Thus fields 7 and 8 are the measurements for your two samples, and field 9 is simply the log2 of the ratio of the two. You might look up FPKM or RPKM values if you're unsure what they represent. Fields 11 and 12 are the p-value and q-value. These express how surprising the observed change is given the variation you see when you make repeated measurements of something: the p-value is the probability of seeing a change at least this large if the gene's expression did not really differ between the samples. You should look up what a p-value and an "adjusted p-value" are (the adjusted one is important for you to understand if you're going to do any genomic data analysis). The 13th field is simply a flag based on whether the q-value in field 12 is below the false discovery rate threshold, which is 0.05 by default and can be changed with cuffdiff's --FDR option.
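A minimal Python sketch of how fields 7 to 9 relate, assuming the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). The fragment counts, transcript length, and library sizes are hypothetical, and the `fpkm` helper is an illustrative name; cuffdiff's own estimates come from its statistical model (which, for example, distributes reads among isoforms), so real values cannot be reproduced this simply. The last lines recompute field 9 for the manual's example row.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical 2 kb gene, sequenced in two libraries of different depth.
fpkm_x = fpkm(fragments=400, length_bp=2000, total_mapped_fragments=20_000_000)
fpkm_y = fpkm(fragments=1200, length_bp=2000, total_mapped_fragments=30_000_000)
print(fpkm_x, fpkm_y)  # 10.0 20.0

# Field 9: log2(FPKMy / FPKMx). Positive means higher in sample y (sample 2),
# negative means lower in sample y; 1.0 is a two-fold increase.
print(math.log2(fpkm_y / fpkm_x))  # 1.0

# The manual's example row: FPKMx = 8.01089, FPKMy = 8.551545.
ratio = 8.551545 / 8.01089
print(round(math.log2(ratio), 5), round(math.log(ratio), 5))
```

Note that the 1200 raw fragments in sample y are three times the 400 in sample x, yet the FPKM only doubles once the deeper library is accounted for. The final line also shows that the manual's example value 0.06531 matches the natural log of 8.551545/8.01089 rather than the base-2 log (about 0.0942), so it is worth checking which log base your own cuffdiff version reports.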

I'm not sure what I should look for next, any idea?

Your question sort of implies that you don't understand anything about measuring gene expression with NGS, or anything about how to analyze data with repeated measurements. Thus I would recommend looking up the concepts and terms described in your "description" section. That would be a start. There are also a variety of review articles on PubMed, and various posts here, about getting started in NGS. Sorry if I sound harsh, but your question implies that you've done basically no searching for answers despite a wealth of avenues sitting right there in your own description field :)


Thank you for answering, though I've spent days reading material that explains everything up to cuffdiff. I can't seem to find any good paper, website, or tutorial explaining what to do after cuffdiff and how to interpret its output file. I'm a whole-genome specialist and have never done any RNA-seq work before, so this is pretty new to me.

Anyhow, my main question is what it means biologically to have a given p-value or q-value. I insist on "biologically".

Thanks--