Cuffdiff results transcript testing
8.7 years ago
manekineko

I'm exploring the results from Cuffdiff:

  1. .gene_differential_expression_testing
  2. .transcript_differential_expression_testing

For some genes I see multiple entries in file 2), with opposite log2 fold-change directions, for example:

gene      locus                     sample_1    sample_2    status    value_1        value_2     log2(fold_change)
TMEM51    chr1:15479027-15546974    A    B    OK    0.000151584    0.563561    11.8602
TMEM51    chr1:15479027-15546974    A    B    OK    0.741354       3.92E-05    -14.2062
TMEM51    chr1:15479027-15546974    A    B    OK    2.39194        0.460979    -2.37541

Moreover, the gene TMEM51 is missing from file 1). Is that normal?

In file 2), are these multiple entries with the same gene name different isoforms? If so, how can I tell whether an isoform is novel or already known?

cuffdiff RNA-seq
8.7 years ago

In your transcripts file, every entry should have a TCONS ID. If there are two entries with the same gene name or gene ID, they will most likely have different TCONS IDs, meaning they are different isoforms of the gene.
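A minimal Python sketch of that grouping, assuming the isoform-level file is Cuffdiff's standard tab-separated output (isoform_exp.diff by default) with its header row, where `test_id` holds the TCONS ID and `gene_id` the XLOC ID. The function name and the example IDs are hypothetical; only the columns the sketch reads are included in the example.

```python
import csv
import io
from collections import defaultdict


def isoforms_per_gene(diff_text):
    """Map each gene_id (XLOC) to the test_ids (TCONS) of its isoform rows."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    for row in reader:
        groups[row["gene_id"]].append(row["test_id"])
    return dict(groups)


# Hypothetical excerpt: IDs are made up, only the needed columns are shown.
example = (
    "test_id\tgene_id\tgene\n"
    "TCONS_00000001\tXLOC_000001\tTMEM51\n"
    "TCONS_00000002\tXLOC_000001\tTMEM51\n"
    "TCONS_00000003\tXLOC_000001\tTMEM51\n"
)

# Report only genes that have more than one isoform row.
for gene_id, tcons in isoforms_per_gene(example).items():
    if len(tcons) > 1:
        print(gene_id, "has", len(tcons), "isoforms:", ", ".join(tcons))
```

On a real run you would pass the file contents instead, e.g. `isoforms_per_gene(open("isoform_exp.diff").read())`.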

If you were seeing the same gene name repeated in cuffdiff's gene differential expression file, that would be more concerning, since everything there should be collapsed down to an XLOC ID, essentially a locus that encapsulates the gene.

If you want to know whether it's 'novel' under the Tuxedo method of detection, look at the class code it has been assigned; better yet, load some of the alignments into IGV and judge for yourself. In my experience, what Cufflinks calls 'novel' should be taken with a bucket of salt.
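A rough Python sketch for pulling class codes out of a GTF, assuming the `class_code` attribute is present on transcript records (as in Cuffmerge's merged.gtf or Cuffcompare's combined.gtf when a reference annotation was supplied). The descriptions cover only a subset of the codes and are paraphrased; check the Cufflinks manual for the full table. The function names and the example line are hypothetical.

```python
import re

# Paraphrased descriptions for a subset of Cuffcompare/Cuffmerge class codes.
CLASS_CODES = {
    "=": "complete match of intron chain (known isoform)",
    "c": "contained within a reference transcript",
    "j": "potentially novel isoform (shares at least one splice junction with a reference)",
    "e": "single-exon transfrag overlapping a reference exon and intron (possible pre-mRNA)",
    "i": "falls entirely within a reference intron",
    "o": "generic exonic overlap with a reference transcript",
    "p": "possible polymerase run-on fragment",
    "u": "unknown, intergenic transcript",
    "x": "exonic overlap with a reference on the opposite strand",
}

# Matches GTF attribute pairs such as: class_code "j"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def class_codes(gtf_lines):
    """Return {transcript_id: class_code} for GTF lines carrying a class_code attribute."""
    codes = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "class_code" in attrs:
            codes[attrs["transcript_id"]] = attrs["class_code"]
    return codes


def describe(code):
    return CLASS_CODES.get(code, "other/unlisted class code")


# Hypothetical merged.gtf line; coordinates and IDs are illustrative only.
example = ('chr1\tCufflinks\texon\t15479027\t15480000\t.\t+\t.\t'
           'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; class_code "j";')

for tid, code in class_codes([example]).items():
    print(tid, code, describe(code))
```

A code of `=` suggests a known isoform; anything else is worth a look in IGV before you believe it.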

8.7 years ago
manekineko

Thanks, yes, they have different TCONS IDs. But is this biologically relevant: can these two or three detected isoforms be expressed from one gene at the same time?
Another question: in general, which file do you usually explore and carry forward into further analyses, the gene-level one or the transcript-level one?


To be honest, I avoid the Tuxedo pipeline when I can. For gene expression I'd use htseq-count -> DESeq2; for transcript expression I'd use Salmon/Kallisto and EBSeq (though EBSeq is fairly restrictive in the models you can specify, but then again so is Tuxedo).
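For the htseq-count -> DESeq2 route, one small step in between is merging the per-sample count files into a single genes x samples table for DESeq2's count matrix. A minimal Python sketch, assuming each htseq-count output is the usual two-column, tab-separated `gene<TAB>count` text whose summary lines start with `__` (e.g. `__no_feature`, `__ambiguous`). The function names and sample names are hypothetical.

```python
import csv
import io


def merge_htseq_counts(samples):
    """samples: {sample_name: htseq-count output text}. Returns (header, rows)."""
    counts = {}
    for name, text in samples.items():
        for line in text.splitlines():
            if not line.strip():
                continue
            gene, value = line.split("\t")
            if gene.startswith("__"):
                continue  # skip htseq-count summary lines, not real genes
            counts.setdefault(gene, {})[name] = int(value)
    names = list(samples)
    header = ["gene_id"] + names
    # htseq-count normally lists every gene, so the 0 default is only a safeguard.
    rows = [[gene] + [counts[gene].get(n, 0) for n in names] for gene in sorted(counts)]
    return header, rows


def to_tsv(header, rows):
    """Serialize the merged table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical per-sample outputs (counts are made up for illustration).
header, rows = merge_htseq_counts({
    "sampleA": "geneX\t12\ngeneY\t0\n__no_feature\t40\n",
    "sampleB": "geneX\t30\ngeneY\t5\n__ambiguous\t3\n",
})
print(to_tsv(header, rows))
```

The resulting table can be read in R with genes as row names and passed to DESeq2 as the count matrix.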

If they're known isoforms, they can absolutely be biologically relevant; isoform switching certainly happens. An RNA-seq sample is a snapshot of what is happening biologically at a given moment, so isoform switching that occurred at different times can still be captured in the same sample.

It really depends on what you want to look at. I disagree with Tuxedo's gene-level methodology of using 'windows' to collapse everything down, and it's very difficult to understand how it calls 'novel' transcripts. If I were forced to use Tuxedo, I'd probably do a run looking at all known features, use the transcripts file to identify regions of interest (isoform switching events), and use the gene-level output to check that everything was as I expected it to be.
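As a starting point for spotting candidate switching events in the transcripts file, here is a minimal Python sketch that flags genes where at least one isoform goes up and another goes down. It assumes Cuffdiff's tab-separated isoform output with its `test_id`, `gene_id`, `status`, `log2(fold_change)` and `significant` columns; the threshold of 1.0 and the function name are arbitrary/hypothetical choices, and a flagged gene is only a lead to inspect, not a tested switch.

```python
import csv
import io
import math
from collections import defaultdict


def switching_candidates(diff_text, min_abs_log2=1.0, significant_only=False):
    """Return {gene_id: (up_isoforms, down_isoforms)} for genes whose isoforms move in opposite directions."""
    up, down = defaultdict(list), defaultdict(list)
    for row in csv.DictReader(io.StringIO(diff_text), delimiter="\t"):
        if row["status"] != "OK":
            continue  # skip NOTEST, LOWDATA, HIDATA and FAIL rows
        if significant_only and row["significant"] != "yes":
            continue
        lfc = float(row["log2(fold_change)"])  # float() also accepts "inf"/"-inf"
        if math.isnan(lfc):
            continue
        if lfc >= min_abs_log2:
            up[row["gene_id"]].append(row["test_id"])
        elif lfc <= -min_abs_log2:
            down[row["gene_id"]].append(row["test_id"])
    return {g: (up[g], down[g]) for g in up if g in down}


# The TMEM51 log2 values from the question; TCONS/XLOC IDs are hypothetical.
example = (
    "test_id\tgene_id\tgene\tstatus\tlog2(fold_change)\n"
    "TCONS_00000001\tXLOC_000001\tTMEM51\tOK\t11.8602\n"
    "TCONS_00000002\tXLOC_000001\tTMEM51\tOK\t-14.2062\n"
    "TCONS_00000003\tXLOC_000001\tTMEM51\tOK\t-2.37541\n"
)

print(switching_candidates(example))
```

With near-zero FPKM in one condition (as in the first two TMEM51 rows), large log2 values are expected, so it's worth setting `significant_only=True` or checking the FPKMs before reading much into a flag.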
