Question

Inconsistency Between Cufflinks And Cuffdiff

6.9 years ago

vm.higareda ▴ 30

I am very confused by the results of the cufflinks-cuffdiff workflow compared with cuffdiff alone. When I use the first approach (cufflinks-cuffdiff) to detect differential expression in a Drosophila data set, the differentially expressed genes differ from those I get when I use only cuffdiff.

Some genes are not even tested (NOTEST) under one approach or the other. This is very worrying, since the biological interpretation will depend on the approach used.

Example (cufflinks-cuffdiff)

XLOC_008606 CecA1 NT_033777.3:30210873-30211273 sin_spiro con_spiro OK 3.65287 51.0277 3.80418 2.83677 5.00E-05 0.0155562 yes

XLOC_008607 CecA2 NT_033777.3:30212155-30212568 sin_spiro con_spiro OK 2.47732 32.7192 3.72329 2.62197 0.0001 0.0284457 yes

XLOC_009396 Cpr47Eg NT_033778.4:11277946-11278762 sin_spiro con_spiro OK 14.2928 113.849 2.99375 2.89525 5.00E-05 0.0155562 yes

Example (cuffdiff only)

gene17163 gene17163 CecA1 NT_033777.3:30210873-30211273 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no

gene17165 gene17165 CecA2 NT_033777.3:30212155-30212568 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no

gene7138 gene7138 Cpr47Eg NT_033778.4:11277956-11278445 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no

RNA-Seq cufflinks rna cuffdiff • 1.4k views

ADD COMMENT • link updated 6.9 years ago by Istvan Albert 100k • written 6.9 years ago by vm.higareda ▴ 30

score 0 · Answer 1 · 2017-05-26

6.9 years ago

Istvan Albert 100k

The two operate a little differently: when you use cufflinks you also assemble new transcripts, and the quantification will include those as well.

In this case, your data might contain reads that, while falling within a gene's start/end, do not match the known annotation. Hence, when using cuffdiff alone, nothing matches over a transcript (which gives the NOTEST output). When you assemble new transcripts, cufflinks produces a new "annotation" over the same region that, in turn, can be tested.
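
To see how many genes are affected, the two result tables can be joined by gene name and the test statuses compared. The following is a minimal sketch, not a definitive implementation: it assumes both tables are tab-separated with a header row containing `gene` and `status` columns, and that gene names are unique within each table. The function names and the file paths in the usage comment are hypothetical.

```python
import csv


def read_status(handle):
    """Map gene name -> test status from a tab-separated cuffdiff-style table.

    Assumes a header row with 'gene' and 'status' columns; if a gene name
    appears more than once, the last row wins.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene"]: row["status"] for row in reader}


def status_mismatches(assembled, reference_only):
    """Return genes found in both runs whose test status differs.

    Values are (status in cufflinks-cuffdiff run, status in cuffdiff-only run).
    """
    shared = assembled.keys() & reference_only.keys()
    return {
        gene: (assembled[gene], reference_only[gene])
        for gene in sorted(shared)
        if assembled[gene] != reference_only[gene]
    }


# Usage (both paths are hypothetical):
# with open("cufflinks_cuffdiff/gene_exp.diff") as a, \
#         open("cuffdiff_only/gene_exp.diff") as b:
#     for gene, (s1, s2) in status_mismatches(read_status(a), read_status(b)).items():
#         print(gene, s1, s2, sep="\t")
```

Genes that are OK in one run and NOTEST in the other are the ones worth inspecting first.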

ADD COMMENT • link 6.9 years ago by Istvan Albert 100k

But in the case of CecA1, for example, the positions are even the same with cufflinks-cuffdiff as with cuffdiff alone (NT_033777.3:30210873-30211273), so I would expect at least similar numbers of mapped reads for CecA1.

Thank you

ADD REPLY • link 6.9 years ago by vm.higareda ▴ 30

We have to keep the coordinates of genes separate from the coordinates of transcripts. The only thing we can infer is that the transcript lies within the gene coordinates. The gene coordinates tell us very little about the transcript coordinates.

If most reads map to a region that is not annotated as an exon in the original GFF, then those won't be counted (at least shouldn't be counted) at the gene level with cuffdiff. But once you annotate them with cufflinks, the same cuffdiff process will count them differently.

Use IGV to line up the transcript coordinates from the Cufflinks output against the original GFF file. There is a good chance the former has regions that are missing from the latter, which would explain the difference.
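
The same check can be done programmatically. The sketch below is only an illustration under stated assumptions: exon coordinates for one gene have already been extracted from each annotation as 1-based, inclusive `(start, end)` pairs on the same chromosome and strand. The function names and the example coordinates are hypothetical, and this is plain interval subtraction, not anything Cufflinks itself does.

```python
def merge(intervals):
    """Merge overlapping or adjacent closed intervals given as (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(assembled_exons, reference_exons):
    """Return the parts of the assembled exons not covered by any reference exon."""
    reference = merge(reference_exons)
    result = []
    for start, end in merge(assembled_exons):
        pos = start  # first position not yet accounted for
        for ref_start, ref_end in reference:
            if ref_end < pos:
                continue  # reference exon lies entirely before the open region
            if ref_start > end:
                break  # reference exon lies entirely after this assembled exon
            if ref_start > pos:
                result.append((pos, ref_start - 1))
            pos = ref_end + 1
            if pos > end:
                break
        if pos <= end:
            result.append((pos, end))
    return result


# Hypothetical coordinates, for illustration only:
# novel = uncovered([(100, 500)], [(150, 200), (300, 350)])
```

Any interval returned here is a region that reads can be assigned to only when the assembled annotation is used.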

ADD REPLY • link 6.9 years ago by Istvan Albert 100k

Thank you for your response. In your experience, do you think it is better to use cufflinks-cuffdiff than cuffdiff alone? Initially I decided to use only cuffdiff, since I am working with Drosophila and it seems to be a well-annotated organism.

ADD REPLY • link 6.9 years ago by vm.higareda ▴ 30

The choice needs to be made based on the phenomenon under study. For example, once you establish that there is indeed a novel transcript variant, it makes sense to use the assembled annotation.

Clearly, though, doing so carries a higher burden of proof: one now needs to establish both that the new transcript does indeed exist and that it is differentially expressed.

ADD REPLY • link 6.9 years ago by Istvan Albert 100k