Question

cuffcompare and cuffmerge

6.5 years ago

qudrat ▴ 100

When we merge two GTF files from two different samples, is it true that Cuffmerge lists only the transcripts common to both samples, whereas Cuffcompare produces all transcripts from both samples with no redundancy?

RNA-Seq next-gen • 2.7k views

updated 3.7 years ago by Biostar 20 • written 6.5 years ago by qudrat ▴ 100

Thank you very much Kevin Blighe!

6.5 years ago by qudrat ▴ 100

But with option #3 some transcripts might be missed: I have tried both strategies, and option #2 (Cuffcompare) gives a larger number of transcripts. With Cuffmerge, I think there is also a chance of getting false positives.
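One way to make this comparison concrete is to count the distinct `transcript_id` values in each output file. Below is a minimal sketch, assuming standard tab-separated GTF with the attributes in column 9. The function name and the example records are made up for illustration.

```python
import re

# Matches the transcript_id attribute in column 9 of a GTF record.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_transcripts(gtf_lines):
    """Return the number of distinct transcript_id values in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return len(ids)


# Made-up records: two exons of transcript T1 plus one exon of T2.
example = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tCufflinks\texon\t900\t950\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
]
print(count_transcripts(example))
```

The same function accepts an open file handle, so it can be run on the combined.gtf and on the merged.gtf to compare the two totals.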

6.5 years ago by qudrat ▴ 100

I see.

There are a lot of parameters to configure - have you looked through each in order to see which ones may be relevant?

You should also be aware that Cufflinks has been 'retired' and that the new pipeline is HISAT and StringTie: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5032908/

6.5 years ago by Kevin Blighe 87k

Reading the explanation by Trapnell, it makes sense that you will see more transcripts from Cuffcompare.

In summary: Both Cuffcompare and Cuffmerge will merge transcripts only if they agree on splicing structure. Thereafter, Cuffcompare will only merge transcripts if one is contained within the other; whereas Cuffmerge will merge any transcripts that overlap. In both cases, you should more or less see the same transcripts, though.

The critical step to watch is the initial assembly of transcripts by Cufflinks. Cuffcompare also has many parameters that can be configured, whereas Cuffmerge's approach is simple and needs little configuration: if transcripts overlap to any extent and share splicing structure, then Cuffmerge merges them.
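The difference between the two rules can be illustrated with a toy pairwise model. This is only a sketch of the idea, not how either program is implemented: it assumes both transfrags are on the same strand, represents each as a sorted list of `(start, end)` exons, and uses hypothetical function names. In particular, Cuffmerge really works by re-assembling the transfrags with Cufflinks, which this does not attempt.

```python
def span(t):
    """Genomic span (first exon start, last exon end) of a transfrag."""
    return t[0][0], t[-1][1]


def introns(t):
    """Introns as (end of exon i, start of exon i+1) pairs."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(t, t[1:])]


def compatible(a, b):
    """True if a and b overlap and agree on every intron inside the overlap."""
    lo = max(span(a)[0], span(b)[0])
    hi = min(span(a)[1], span(b)[1])
    if lo >= hi:
        return False  # the spans do not overlap

    def inside(t):
        return {i for i in introns(t) if i[0] < hi and i[1] > lo}

    return inside(a) == inside(b)


def contains(outer, inner):
    """True if inner's span lies entirely within outer's span."""
    return span(outer)[0] <= span(inner)[0] and span(inner)[1] <= span(outer)[1]


def containment_merge(a, b):
    """Cuffcompare-like rule: collapse only if one transfrag is redundant."""
    if compatible(a, b):
        if contains(a, b):
            return [a]
        if contains(b, a):
            return [b]
    return [a, b]


def overlap_merge(a, b):
    """Cuffmerge-like rule: collapse any compatible, overlapping pair."""
    if not compatible(a, b):
        return [a, b]
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [merged]


# A and B overlap in one exon and agree on splicing, but neither contains the other.
A = [(100, 200), (300, 400)]
B = [(350, 400), (500, 600)]
print(len(containment_merge(A, B)), len(overlap_merge(A, B)))
```

In this made-up example the containment rule keeps both transfrags while the overlap rule collapses them into one, which is consistent with Cuffcompare reporting more transcripts than Cuffmerge on the same input.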

6.5 years ago by Kevin Blighe 87k

You're welcome

6.5 years ago by Kevin Blighe 87k

score 2 · Answer 1 · 2017-10-13

Hi again qudrat,

The answer from the developer, Cole Trapnell, is this:

I can shed some light on this. We have an upcoming protocol paper that describes our recommended workflow for TopHat and Cufflinks that discusses some of these issues.

As turnersd outlined, there are three strategies:

1) merge bams and assemble in a single run of Cufflinks

2) assemble each bam and cuffcompare them to get a combined.gtf

3) assemble each bam and cuffmerge them to get a merged.gtf

All three options work a little differently depending on whether you're also trying to integrate reference transcripts from UCSC or another annotation source.

#1 is quite different from #2 and #3, so I'll discuss its pros and cons first. The advantage here is simplicity of workflow. It's one Cufflinks run, so no need to worry about the details of the other programs. As turnersd mentions, you might also think this maximizes the accuracy of the resulting assembly, and that might be the case, but it also might not (for technical reasons that I don't want to get into right now). The disadvantage of this approach is that your computer might not be powerful enough to run it. More data and more isoforms means substantially more memory and running time. I haven't actually tried this on something like the human body map, but I would be very impressed and surprised if Cufflinks can deal with all of that on a machine owned by mere mortals.

#2 and #3 are very similar - both are designed to gracefully merge full-length and partial transcript assemblies without ever merging transfrags that disagree on splicing structure. Consider two transfrags, A and B, each with a couple exons. If A and B overlap, and they don't disagree on splicing structure, we can (and according to Cufflinks' assembly philosophy, we should) merge them. The difference between Cuffcompare and Cuffmerge is that Cuffcompare will only merge them if A is "contained" in B, or vice versa. That is, only if one of the transfrags is essentially redundant. Otherwise, they both get included. Cuffmerge, on the other hand, will merge them if they overlap, and agree on splicing, and are in the same orientation. As turnersd noted, this is done by converting the transfrags into SAM alignments and running Cufflinks on them.

The other thing that distinguishes these two options is how they deal with a reference annotation. You can read on our website how the Cufflinks Reference Annotation Based Transcript assembler (RABT) works. Cuffcompare doesn't do any RABT assembly, it just includes the reference annotation in the combined.gtf and discards partial transfrags that are contained and compatible with the reference. Cuffmerge actually runs RABT when you provide a reference, and this happens during the step where transfrags are converted into SAM alignments and assembled. We do this to improve quantification accuracy and reduce errors downstream. I should also say that Cuffmerge runs cuffcompare in order to annotate the merged assembly with certain helpful features for use later on.

So we recommend #3 for a number of reasons, because it is the closest in spirit to #1 while still being reasonably fast. For reasons that I don't want to get into here (pretty arcane details about the Cufflinks assembler) I also feel that option #3 is actually the most accurate in most experimental settings.

source: http://seqanswers.com/forums/showthread.php?t=16422

You'll note that his own recommendation is to use cuffmerge.
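For orientation, options #2 and #3 could be invoked roughly as sketched below. The helper functions and file paths are hypothetical, and the script only prints the command lines without running anything. The flags (`-o` and `-r` for cuffcompare; `-o`, `-g` and `-s` for cuffmerge) and the default output names are as I recall them from the Cufflinks manual, so please check them against `cuffcompare -h` and `cuffmerge -h` for your installed version.

```python
import shlex


def cuffcompare_cmd(gtfs, out_prefix="cuffcmp", ref_gtf=None):
    """Option #2: compare per-sample assemblies; writes <out_prefix>.combined.gtf."""
    cmd = ["cuffcompare", "-o", out_prefix]
    if ref_gtf:
        cmd += ["-r", ref_gtf]  # optional reference annotation
    return cmd + list(gtfs)


def cuffmerge_cmd(manifest, out_dir="merged_asm", ref_gtf=None, ref_fasta=None):
    """Option #3: merge assemblies listed one per line in a manifest text file."""
    cmd = ["cuffmerge", "-o", out_dir]
    if ref_gtf:
        cmd += ["-g", ref_gtf]  # reference annotation, enables RABT
    if ref_fasta:
        cmd += ["-s", ref_fasta]  # genome sequence
    return cmd + [manifest]


# Hypothetical per-sample Cufflinks outputs.
samples = ["sampleA/transcripts.gtf", "sampleB/transcripts.gtf"]
print(shlex.join(cuffcompare_cmd(samples)))
print(shlex.join(cuffmerge_cmd("assemblies.txt", ref_gtf="genes.gtf")))
```

Here `assemblies.txt` is assumed to be a plain text file with one GTF path per line, which is the input form cuffmerge expects.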

Kevin