Question: Rna-Seq With Cuffdiff: Use Merged.Gtf From Cuffmerge Or Combined.Gtf From Cuffcompare?
Stephen (Charlottesville, Virginia) wrote 8.8 years ago:

Dear community,

A similar question was asked previously on SeqAnswers and no answers were posted. I'll expand on that question.

I'm trying to better understand the cufflinks --> cuffdiff workflow. Once I run cufflinks on each of my .bam files (from tophat), I have a separate .gtf assembly for each sample. To run cuffdiff I need a single unified .gtf file of my assembled transcripts.

If I want to run a differential expression analysis with cuffdiff, should I use the merged.gtf file produced by cuffmerge or the combined.gtf file produced by cuffcompare? How are these two files different, and what would be the downstream effect of using one or the other for differential expression in cuffdiff?

EDIT: Or would a better workflow be to forgo cuffmerge/cuffcompare altogether in favor of running cufflinks on a merge of all the .bam files (e.g. with samtools merge) to generate a single assembly that maximizes assembly accuracy, and use that assembly as the "reference" for cuffdiff?
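To make the pooled alternative concrete, here is a minimal Python sketch that only builds the command lines (it does not run anything). The file names (pooled.bam, pooled_asm) and the thread count are hypothetical; it assumes the tophat BAMs are coordinate-sorted, as samtools merge expects, and that cufflinks writes its assembly to transcripts.gtf inside its output directory.

```python
def pooled_assembly_commands(bams, pooled_bam="pooled.bam", out_dir="pooled_asm", threads=8):
    """Build argv lists for 'pool all BAMs, then assemble once' (names are hypothetical)."""
    if len(bams) < 2:
        raise ValueError("need at least two BAM files to pool")
    # samtools merge <out.bam> <in1.bam> <in2.bam> ... (inputs assumed coordinate-sorted)
    merge = ["samtools", "merge", pooled_bam, *bams]
    # One cufflinks run on the pooled reads produces a single assembly.
    assemble = ["cufflinks", "-p", str(threads), "-o", out_dir, pooled_bam]
    # Path where cufflinks is expected to write the assembled transcripts.
    return [merge, assemble], f"{out_dir}/transcripts.gtf"


if __name__ == "__main__":
    import shlex
    cmds, gtf = pooled_assembly_commands(["s1/accepted_hits.bam", "s2/accepted_hits.bam"])
    for argv in cmds:
        print(shlex.join(argv))
    print("assembly for cuffdiff:", gtf)
```

Returning argv lists rather than shell strings keeps the sketch easy to hand to subprocess.run later, or to inspect before committing to a long run.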


More info:

From the cuffcompare documentation:

Cuffcompare clusters/tracks transfrags across samples, and writes a GTF file <outprefix>.combined.gtf containing a nonredundant set of transcripts across all input files (with a single representative transfrag chosen for each clique of matching transfrags across samples).
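As a rough illustration of "one representative per clique of matching transfrags", the toy Python sketch below groups multi-exon transfrags that share strand, chromosome and splice-junction chain, then keeps one per group. This is NOT cuffcompare's actual algorithm: the matching criterion is simplified, and the tie-break (keep the transfrag with the most exonic bases) is a hypothetical rule chosen only for the example.

```python
from collections import defaultdict


def junction_chain(exons):
    """Splice junctions as (exon end, next exon start) pairs, from (start, end) exons."""
    exons = sorted(exons)
    return tuple((a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:]))


def exonic_length(exons):
    """Total exonic bases, with inclusive (start, end) coordinates."""
    return sum(end - start + 1 for start, end in exons)


def nonredundant(transfrags):
    """transfrags: dict id -> (chrom, strand, exons). Returns sorted representative ids."""
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transfrags.items():
        chain = junction_chain(exons)
        # Toy simplification: single-exon transfrags are never grouped, each stands alone.
        key = (chrom, strand, chain) if chain else (chrom, strand, tid)
        groups[key].append(tid)
    # Hypothetical tie-break: keep the member with the most exonic bases.
    reps = [max(ids, key=lambda t: exonic_length(transfrags[t][2])) for ids in groups.values()]
    return sorted(reps)
```

For example, two samples' transfrags with the same intron chain but slightly different ends collapse to one entry, while a transfrag using a different acceptor site stays separate.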

From the cuffmerge documentation:

cuffmerge takes two or more Cufflinks GTF files and merges them into a single unified transcript catalog. Optionally, you can provide the script with a reference GTF, and the script will use it to attach gene names and other metadata to the merged catalog.
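A minimal Python sketch of preparing a cuffmerge run, assuming the usual invocation `cuffmerge [options] <assembly_GTF_list.txt>`, where the list file names one Cufflinks GTF per line, and assuming the result lands in merged.gtf inside the output directory. The options used are -o (output directory), -p (threads), -g (reference GTF) and -s (reference genome sequence); the paths are hypothetical.

```python
def assembly_list_text(assembly_gtfs):
    """Contents of the list file cuffmerge reads: one Cufflinks GTF path per line."""
    return "".join(f"{path}\n" for path in assembly_gtfs)


def cuffmerge_command(list_file, out_dir="merged_asm", ref_gtf=None, genome_fa=None, threads=8):
    """Build the cuffmerge argv; returns (argv, expected merged GTF path)."""
    argv = ["cuffmerge", "-o", out_dir, "-p", str(threads)]
    if ref_gtf:
        # Optional reference annotation used to attach gene names and other metadata.
        argv += ["-g", ref_gtf]
    if genome_fa:
        # Optional reference genome sequence.
        argv += ["-s", genome_fa]
    argv.append(list_file)
    return argv, f"{out_dir}/merged.gtf"
```

The caller would write assembly_list_text(...) to a file (for example assemblies.txt) and pass that path as list_file.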

Tags: gtf, cuffdiff, cufflinks, rna
Damian Kao wrote 8.8 years ago:

If you want differential expression, you should just use the combined.gtf from cuffcompare.

As I understand it, cuffcompare tries to match transcripts across the samples to each other in order to do the differential expression analysis. I am not sure what algorithm it uses to determine whether two transcripts are comparable.

Cuffmerge will actually convert your .gtf assemblies into .sam and run cufflinks on the .sam file to produce a merged assembly.
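If you go the cuffcompare route suggested above, a minimal Python sketch of the command is below. It assumes the invocation `cuffcompare [-r ref.gtf] [-o outprefix] gtf1 gtf2 ...`, with "cuffcmp" as the default output prefix, and that the GTF to hand to cuffdiff is <outprefix>.combined.gtf as described in the documentation quoted in the question. Paths are hypothetical.

```python
def cuffcompare_command(assembly_gtfs, out_prefix="cuffcmp", ref_gtf=None):
    """Build the cuffcompare argv; returns (argv, expected combined GTF path)."""
    argv = ["cuffcompare", "-o", out_prefix]
    if ref_gtf:
        # Optional reference annotation the transfrags are compared against.
        argv += ["-r", ref_gtf]
    # One Cufflinks assembly per sample, e.g. <sample>/transcripts.gtf.
    argv += list(assembly_gtfs)
    return argv, f"{out_prefix}.combined.gtf"
```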

roxane.legaie wrote 8.3 years ago:

Merging assemblies with cuffmerge

Cufflinks includes a script called cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. If you have a reference GTF file available, you can provide it to the script in order to gracefully merge novel isoforms and known isoforms and maximize overall assembly quality. The main purpose of this script is to make it easier to make an assembly GTF file suitable for use with Cuffdiff.
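The last step, feeding the merged GTF to Cuffdiff, can be sketched in Python as below. It assumes the invocation `cuffdiff [options] <transcripts.gtf> <cond1_rep1.bam,cond1_rep2.bam,...> <cond2_rep1.bam,...>`, where replicates of one condition are joined by commas into a single argument, -L gives the condition labels, -b enables fragment bias correction against a genome FASTA, -o sets the output directory and -p the threads. Labels and paths are hypothetical.

```python
def cuffdiff_command(transcripts_gtf, groups, out_dir="diff_out", genome_fa=None, threads=8):
    """groups: dict condition label -> list of replicate BAMs (insertion order is kept)."""
    if len(groups) < 2:
        raise ValueError("cuffdiff needs at least two conditions")
    argv = ["cuffdiff", "-o", out_dir, "-p", str(threads), "-L", ",".join(groups)]
    if genome_fa:
        # Optional fragment bias correction using the reference genome.
        argv += ["-b", genome_fa]
    argv.append(transcripts_gtf)
    # Replicates of one condition go in a single comma-separated argument.
    argv += [",".join(bams) for bams in groups.values()]
    return argv
```

Usage: cuffdiff_command("merged_asm/merged.gtf", {"ctrl": ["c1.bam", "c2.bam"], "treat": ["t1.bam", "t2.bam"]}). Swapping in a cuffcompare combined.gtf only changes the first positional argument, which is exactly the choice the question is about.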


fta.mirzadeh replied 4.1 years ago:

What if one is not looking for novel isoforms? Is it essential to merge all the GTFs with the reference GTF when we are only interested in the expression levels of known isoforms?
pengchy wrote 5.4 years ago:

Another way to understand the difference: cuffmerge can produce new transcripts that are not present in any of the input GTF files, while cuffcompare selects one transcript as the representative when several transcripts overlap consistently.
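A toy Python illustration of that difference, under heavy simplification: "merging" here is just the union of two overlapping transfrags' exons with overlapping exon intervals fused, which stands in for cuffmerge's real reassembly step, and the "representative" rule (keep the input with more exonic bases) is hypothetical. The function names are made up for the example; the point is only that a merged model can match neither input, while a representative is always one of them.

```python
def toy_merge(a, b):
    """Naive merge: union of both exon lists, fusing overlapping (start, end) intervals.
    Assumes the caller already knows the two transfrags are compatible."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def toy_representative(a, b):
    """cuffcompare-like choice: keep one input unchanged (hypothetical rule: more exonic bases)."""
    return max(a, b, key=lambda exons: sum(e - s + 1 for s, e in exons))


a = [(100, 200), (300, 400)]
b = [(300, 400), (500, 650)]
print(toy_merge(a, b))           # a three-exon model found in neither input
print(toy_representative(a, b))  # one of the inputs, unchanged
```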

Powered by Biostar version 2.3.0