RNA-Seq with cuffdiff: use merged.gtf from cuffmerge or combined.gtf from cuffcompare?
11.5 years ago
Stephen 2.8k

Dear community,

A similar question was asked previously on SeqAnswers and no answers were posted. I'll expand that question.

I'm trying to better understand the cufflinks --> cuffdiff workflow. Once I run cufflinks on each of my .bam files (from tophat), I have a separate .gtf assembly for each sample. To run cuffdiff I need a single unified .gtf file of my assembled transcripts.

If I want to run a differential expression analysis with cuffdiff, should I use the merged.gtf file produced by cuffmerge or the combined.gtf file produced by cuffcompare? How are these two files different, and what would be the downstream effect of using one or the other for differential expression in cuffdiff?

EDIT: Or would a better workflow be to forgo cuffmerge/cuffcompare altogether and instead run cufflinks on a merge of all the .bam files (e.g. with samtools merge), generating a single assembly that maximizes assembly accuracy, and then use that assembly as the "reference" for cuffdiff?


More info:

From the cuffcompare documentation:

Cuffcompare clusters/tracks transfrags across samples, and writes a GTF file <outprefix>.combined.gtf containing a nonredundant set of transcripts across all input files (with a single representative transfrag chosen for each clique of matching transfrags across samples).

From the cuffmerge documentation:

cuffmerge takes two or more Cufflinks GTF files and merges them into a single unified transcript catalog. Optionally, you can provide the script with a reference GTF, and the script will use it to attach gene names and other metadata to the merged catalog.

cufflinks cuffdiff rna gtf • 14k views
11.5 years ago

If you want differential expression, you should just use the combined.gtf from cuffcompare.

As I understand it, cuffcompare tries to match transcripts across the samples to each other so that they can be compared in the differential expression analysis. I am not sure what algorithm it uses to determine whether two transcripts are comparable.

Cuffmerge will actually convert your .gtf assemblies into .sam and run cufflinks on the .sam file to produce a merged assembly.

11.0 years ago

Merging assemblies with cuffmerge

Cufflinks includes a script called cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. If you have a reference GTF file available, you can provide it to the script in order to gracefully merge novel isoforms and known isoforms and maximize overall assembly quality. The main purpose of this script is to make it easier to make an assembly GTF file suitable for use with Cuffdiff.
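
As a rough illustration of the cuffmerge-then-cuffdiff route described above, here is a minimal Python sketch that only assembles the command lines; it does not run anything. The helper names and every file path in the usage note are hypothetical. The options used (`-g`, `-s`, `-p`, `-o` for cuffmerge; `-o`, `-p`, `-L` for cuffdiff) are the ones I understand the Cufflinks manual to document, so check them against your installed version.

```python
import shlex


def manifest_text(gtf_paths):
    """Text for the assembly list file cuffmerge reads: one GTF path per line."""
    return "\n".join(gtf_paths) + "\n"


def cuffmerge_cmd(manifest, ref_gtf=None, genome_fa=None,
                  out_dir="merged_asm", threads=1):
    """Build a cuffmerge command; the reference GTF and genome are optional."""
    cmd = ["cuffmerge", "-o", out_dir, "-p", str(threads)]
    if ref_gtf:
        cmd += ["-g", ref_gtf]      # reference annotation to merge with
    if genome_fa:
        cmd += ["-s", genome_fa]    # genome sequence
    cmd.append(manifest)
    return cmd


def cuffdiff_cmd(merged_gtf, conditions, out_dir="diff_out", threads=1):
    """Build a cuffdiff command.

    conditions maps a condition label to its list of replicate BAM files;
    replicates are comma-joined, conditions are separate arguments.
    """
    cmd = ["cuffdiff", "-o", out_dir, "-p", str(threads),
           "-L", ",".join(conditions), merged_gtf]
    cmd += [",".join(bams) for bams in conditions.values()]
    return cmd


def as_shell(cmd):
    """Render an argument list as a copy-pastable shell line."""
    return shlex.join(cmd)
```

For example, `as_shell(cuffdiff_cmd("merged_asm/merged.gtf", {"C1": ["c1_r1.bam", "c1_r2.bam"], "C2": ["c2_r1.bam", "c2_r2.bam"]}))` gives a single line you can inspect before running it. Keeping the commands as argument lists also makes it easy to pass them to `subprocess.run` later.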


What if one is not looking for novel isoforms? Is it essential to merge all the GTFs with the reference GTF when we are only interested in the expression levels of known isoforms?

8.1 years ago
pengchy ▴ 450

Another way to understand the difference: cuffmerge can produce new transcripts that are not present in any of the input GTF files, while cuffcompare selects one transcript as the representative when several transcripts overlap consistently.
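
One way to check this on your own data is to compare intron chains. The sketch below is my own illustration, not anything cuffmerge or cuffcompare does internally. It assumes standard tab-separated GTF with `exon` records and a `transcript_id` attribute, and it treats two transcripts as "the same" only when their intron chains are identical. That is a stricter criterion than cuffcompare's matching, and it ignores single-exon transcripts.

```python
import re
from collections import defaultdict

_TID = re.compile(r'transcript_id "([^"]+)"')


def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, tuple of (intron_start, intron_end))."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TID.search(fields[8])
        if match:
            key = (match.group(1), fields[0], fields[6])
            exons[key].append((int(fields[3]), int(fields[4])))
    chains = {}
    for (tid, chrom, strand), ex in exons.items():
        ex.sort()
        # An intron spans the gap between consecutive exons (1-based, inclusive).
        introns = tuple((prev_end + 1, next_start - 1)
                        for (_, prev_end), (next_start, _) in zip(ex, ex[1:]))
        chains[tid] = (chrom, strand, introns)
    return chains


def novel_transcripts(merged_lines, input_line_sets):
    """IDs of multi-exon merged transcripts whose intron chain is in no input GTF."""
    seen = set()
    for lines in input_line_sets:
        seen.update(intron_chains(lines).values())
    return sorted(tid for tid, chain in intron_chains(merged_lines).items()
                  if chain[2] and chain not in seen)
```

Calling `novel_transcripts(open("merged.gtf"), [open(p) for p in sample_gtfs])`, where `sample_gtfs` is a hypothetical list of your per-sample assembly paths, lists merged transcripts whose structure does not appear in any single input assembly. Running the same check on combined.gtf should return few or none if cuffcompare only picks representatives from the inputs.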

