Question: Cufflinks: Analysis Comparison With And Without A GTF Reference File
7.9 years ago, Hélène wrote:

Hello, I have many questions about Cufflinks output. Here is one of them. First I used TopHat to map my RNA-seq reads (100 bp) and obtained an accepted_hits.bam file. Then I ran Cufflinks in two ways:

  1. simply: cufflinks accepted_hits.bam
  2. with a GTF file, which is the current annotation of my genome (eucalyptus):

    cufflinks -g annotGenome.gtf accepted_hits.bam

Note that I used the -g option, not the -G option.
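To keep the runs from overwriting each other's output, it can help to write each one to its own directory. Below is a minimal Python sketch that only builds the command lines (it does not run them). The function name `cufflinks_command` and the output directory names are made up for illustration; `-o`, `-g` and `-G` are the Cufflinks output-directory, GTF-guide and GTF options.

```python
import shlex


def cufflinks_command(bam, out_dir, gtf=None, mode="denovo"):
    """Build (but do not run) a cufflinks command line.

    mode is a hypothetical label used only in this sketch:
      "denovo"    -> no annotation
      "guide"     -> -g (annotation guides the assembly)
      "constrain" -> -G (quantify annotated transcripts only)
    """
    flags = {"denovo": None, "guide": "-g", "constrain": "-G"}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["cufflinks", "-o", out_dir]
    flag = flags[mode]
    if flag is not None:
        if gtf is None:
            raise ValueError("a GTF file is required for guide/constrain modes")
        cmd += [flag, gtf]
    cmd.append(bam)
    return cmd


# Print the three command lines side by side (directory names are placeholders).
for mode in ("denovo", "guide", "constrain"):
    gtf = None if mode == "denovo" else "annotGenome.gtf"
    print(shlex.join(cufflinks_command("accepted_hits.bam", f"cl_{mode}", gtf, mode)))
```

Comparing the transcripts.gtf files from the three directories then shows which differences come from the annotation and which come from the reads.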

One example result:

- without reference GTF:
one gene / one isoform: 12110-17714

The first part of this isoform, 12-16530, has the same intron/exon structure as the isoforms assembled with the reference. It then ends with a last exon, 16530-17714.

- with reference GTF:
one gene / two isoforms
+ transcript 1: 12024-17350 = exact transcript from the reference
full_read_support "no";

The region covered by the last exon of the no-reference run is now split into two exons:

16530 - 16561
16597 - 17350

That matches my reference annotation, but in my run the intron between these two exons is covered by reads. No read is split across it. A few reads begin at position 16595, and I checked that no read ends at 16561. I think this RNA does not exist in my transcriptome.
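A check like this can also be done programmatically. The sketch below is in Python and assumes the alignments for the region have been exported as SAM text (for example with samtools view); the function names and the example reads are invented for illustration. Spliced alignments carry an N operation in the CIGAR string, and since GTF exon coordinates are 1-based and inclusive, the annotated intron between exons 16530-16561 and 16597-17350 spans 16562-16596.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(pos, cigar):
    """Return 1-based inclusive (start, end) introns implied by N operations.

    pos is the SAM POS field (1-based leftmost aligned reference base).
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return introns


def count_junction_support(sam_lines, intron):
    """Count alignments whose CIGAR contains exactly the given intron.

    Assumes sam_lines are already restricted to one chromosome/scaffold.
    """
    support = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        pos, cigar = int(fields[3]), fields[5]
        if cigar == "*":  # no alignment information
            continue
        if intron in introns_from_alignment(pos, cigar):
            support += 1
    return support


# Hypothetical reads: one spliced across 16562-16596, one unspliced.
example_sam = [
    "r1\t0\tscaffold_1\t16540\t50\t22M35N78M\t*\t0\t0\t*\t*",
    "r2\t0\tscaffold_1\t16500\t50\t100M\t*\t0\t0\t*\t*",
]
print(count_junction_support(example_sam, (16562, 16596)))
```

A count of zero on the real data would agree with the observation that no read supports the annotated junction.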

+ transcript 2: 12024-17714
full_read_support "yes";
last exon: 16530-17714, the same exon as in the no-reference version

Why does transcript 2 contain the 12024-12109 portion, which is not covered by RNA-seq reads (whereas the reference, transcript 1, begins with this sequence)?

For the two isoforms I have FPKM values: 4 for transcript 1, which does not seem to exist in my transcriptome, and 13 for transcript 2. How does Cufflinks assign those values?

In the run without a GTF reference I get FPKM=36, roughly double the total from the run with the reference (13+4=17), even though the mapping file is the same.
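For reference, FPKM is not a raw fragment count: it is scaled by transcript length and by the number of mapped fragments, so totals from different transcript models of the same locus are not directly comparable. The Python sketch below shows only the textbook definition with invented numbers; Cufflinks itself estimates isoform abundances with a statistical model, so this is a simplification and does not by itself account for the 36 versus 17 difference.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: the same 200 fragments assigned to a 2 kb or a 4 kb
# transcript model, with 10 million mapped fragments in the library.
short_model = fpkm(200, 2000, 10_000_000)
long_model = fpkm(200, 4000, 10_000_000)
print(short_model, long_model)
```

With the fragment count held fixed, doubling the length of the transcript model halves the FPKM.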

Finally, note that these transcripts are located on the forward strand of the genome, and that there is nothing in the GTF or in the Cufflinks results on the opposite strand at this location.

Many thanks for your suggestions,

Sohnic

fpkm cufflinks rna • 6.5k views

I have a similar problem. When I use the GTF file with Cufflinks, it detects a few hundred transcripts. When I don't use a GTF file, Cufflinks detects and assembles about 5000 transcripts. I know that we expect a few thousand transcripts from the experiment, but I don't understand this counter-intuitive behavior of Cufflinks.

written 7.2 years ago by Sameet
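When comparing transcript counts between runs, it may help to tally the transcript records in each transcripts.gtf and split them by the full_read_support attribute quoted in the question. This is a minimal Python sketch, assuming the usual tab-separated GTF layout with "transcript" in the feature column; the function name and the example lines are invented for illustration.

```python
import re
from collections import Counter

SUPPORT = re.compile(r'full_read_support "([^"]+)"')


def summarize_transcripts(gtf_lines):
    """Count transcript records, grouped by their full_read_support value."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only "transcript" records are counted; exon records are skipped.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = SUPPORT.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical records modelled on the two isoforms described in the question.
example_gtf = [
    'scaffold_1\tCufflinks\ttranscript\t12024\t17350\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "4"; full_read_support "no";',
    'scaffold_1\tCufflinks\texon\t16597\t17350\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "2";',
    'scaffold_1\tCufflinks\ttranscript\t12024\t17714\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.2"; FPKM "13"; full_read_support "yes";',
]
print(dict(summarize_transcripts(example_gtf)))
```

Running it on the output of each run gives comparable totals and shows how many reported transcripts are fully supported by reads.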

Here's a thought to consider: you can either guide (-g) or constrain (-G) how Cufflinks handles transcripts. With the 'guide' option, it assumes that you will later (when running cuffmerge) merge your novel transcripts with the known transcripts. With the 'constrain' option, you skip the merge step and proceed straight to running cuffdiff. Might that explain this behaviour?

written 5.6 years ago by polarise