Question: cuffcompare: No novel genes found. Is this possible?
sangram_keshari wrote (11 months ago):

I ran cuffcompare on the transcripts assembled by Cufflinks together with an annotation file, expecting some novel transcripts to be reported. Instead, it gave the output (stats) below. Can anyone help me figure out what went wrong?

# Cuffcompare v2.2.1 | Command line was:
#cuffcompare CLV_2_transcripts.gtf -r ano.gff3

#= Summary for dataset: CLV_2_transcripts.gtf :
#     Query mRNAs :   53667 in   32615 loci  (42061 multi-exon transcripts)
#            (10524 multi-transcript loci, ~1.6 transcripts per locus)
# Reference mRNAs :   54124 in   32619 loci  (42815 multi-exon)
# Super-loci w/ reference transcripts:    32494
#--------------------|   Sn   |  Sp   |  fSn |  fSp  
    Base level:     100.0   100.0     -       - 
    Exon level:     114.2   114.6   100.0   100.0
  Intron level:     100.0   100.0   100.0   100.0
Intron chain level:      98.2   100.0   100.0   100.0
Transcript level:    98.4    99.2    87.9    88.7
   Locus level:     100.0   100.0   100.0   100.0

 Matching intron chains:   42061
          Matching loci:   32611

      Missed exons:       7/193010  (  0.0%)
       Novel exons:       0/192188  (  0.0%)
    Missed introns:       1/132525  (  0.0%)
     Novel introns:       0/132525  (  0.0%)
       Missed loci:       7/32619   (  0.0%)
        Novel loci:       0/32615   (  0.0%)

Total union super-loci across all input datasets: 32612
Tags: rna-seq, cuffcompare (question modified 10 months ago)

Why were you expecting novel transcripts to be found? Which species is this? Which GTF guide did you use?

In addition, Cufflinks, Cuffcompare, etc. are outdated. Unless there is a legacy reason to use these programs, you should use HISAT2 and StringTie instead.

Reply by Kevin Blighe, 11 months ago

Sorry for the incomplete information.

The species is Arabidopsis. I used the latest version of the annotation (GTF guide) from Araport.

I have read the Nature Protocols papers for both pipelines (TopHat-Cufflinks and HISAT2-StringTie). HISAT2 is definitely superior to TopHat in some benchmark reports, but I am using the STAR aligner.

I think Cufflinks modules are still used in more recent publications than StringTie. As for Cuffcompare (the subject of my question), it is the same program that gffcompare was built from, and gffcompare is distributed separately from StringTie. Because of a library installation issue with gffcompare, I am still using Cuffcompare. I don't think this will cause many problems (I may be wrong).

I am expecting some novel transcripts because the sequencing depth of our samples is very high.

I am sure a mistake happened somewhere, but I haven't been able to figure it out.

Reply by sangram_keshari, 11 months ago

Isn't cuffcompare normally used to compare two or more assembled transcriptome GTFs? You appear to be comparing just a single sample (to itself?). The syntax is:

cuffcompare [options]* <cuff1.gtf> [cuff2.gtf] … [cuffN.gtf]

Take a look at the other available options here:

Reply by Kevin Blighe, 11 months ago

No, I am actually comparing my sample transcriptome (GTF) with the annotation file (GFF3 in my case). This is supposed to report novel transcripts (class code "j" in the output files) that are not present in the annotation file.

Reply by sangram_keshari, 11 months ago
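For reference, a minimal shell sketch of pulling candidate novel transcripts out of cuffcompare's per-query .tmap file. It assumes the .tmap file is tab-separated with a header row that includes the columns class_code and cuff_id, and it treats class code "j" (potentially novel isoform) and "u" (unknown, intergenic transcript) as candidates. The function name list_novel_transcripts and the file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list candidate novel transcripts from a Cuffcompare .tmap file.
# Assumes a tab-separated file whose first line is a header containing
# "class_code" and "cuff_id"; columns are located by name, not position.
#   j = potentially novel isoform (shares at least one splice junction with a reference)
#   u = unknown, intergenic transcript
list_novel_transcripts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
    $(col["class_code"]) == "j" || $(col["class_code"]) == "u" {
      print $(col["cuff_id"]) "\t" $(col["class_code"])       # transcript ID and its class code
    }' "$1"
}

# Usage (hypothetical file name):
# list_novel_transcripts CLV_2.tmap
```

Looking columns up by header name keeps the filter independent of the exact column order.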

In my experience, any novel transcripts would already have been listed in the GTF produced by Cufflinks itself. Had you used your GFF3 annotation file as the guide during alignment and assembly, you should have identified novel transcripts. On that note, since you used STAR for alignment, the tags required by Cufflinks may not have been added to your aligned BAM, namely the strand (XS) tag on spliced alignments.

I would re-align using TopHat2 / HISAT2, and then go from there.

Reply by Kevin Blighe, 11 months ago
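One way to check whether this is the problem is to count spliced alignments that lack an XS strand attribute. For unstranded libraries, Cufflinks expects spliced reads to carry XS, and the STAR manual recommends the option --outSAMstrandField intronMotif to add it. The helper below (count_spliced_without_xs, a hypothetical name) is a minimal sketch that reads SAM text on stdin, for example piped from samtools view, and prints the number of spliced alignments (CIGAR containing N) followed by how many of them are missing an XS:A tag.

```shell
#!/usr/bin/env bash
# Sketch: count spliced alignments without an XS strand tag.
# Reads SAM records on stdin; prints "<spliced> <spliced_without_XS>".
count_spliced_without_xs() {
  awk -F'\t' '
    /^@/ { next }                              # skip SAM header lines
    $6 ~ /N/ {                                 # CIGAR with an N operation = spliced alignment
      spliced++
      has_xs = 0
      for (i = 12; i <= NF; i++)               # optional tags start at field 12
        if ($i ~ /^XS:A:[+-]$/) has_xs = 1
      if (!has_xs) missing++
    }
    END { printf "%d %d\n", spliced + 0, missing + 0 }'
}

# Usage (hypothetical file name):
# samtools view CLV_2.bam | count_spliced_without_xs
```

A second number close to the first suggests the BAM was produced without the strand attribute.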

Yes, the mistake was in the assembly step: I should have used the reference annotation file as a guide for assembly, instead of only for quantifying known transcripts. Thanks for the help :)

Reply by sangram_keshari, 10 months ago

No problem bro.

Reply by Kevin Blighe, 10 months ago

What did you use for mapping, and how well did it go?

Reply by Vijay Lakhujani, 10 months ago

I used STAR for mapping, and it worked well, with more than 85% of reads aligning to the reference genome.

Reply by sangram_keshari, 10 months ago
sangram_keshari wrote (10 months ago):

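Okay, that was a very novice mistake on my side. Now that I have figured it out, let me share it here in case someone else runs into the same problem.

The problem was in the assembly step (using Cufflinks): I used the option -G/--GTF, which only quantifies expression against the reference transcript annotations, instead of -g/--GTF-guide, which uses the reference annotation to guide assembly while still allowing novel genes and isoforms to be assembled. With -G, Cufflinks does not assemble novel transcripts, which is why almost every query transcript matched the reference and cuffcompare reported no novel exons, introns, or loci.

Now I am able to find the novel elements in the subsequent steps of the Cufflinks package.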

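A minimal sketch of the corrected workflow, written as two shell functions. The function names, thread count, and file names are hypothetical; the change that matters is passing the annotation with -g/--GTF-guide rather than -G/--GTF. In Cufflinks, -p sets the number of threads and -o the output directory (the assembly is written to transcripts.gtf inside it); in Cuffcompare, -r gives the reference annotation and -o sets the prefix for its output files.

```shell
#!/usr/bin/env bash
# Sketch: reference-guided assembly with Cufflinks, then comparison of the
# assembled transcripts against the same annotation with Cuffcompare.
# Function names, thread count and file names are hypothetical.

# -g/--GTF-guide: the annotation guides assembly; novel genes and isoforms can
# still be assembled. (-G/--GTF would only quantify the reference transcripts.)
guided_assembly() {
  local bam="$1" annotation="$2" outdir="$3"
  cufflinks -p 8 -g "$annotation" -o "$outdir" "$bam"
}

# -r: reference annotation; -o: prefix for Cuffcompare's output files.
compare_to_reference() {
  local annotation="$1" query_gtf="$2" prefix="$3"
  cuffcompare -r "$annotation" -o "$prefix" "$query_gtf"
}

# Usage (hypothetical names):
# guided_assembly CLV_2.bam ano.gff3 CLV_2_guided
# compare_to_reference ano.gff3 CLV_2_guided/transcripts.gtf CLV_2_cmp
```

Keeping the two steps as separate functions makes it easy to rerun only the comparison after changing the annotation.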