Question: How do you find novel transcripts using GFFcompare?
c_u260 (United States) wrote 14 months ago:

Hi, I am trying to find novel transcripts in an RNA-seq dataset (as mentioned in my previous question). Based on the advice I got, StringTie seemed a good choice for transcript assembly: it supports novel transcript discovery even when a reference GTF file is provided (which apparently improves the assembly significantly). So I followed the protocol in the Nature Protocols paper that describes the use of StringTie; the paper also suggests using GFFcompare to compare the assembled transcripts with the reference, and to quantify and find the novel transcripts.

I followed the entire protocol, and this was the output of GFFcompare when comparing the reference GTF with the assembled GTF of one of the samples after mapping:

#= Summary for dataset: ./hisat2/ERR188044_chrX.gtf 
#-----------------| Sensitivity | Precision  |
        Base level:    51.7     |    79.5    |
        Exon level:    46.7     |    85.2    |
      Intron level:    47.2     |    93.9    |
Intron chain level:    31.2     |    64.4    |
  Transcript level:    31.6     |    52.3    |
       Locus level:    36.6     |    50.1    |

     Matching intron chains:     580
       Matching transcripts:     664
              Matching loci:     397

          Missed exons:    4395/8804    ( 49.9%)
           Novel exons:     426/4874    (  8.7%)
        Missed introns:    3832/7946    ( 48.2%)
         Novel introns:      83/3992    (  2.1%)
           Missed loci:     610/1086    ( 56.2%)
            Novel loci:     273/795     ( 34.3%)

 Total union super-loci across all input datasets: 749 
1270 out of 1270 consensus transcripts written in 188044_only.annotated.gtf (0 discarded as redundant)
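
As a reading aid for the summary above, here is a minimal Python sketch that recomputes the percentages from the raw counts. It assumes that each "Missed" line is divided by a reference total and each "Novel" line by a total from the assembled (query) GTF, which is how the numbers appear to work out; the `parse_counts` helper is hypothetical and not part of GFFcompare.

```python
import re

# Excerpt of the gffcompare summary quoted above.
SUMMARY = """\
          Missed exons:    4395/8804    ( 49.9%)
           Novel exons:     426/4874    (  8.7%)
        Missed introns:    3832/7946    ( 48.2%)
         Novel introns:      83/3992    (  2.1%)
           Missed loci:     610/1086    ( 56.2%)
            Novel loci:     273/795     ( 34.3%)
"""


def parse_counts(text):
    """Return {label: (count, total)} for every 'label: count/total' line."""
    pattern = re.compile(r"^\s*(\w[\w ]*?):\s+(\d+)/(\d+)", re.MULTILINE)
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in pattern.finditer(text)
    }


counts = parse_counts(SUMMARY)
for label, (n, total) in counts.items():
    print(f"{label:>15}: {n}/{total} = {100 * n / total:.1f}%")

# Transcript-level precision: matching transcripts / assembled transcripts,
# using the 664 and 1270 reported in the summary.
transcript_precision = 100 * 664 / 1270
print(f"Transcript precision: {transcript_precision:.1f}%")
```

Read this way, the summary already reports 273 novel loci out of 795 assembled loci, but it gives no direct count of novel transcripts; that has to come from the per-transcript annotation.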

In the paper, however, the numbers of novel genes and transcripts are stated explicitly:

Sample ID   No. of assembled genes   Novel genes   Transcripts matching annotation   Novel transcripts
ERR188044   808                      288           675                               615
ERR188104   793                      294           651                               630

So I was wondering how they calculated the number of novel transcripts and genes from the data that GFFcompare provides (the paper even mentions that the numbers came from GFFcompare). I see the numbers for novel exons and novel introns, but how do I translate them into the number and identities of the novel transcripts/genes?
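
One possible route, sketched here under assumptions, is to tally the `class_code` attribute that GFFcompare writes for each transcript in the `.annotated.gtf` file: `=` marks an exact intron-chain match to the reference, `j` a multi-exon transcript sharing at least one junction with a reference transcript, and `u` an intergenic transcript with no reference overlap. Whether the paper counted every non-`=` transcript as novel is an assumption: the summary above gives 1270 - 664 = 606 non-matching transcripts, which is close to, but not the same as, the 615 in the paper's table. The sample records, IDs, and the `summarize` helper below are hypothetical.

```python
import re
from collections import Counter

# Hypothetical records in the layout of a gffcompare *.annotated.gtf file;
# the IDs and coordinates are made up for illustration.
SAMPLE_GTF = (
    'chrX\tStringTie\ttranscript\t1000\t5000\t.\t+\t.\t'
    'transcript_id "STRG.1.1"; gene_id "STRG.1"; xloc "XLOC_000001"; '
    'cmp_ref "REF_A"; class_code "="; tss_id "TSS1";\n'
    'chrX\tStringTie\texon\t1000\t5000\t.\t+\t.\t'
    'transcript_id "STRG.1.1"; gene_id "STRG.1"; exon_number "1";\n'
    'chrX\tStringTie\ttranscript\t1000\t6000\t.\t+\t.\t'
    'transcript_id "STRG.1.2"; gene_id "STRG.1"; xloc "XLOC_000001"; '
    'cmp_ref "REF_A"; class_code "j"; tss_id "TSS1";\n'
    'chrX\tStringTie\ttranscript\t9000\t9900\t.\t-\t.\t'
    'transcript_id "STRG.2.1"; gene_id "STRG.2"; xloc "XLOC_000002"; '
    'class_code "u"; tss_id "TSS2";\n'
    'chrX\tStringTie\ttranscript\t9100\t9800\t.\t-\t.\t'
    'transcript_id "STRG.2.2"; gene_id "STRG.2"; xloc "XLOC_000002"; '
    'class_code "u"; tss_id "TSS3";\n'
)

CLASS_CODE = re.compile(r'class_code "([^"]+)"')
XLOC = re.compile(r'xloc "([^"]+)"')


def summarize(gtf_text):
    """Count class codes per transcript and list loci with only 'u' codes."""
    codes = Counter()
    loci = {}  # xloc ID -> set of class codes seen in that locus
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Only transcript records carry the class_code attribute.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        code_match = CLASS_CODE.search(fields[8])
        if code_match is None:
            continue
        code = code_match.group(1)
        codes[code] += 1
        xloc_match = XLOC.search(fields[8])
        if xloc_match is not None:
            loci.setdefault(xloc_match.group(1), set()).add(code)
    # Assumption: a locus counts as novel when all its transcripts are 'u'.
    novel_loci = sorted(x for x, seen in loci.items() if seen == {"u"})
    return codes, novel_loci


codes, novel_loci = summarize(SAMPLE_GTF)
matching = codes["="]
non_matching = sum(codes.values()) - matching
print("class code counts:", dict(codes))
print("matching:", matching, "non-matching:", non_matching)
print("loci with only 'u' transcripts:", novel_loci)
```

For a real run, the text of the annotated file named in the GFFcompare output (188044_only.annotated.gtf) could be passed to `summarize` in place of `SAMPLE_GTF`; the transcript IDs on the `u` and `j` lines then give the identities of the candidate novel transcripts.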

gffcompare rna-seq • 872 views
modified 14 months ago • written 14 months ago by c_u260

Hello chahat_u!

It appears that your post has been cross-posted to another site:

This is typically not recommended as it runs the risk that people in both communities spend their time helping you.

modified 14 months ago • written 14 months ago by WouterDeCoster44k

Hi Wouter, thanks for letting me know. I searched that other site to see whether cross-posting is OK, and the sense I got was that it's not necessarily a bad thing, although I do understand your point about people spending double the amount of time. That is why I waited for 3 days and, when there was no response here, posted there. Let me know if I should remove it from one of the places.

written 14 months ago by c_u260