Question

How do you find novel transcripts using GFFcompare?

6.0 years ago

c_u

Hi, I am trying to find novel transcripts in an RNA-seq dataset (as mentioned in my previous question). Based on the advice I received, StringTie seemed a good choice for transcript assembly: it supports novel transcript discovery even when a reference GTF file is provided (and providing one apparently improves the assembly considerably). So I followed the protocol in the Nature Protocols paper that describes StringTie. The paper also suggests using GFFcompare to compare the assembled transcripts against the reference annotation, in order to quantify and identify the novel transcripts.
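The two steps described above can be sketched as follows. This is a minimal Python sketch that only builds the command lines (it does not run them); the file paths are hypothetical and it assumes one sorted BAM per sample. StringTie's `-G` supplies the reference annotation to guide assembly (novel transcripts are still reported) and `-o` names the output GTF; GFFcompare's `-r` sets the reference annotation and `-o` the prefix for its output files (such as the `.stats` summary and the `.annotated.gtf`).

```python
import shlex


def stringtie_cmd(bam: str, ref_gtf: str, out_gtf: str) -> list[str]:
    # Reference-guided assembly: -G gives the reference annotation,
    # -o names the assembled GTF. Novel transcripts are still assembled.
    return ["stringtie", "-G", ref_gtf, "-o", out_gtf, bam]


def gffcompare_cmd(ref_gtf: str, query_gtf: str, prefix: str) -> list[str]:
    # Compare the assembly with the reference: -r is the reference GTF,
    # -o is the prefix used for GFFcompare's output files.
    return ["gffcompare", "-r", ref_gtf, "-o", prefix, query_gtf]


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own layout.
    ref = "genes/chrX.gtf"
    asm = "hisat2/ERR188044_chrX.gtf"
    print(shlex.join(stringtie_cmd("hisat2/ERR188044_chrX.bam", ref, asm)))
    print(shlex.join(gffcompare_cmd(ref, asm, "188044_only")))
    # To actually run a command: subprocess.run(cmd, check=True)
```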

I followed the entire protocol. This is the GFFcompare output from comparing the reference GTF with the assembled GTF of one sample (ERR188044):

#= Summary for dataset: ./hisat2/ERR188044_chrX.gtf 
#-----------------| Sensitivity | Precision  |
        Base level:    51.7     |    79.5    |
        Exon level:    46.7     |    85.2    |
      Intron level:    47.2     |    93.9    |
Intron chain level:    31.2     |    64.4    |
  Transcript level:    31.6     |    52.3    |
       Locus level:    36.6     |    50.1    |

     Matching intron chains:     580
       Matching transcripts:     664
              Matching loci:     397

          Missed exons:    4395/8804    ( 49.9%)
           Novel exons:     426/4874    (  8.7%)
        Missed introns:    3832/7946    ( 48.2%)
         Novel introns:      83/3992    (  2.1%)
           Missed loci:     610/1086    ( 56.2%)
            Novel loci:     273/795 ( 34.3%)

 Total union super-loci across all input datasets: 749 
1270 out of 1270 consensus transcripts written in 188044_only.annotated.gtf (0 discarded as redundant)
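The Missed/Novel lines above are plain ratios. As I read the GFFcompare documentation, the "Missed" denominators count reference features and the "Novel" denominators count assembled (query) features, so "Novel loci: 273/795" means 273 of the 795 assembled loci. This minimal sketch just re-derives those percentages from the lines copied above; note that these are feature-level counts, not a per-transcript known/novel call.

```python
import re

# Lines copied from the GFFcompare summary above.
SUMMARY = """\
          Missed exons:    4395/8804    ( 49.9%)
           Novel exons:     426/4874    (  8.7%)
        Missed introns:    3832/7946    ( 48.2%)
         Novel introns:      83/3992    (  2.1%)
           Missed loci:     610/1086    ( 56.2%)
            Novel loci:     273/795 ( 34.3%)
"""

# Matches e.g. "Novel loci:     273/795" and captures kind, feature, n, total.
LINE = re.compile(r"^\s*(Missed|Novel) (\w+):\s+(\d+)/(\d+)")


def parse_counts(text: str) -> dict[str, tuple[int, int, float]]:
    """Map e.g. 'Novel loci' -> (count, total, percent rounded to 0.1)."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            kind, feature, n, total = m.groups()
            n, total = int(n), int(total)
            out[f"{kind} {feature}"] = (n, total, round(100 * n / total, 1))
    return out


if __name__ == "__main__":
    for key, (n, total, pct) in parse_counts(SUMMARY).items():
        print(f"{key:15s} {n:5d}/{total:<5d} {pct:5.1f}%")
```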

The paper, however, explicitly reports the number of novel genes and transcripts:

Sample ID | No. of assembled genes | Novel genes | Transcripts matching annotation | Novel transcripts
ERR188044 | 808                    | 288         | 675                             | 615
ERR188104 | 793                    | 294         | 651                             | 630
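A small worked check on the table above, assuming (this is not stated in the paper excerpt) that each assembled transcript falls in exactly one of the two transcript columns and each gene is either novel or annotated. Under that assumption roughly half of each sample's transcripts are novel, far more than the 8.7% novel exons or 2.1% novel introns in the GFFcompare summary, which suggests "novel transcript" is a transcript-level call (e.g. an intron chain that does not exactly match a reference) rather than an exon- or intron-level one.

```python
# Values transcribed from the paper's table above.
ROWS = {
    # sample: (assembled genes, novel genes,
    #          transcripts matching annotation, novel transcripts)
    "ERR188044": (808, 288, 675, 615),
    "ERR188104": (793, 294, 651, 630),
}


def derived(genes: int, novel_genes: int, matching_tx: int, novel_tx: int):
    """Return (annotated genes, total transcripts, % novel transcripts).

    Assumption: the two transcript columns partition the assembly,
    and every gene is either novel or annotated.
    """
    known_genes = genes - novel_genes
    total_tx = matching_tx + novel_tx
    return known_genes, total_tx, round(100 * novel_tx / total_tx, 1)


if __name__ == "__main__":
    for sample, row in ROWS.items():
        print(sample, derived(*row))
```

For ERR188044 the implied total is 1290 transcripts, close to but not identical to the 1270 transcripts in the GFFcompare output above, so the partition assumption (or the software versions used) may not hold exactly.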

So how did they calculate the numbers of novel transcripts and genes from the GFFcompare output (the paper states that these numbers come from GFFcompare)? I can see the counts of novel exons and novel introns, but how do I get from those to the number and identities of the novel transcripts and genes?
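Per-transcript calls come from GFFcompare's class codes rather than from the summary statistics. A minimal sketch, assuming the tab-separated `.tmap` file GFFcompare writes alongside each query GTF, with a header that includes `class_code`, `qry_gene_id` and `qry_id` columns. The definitions used here are my assumptions, not necessarily the paper's rules: "matching" means class code `=` (intron chain matches a reference transcript), "novel transcript" means any other code, and "novel gene" means a query gene whose transcripts are all `u` (intergenic, no overlapping reference).

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_tmap(handle) -> dict:
    """Tally a GFFcompare .tmap file (any iterable of text lines).

    Assumed definitions:
      matching transcript: class_code '='
      novel transcript:    any other class_code
      novel gene:          query gene whose transcripts are all 'u'
    """
    codes = Counter()
    gene_codes = defaultdict(set)  # query gene id -> set of class codes
    novel_ids = []
    for row in csv.DictReader(handle, delimiter="\t"):
        code = row["class_code"]
        codes[code] += 1
        gene_codes[row["qry_gene_id"]].add(code)
        if code != "=":
            novel_ids.append(row["qry_id"])
    novel_genes = sorted(g for g, c in gene_codes.items() if c == {"u"})
    return {
        "class_codes": dict(codes),
        "matching_transcripts": codes["="],
        "novel_transcripts": len(novel_ids),
        "novel_transcript_ids": novel_ids,
        "assembled_genes": len(gene_codes),
        "novel_genes": novel_genes,
    }


if __name__ == "__main__":
    # Tiny made-up example; for real data use something like
    # open("hisat2/188044_only.ERR188044_chrX.gtf.tmap", newline="")
    demo = io.StringIO(
        "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\n"
        "G1\tT1\t=\tSTRG.1\tSTRG.1.1\n"
        "G1\tT1\tj\tSTRG.1\tSTRG.1.2\n"
        "-\t-\tu\tSTRG.2\tSTRG.2.1\n"
    )
    summary = summarize_tmap(demo)
    print(summary["matching_transcripts"], summary["novel_transcripts"])
    print(summary["novel_genes"])
```

The `.tmap` path in the comment is hypothetical; check the directory of the query GTF for the actual file name. Restricting "novel" to specific codes (for example `j`, `u`, `i` or `x`) only requires changing the `code != "="` test.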

RNA-Seq gffcompare

6.0 years ago by c_u

Hello chahat_u!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/8992/how-to-find-novel-transcripts-using-gffcompare

This is typically not recommended, as it risks people in both communities spending their time answering the same question.

6.0 years ago by WouterDeCoster

Hi Wouter, thanks for letting me know. I searched that other site to see whether cross-posting is OK, and my impression was that it's not necessarily a bad thing, although I do understand your point about people spending twice the time. That's why I waited 3 days, and only when there was no response here did I post there. Let me know if I should remove it from one of the places.

6.0 years ago by c_u