Question: How do you find novel transcripts using GffCompare?
Hi, I am trying to find novel transcripts in an RNA-seq dataset (as mentioned in my previous question). Based on the advice I got, StringTie seems to be a good choice for transcript assembly: it supports novel transcript discovery even when a reference GTF file is provided (which apparently improves the assembly significantly). So I followed the protocol in the Nature Protocols paper that describes the use of StringTie; the paper also suggests using GffCompare to compare the assembled transcripts with the reference, in order to quantify and identify the novel transcripts.

I followed the entire protocol, and this was the GffCompare output when comparing the assembled GTF of one of the samples (after mapping) with the reference GTF:

#= Summary for dataset: ./hisat2/ERR188044_chrX.gtf 
#-----------------| Sensitivity | Precision  |
        Base level:    51.7     |    79.5    |
        Exon level:    46.7     |    85.2    |
      Intron level:    47.2     |    93.9    |
Intron chain level:    31.2     |    64.4    |
  Transcript level:    31.6     |    52.3    |
       Locus level:    36.6     |    50.1    |

     Matching intron chains:     580
       Matching transcripts:     664
              Matching loci:     397

          Missed exons:    4395/8804    ( 49.9%)
           Novel exons:     426/4874    (  8.7%)
        Missed introns:    3832/7946    ( 48.2%)
         Novel introns:      83/3992    (  2.1%)
           Missed loci:     610/1086    ( 56.2%)
            Novel loci:     273/795     ( 34.3%)

 Total union super-loci across all input datasets: 749 
1270 out of 1270 consensus transcripts written in 188044_only.annotated.gtf (0 discarded as redundant)

In the paper, however, the numbers of novel genes and transcripts are stated explicitly:

Sample ID   No. of assembled genes   Novel genes   Transcripts matching annotation   Novel transcripts
ERR188044   808                      288           675                               615
ERR188104   793                      294           651                               630

So I was wondering how they calculated the numbers of novel transcripts and genes from the output that GffCompare provides (the paper even mentions that these numbers came from GffCompare). I can see the numbers for novel exons and novel introns, but how do I translate these into the number and identities of the novel transcripts and genes?
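
One possible route, sketched here under assumptions, is to use the class_code attribute that GffCompare writes on each transcript line of the .annotated.gtf file and group transcript IDs by it. In GffCompare's class codes, "=" marks a complete intron-chain match to a reference transcript and "u" marks a transcript in an intergenic region with no reference overlap. Whether the paper counted "novel" as "anything other than =" is my assumption, not something the paper confirms. The function name and the sample lines below are hypothetical and only illustrate the parsing.

```python
import re
from collections import defaultdict

# Attribute patterns as they appear in column 9 of a GTF record.
CLASS_CODE_RE = re.compile(r'class_code "([^"]+)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_by_class_code(lines):
    """Group transcript IDs by class code, reading transcript records only."""
    by_code = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # exon records are skipped
        code = CLASS_CODE_RE.search(fields[8])
        tid = TRANSCRIPT_ID_RE.search(fields[8])
        if code and tid:
            by_code[code.group(1)].append(tid.group(1))
    return by_code


# Hypothetical records, made up for illustration (not taken from the real output).
SAMPLE_LINES = [
    "# hypothetical header line\n",
    'chrX\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "STRG.1.1"; gene_id "STRG.1"; class_code "u";\n',
    'chrX\tStringTie\texon\t100\t900\t.\t+\t.\ttranscript_id "STRG.1.1"; gene_id "STRG.1"; exon_number "1";\n',
    'chrX\tStringTie\ttranscript\t2000\t5000\t.\t-\t.\ttranscript_id "STRG.2.1"; gene_id "STRG.2"; class_code "=";\n',
    'chrX\tStringTie\ttranscript\t2000\t5200\t.\t-\t.\ttranscript_id "STRG.2.2"; gene_id "STRG.2"; class_code "j";\n',
]

if __name__ == "__main__":
    grouped = transcripts_by_class_code(SAMPLE_LINES)
    matching = grouped.get("=", [])
    # Assumption: "novel" means every transcript whose class code is not "=".
    novel = [t for code, ids in grouped.items() if code != "=" for t in ids]
    print("matching annotation:", len(matching), matching)
    print("not matching (assumed novel):", len(novel), novel)
    print("intergenic (class code u):", grouped.get("u", []))
```

To try this on real data, the sample list could be replaced by an open file handle for 188044_only.annotated.gtf. As a rough cross-check against the summary above, 1270 consensus transcripts minus 664 matching transcripts leaves 606 non-matching ones, which is close to, but not the same as, the 615 reported in the paper.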

