I have a merged transcriptome assembly of 728 accessions and compared it with the Arabidopsis reference annotation file (ftp://ftp.ensemblgenomes.org/pub/release-43/plants/gtf/arabidopsis_thaliana) using gffcompare. Its stats output is:
gffcompare -r /media/waqas/second/Analysis/TAIR10_genome/ensemble_gtf/Arabidopsis_thaliana.TAIR10.42.gtf -o gffcompare stringtie_merged.gtf
Summary for dataset: stringtie_merged.gtf
Query mRNAs : 126199 in 32235 loci (112943 multi-exon transcripts)
(17283 multi-transcript loci, ~3.9 transcripts per locus)
Reference mRNAs : 53516 in 32046 loci (42815 multi-exon)
Super-loci w/ reference transcripts: 27093
                  | Sensitivity | Precision |
        Base level:    100.0    |    86.5   |
        Exon level:    100.0    |    66.0   |
      Intron level:    100.0    |    70.0   |
Intron chain level:     99.6    |    37.8   |
  Transcript level:     99.6    |    42.2   |
       Locus level:    100.0    |    89.3   |

Matching intron chains: 42644
Matching transcripts: 53316
Matching loci: 32046

Missed exons: 0/192402 ( 0.0%)
Novel exons: 27127/342721 ( 7.9%)
Missed introns: 1/132525 ( 0.0%)
Novel introns: 22825/189217 ( 12.1%)
Missed loci: 0/32046 ( 0.0%)
Novel loci: 3349/32235 ( 10.4%)

Total union super-loci across all input datasets: 32235
126199 out of 126199 consensus transcripts written in gffcompare.annotated.gtf (0 discarded as redundant)
Besides that, it generated five more files. I checked the literature and found that class code 'j' reflects novel isoforms, but the count of 'j' differs between the gffcompare output files:
gffcompare.annotated.gtf (count of j: 1000)
gffcompare.stringtie_merged.gtf.tmap (count of j: 2807)
gffcompare.tracking (count of j: 2876)
How can I find the actual number of novel isoforms?
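One way to make the counts comparable might be to count on the class-code field itself, once per query transcript, rather than searching for the letter 'j' anywhere in the text. Below is a minimal Python sketch, assuming the .tmap file is tab-delimited with a header row that includes `class_code` and `qry_id` columns. The function name `count_class_codes` and the IDs in the toy rows are made up for illustration and are not real results.

```python
import csv
import io
from collections import Counter


def count_class_codes(tmap_lines):
    """Count query transcripts per class code in a gffcompare-style .tmap.

    Assumes a tab-delimited table whose header contains the columns
    'class_code' and 'qry_id'. Each qry_id is counted only once.
    """
    reader = csv.DictReader(tmap_lines, delimiter="\t")
    seen = set()
    counts = Counter()
    for row in reader:
        qry_id = row["qry_id"]
        if qry_id in seen:
            continue  # skip repeated query transcripts
        seen.add(qry_id)
        counts[row["class_code"]] += 1
    return counts


# Toy rows with made-up IDs, standing in for a real .tmap file.
toy_tmap = (
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\n"
    "AT1G01010\tAT1G01010.1\t=\tMSTRG.1\tMSTRG.1.1\n"
    "AT1G01010\tAT1G01010.1\tj\tMSTRG.1\tMSTRG.1.2\n"
    "AT1G01020\tAT1G01020.1\tj\tMSTRG.2\tMSTRG.2.1\n"
    "-\t-\tu\tMSTRG.3\tMSTRG.3.1\n"
)

toy_counts = count_class_codes(io.StringIO(toy_tmap))
print(toy_counts["j"])  # 2 for the toy rows above
```

For a real run, the same function could be given an open file handle, for example `open("gffcompare.stringtie_merged.gtf.tmap", newline="")`, in place of the `io.StringIO` object.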
Hi waqaskhokhar999, I changed the topic tag of your post to 'question'; the 'tool' tag is reserved for advertising new tools and such.