Hello,
I am using cuffcompare to identify novel transcripts. However, I suspect that many of these "novel" transcripts are junk/noise: I was expecting something like 5-10 novel transcripts, but I am getting thousands. Many transcripts even show up without a chromosome in the combined GTF!
My output:
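To see how the "novel" transcripts break down, something like the rough sketch below could tally the cuffcompare class codes per transcript and count the records with no chromosome name. It assumes the combined GTF is tab-separated with nine columns and that each record carries `transcript_id` and `class_code` attributes; the function name `summarize_combined_gtf` is just something I made up for illustration.

```python
import re
from collections import Counter

# Attribute patterns assumed to be present in the combined GTF's 9th column.
CLASS_CODE_RE = re.compile(r'class_code "([^"]*)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)"')


def summarize_combined_gtf(lines):
    """Return (class-code counts per transcript, number of records without a seqname).

    `lines` is any iterable of GTF lines, e.g. an open file handle.
    """
    code_by_transcript = {}
    missing_seqname = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        if fields[0].strip() in ("", "."):
            missing_seqname += 1  # record has no chromosome/contig name
        tid = TRANSCRIPT_ID_RE.search(fields[8])
        code = CLASS_CODE_RE.search(fields[8])
        if tid and code:
            # Exons of one transcript share a class code, so count it once.
            code_by_transcript[tid.group(1)] = code.group(1)
    return Counter(code_by_transcript.values()), missing_seqname
```

With the default output prefix this would be called as `summarize_combined_gtf(open("cuffcmp.combined.gtf"))`. Transcripts with class code `u` (unknown, intergenic) would be the candidates for truly novel loci, while `=` marks a complete match to the reference.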
# Cuffcompare v2.2.1 | Command line was:
#cuffcompare -s /illumina/runs/RNASeq/Gencode27/GRCh38.p10.genome.fa -r /illumina/runs/RNASeq/Gencode27/gencode.v27.annotation.gtf transcripts.gtf
#
#= Summary for dataset: transcripts.gtf :
# Query mRNAs : 212216 in 66959 loci (173199 multi-exon transcripts)
# (20867 multi-transcript loci, ~3.2 transcripts per locus)
# Reference mRNAs : 198869 in 54870 loci (174194 multi-exon)
# Super-loci w/ reference transcripts: 47911
#--------------------| Sn | Sp | fSn | fSp
Base level: 99.7 92.1 - -
Exon level: 150.1 146.9 100.0 100.0
Intron level: 99.4 98.7 100.0 100.0
Intron chain level: 96.0 96.6 100.0 100.0
Transcript level: 96.1 90.0 87.7 82.2
Locus level: 100.0 81.8 100.0 81.8
Matching intron chains: 167245
Matching loci: 54851
Missed exons: 12/573839 ( 0.0%)
Novel exons: 13301/586450 ( 2.3%)
Missed introns: 1783/352804 ( 0.5%)
Novel introns: 155/355547 ( 0.0%)
Missed loci: 6/54870 ( 0.0%)
Novel loci: 12140/66959 ( 18.1%)
Total union super-loci across all input datasets: 66949
I have also run cuffcompare on a public data set, but for some reason it does not report the same summary statistics there:
# Cuffcompare v2.2.1 | Command line was:
#cuffcompare -s /illumina/runs/RNASeq/Gencode27/GRCh38.p10.genome.fa -r /illumina/runs/RNASeq/Gencode27/gencode.v27.annotation.gtf SRR5335744/transcripts.gtf SRR5335745/transcripts.gtf SRR5335746/transcripts.gtf SRR5335747/transcripts.gtf SRR5335748/transcripts.gtf SRR5335749/transcripts.gtf SRR5335750/transcripts.gtf SRR5335751/transcripts.gtf SRR5335752/transcripts.gtf SRR5335753/transcripts.gtf SRR5335754/transcripts.gtf SRR5335755/transcripts.gtf SRR5335756/transcripts.gtf SRR5335757/transcripts.gtf SRR5335758/transcripts.gtf SRR5335759/transcripts.gtf SRR5335760/transcripts.gtf SRR5335761/transcripts.gtf SRR5335762/transcripts.gtf SRR5335763/transcripts.gtf SRR5335764/transcripts.gtf SRR5335765/transcripts.gtf SRR5335766/transcripts.gtf SRR5335767/transcripts.gtf SRR5335768/transcripts.gtf SRR5335769/transcripts.gtf
#
Total union super-loci across all input datasets: 72350
(23997 multi-transcript, ~5.0 transcripts per locus)
Are these results typical for cuffcompare on RNA-Seq data?