I have looked through similar posts with the same warning:
WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
The indexes were built using the same -G file, so the naming conventions should be exactly the same. An ERCC control has been included in the dataset, but the same warning occurs when the control sequences are not included.
The reference.gtf looks as it should, but I'm concerned there may be a problem with the gene ID (9th) column:
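One way to double-check the naming would be to compare the sequence names in the alignment header with column 1 of the annotation. Below is a minimal sketch, assuming the intermediate sample.sam is still on disk; the function names `sam_seqnames` and `shared_seqnames` are made up for illustration and are not part of samtools or StringTie.

```shell
# List the reference names declared in a SAM header (@SQ lines, SN: tag).
# Stops at the first non-header line, so the alignments are never scanned.
sam_seqnames() {
  awk -F'\t' '
    !/^@/ { exit }
    $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }
  ' "$1"
}

# Print each sequence name found both in the SAM header and in column 1
# of the GTF. An empty result would suggest the naming conventions differ.
shared_seqnames() {
  sam_seqnames "$1" | awk -F'\t' '
    NR == FNR { in_sam[$0] = 1; next }                      # names read from stdin
    !/^#/ && ($1 in in_sam) && !seen[$1]++ { print $1 }     # GTF lines, one hit per name
  ' - "$2"
}

# Example, using the file names from the commands in this post:
# shared_seqnames sample.sam genome/genome_ERCC92.gtf
```

If this prints nothing, the names really do differ; if it prints scaffold names, the cause is probably elsewhere.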
scaffold1 WormBase_imported exon 7437 7876 . + . transcript_id "transcript:BN1106_s1B000532.mRNA-1"; gene_id "gene:BN1106_s1B000532"; gene_name "BN1106_s1B000532";
Has anyone else seen something similar?
Could there be a problem with the step that sorts and converts the SAM files to BAM? Should I be using the -n option to sort by read name? I'm using the command below, which sorts by leftmost coordinate by default, as that's what the protocols paper used.
samtools sort -@ 8 -o sample.bam sample.sam
Thank you in advance for any help :-)
P.S. I don't think my script has any problems, but here's a sample command:
stringtie -p 8 -G genome/genome_ERCC92.gtf -o sample.gtf sample.bam