Question

Stringtie2 no reference transcript found

4.2 years ago

sonsunjirachote • 0

First of all, I know I should ask this on GitHub, but the StringTie2 repository is not open to questions about issues. Here is the problem I found. I want the GFF output to investigate annotation, coverage, and abundance. However, StringTie gives me an error like this:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.

My input is nanopore long-read whole-transcriptome data from a prokaryote, as a sorted BAM file produced by mapping with minimap2 against the cDNA (CDS) FASTA file as reference. I have used references from both Ensembl and NCBI, but both gave me the same error. I also tried Galaxy and it still gives me the same result.

Which step have I missed?

Here are the commands I used:

minimap2 -a -x map-ont [myref.fa] [my.fastq]| samtools view -b - -o [my.bam]

samtools sort -T tmp -o [my.sorted.bam] [my.bam] && samtools index [my.sorted.bam]

stringtie -L [my.sorted.bam] -G [ref.gtf] -o [my.gtf]

Any suggestions? Perhaps about the parameters or options I should optimize for nanopore data. Thanks in advance.
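The warning quoted above is about sequence names: StringTie reports that the sequence names in the -G annotation do not match the sequences the reads were mapped to. Because the reads here were aligned to a CDS FASTA, the BAM reference names are probably CDS or transcript identifiers rather than the chromosome or contig names found in column 1 of a GTF. That is an assumption worth checking rather than a confirmed diagnosis. A minimal sketch of such a check follows; the function names and the `header.sam` file are hypothetical helpers, not part of StringTie or samtools.

```shell
# Print the reference sequence names declared in a SAM header read from stdin
# (the SN: field of every @SQ line).
sam_seqnames() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }' | sort -u
}

# Print the sequence names (column 1) of a GTF/GFF file read from stdin,
# skipping comment lines.
gtf_seqnames() {
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' | sort -u
}

# Print the names present in both inputs.
# $1 = file holding SAM header text, $2 = GTF/GFF file.
shared_seqnames() {
  names_tmp=$(mktemp)
  sam_seqnames < "$1" > "$names_tmp"
  # grep exits non-zero when nothing matches; that is an expected outcome here.
  gtf_seqnames < "$2" | grep -Fx -f "$names_tmp" || true
  rm -f "$names_tmp"
}

# Hypothetical usage with the placeholder file names from the post:
#   samtools view -H my.sorted.bam > header.sam
#   shared_seqnames header.sam ref.gtf
# Empty output means no sequence name is shared between the two files.
```

If the output is empty, the BAM and the annotation use different naming, which is the situation the warning describes.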

alignment rna-seq nanopore stringtie samtools • 2.0k views

Updated 4.2 years ago by GenoMax 152k • written 4.2 years ago by sonsunjirachote • 0

"My input is nanopore long-read whole-transcriptome data from a prokaryote, as a sorted BAM file produced by mapping with minimap2 against the cDNA (CDS) FASTA file as reference."

Why are you going to all this trouble when you have the simplest possible case of RNAseq?

You have a prokaryotic genome, which should have no splicing. You are using long reads, so even for the longest genes your reads should already cover the entire gene. Can you tell us how these libraries were made? Was any fragmentation done after conversion of RNA to cDNA, or were you sequencing RNA directly (which is possible with nanopore)? Are you seeing polycistronic reads (reads that cover multiple genes) in your data? They will show up as multi-mapping/secondary alignments if you aligned to just the CDS FASTA.
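One rough way to look for the multi-mapping pattern mentioned above is to tally alignments by SAM flag: secondary alignments carry flag bit 0x100 and supplementary alignments carry 0x800. The sketch below is only an illustration. The function name is hypothetical, and a high secondary or supplementary count merely suggests that reads span more than one CDS; it does not prove it.

```shell
# Count primary, secondary (flag bit 0x100) and supplementary (flag bit 0x800)
# alignments in SAM text read from stdin. Header lines and unmapped reads
# (flag bit 0x4) are skipped.
count_alignment_types() {
  awk -F'\t' '
    /^@/ { next }                               # SAM header lines
    {
      flag = $2
      if (int(flag / 4) % 2) next               # read is unmapped
      if (int(flag / 256) % 2) secondary++
      else if (int(flag / 2048) % 2) supplementary++
      else primary++
    }
    END {
      printf "primary=%d secondary=%d supplementary=%d\n",
             primary, secondary, supplementary
    }
  '
}

# Hypothetical usage with the placeholder file name from the post:
#   samtools view my.sorted.bam | count_alignment_types
```

The bit tests use integer division so the function runs in any POSIX awk, without relying on gawk's bitwise functions.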

4.2 years ago by GenoMax 152k

Actually, this is not my dataset. I downloaded it from this article: https://www.biorxiv.org/content/10.1101/2019.12.18.880849v2. They used direct RNA sequencing.
Yes, some of the reads cover multiple genes.

4.2 years ago by sonsunjirachote • 0

Are you simply trying to replicate the analysis they report? I looked at the manuscript briefly, and the data analysis is described in sufficient detail to allow replication.

4.2 years ago by GenoMax 152k