Could anyone offer a tip or any help with generating the transcript-to-gene map file needed for using salmon to align RNA-seq data against a reference transcriptome?
I would like to do this with the QUT Nicotiana benthamiana reference transcriptome. However, the way the GFF3 annotation file is constructed makes this impossible with the BUSpaRse package, and there is no GTF file in which "transcript_id" and "gene_id" are helpfully specified.
In the attributes column of the GFF3 file, it is not obvious to me which tag denotes the transcript and which denotes the gene. My guess is that (for my purposes at least) "Nbv5tr6198039.mrna1", for example, may be considered the transcript ID, while "Nbv5tr6198039" may be considered the gene ID. Please see some example lines from the GFF3 file below.
Nbv0.5scaffold4004 Nbdbv05 gene 109116 109315 . - . ID=Nbv5tr6198039.path1;Name=not determined by homology or low homology during annotation
Nbv0.5scaffold4004 Nbdbv05 mRNA 109116 109315 . - . ID=Nbv5tr6198039.mrna1;Name=Nbv5tr6198039;Parent=Nbv5tr6198039.path1;coverage=100.0;identity=100.0
Nbv0.5scaffold4004 Nbdbv05 CDS 109168 109314 100 - 0 ID=Nbv5tr6198039.mrna1.cds1;Name=Nbv5tr6198039;Parent=Nbv5tr6198039.mrna1;Target=Nbv5tr6198039 2 148 +
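To make the guess concrete, here is a rough Python sketch of the mapping I have in mind. It rests on assumptions I have not confirmed: that the real file is tab-separated like standard GFF3, that the "ID" tag on each mRNA line is the transcript ID, and that the "Name" tag on the same line is the gene ID. The function names and the output file name are just placeholders I made up.

```python
import csv


def parse_attributes(field):
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def tx2gene_from_gff3(lines):
    """Yield (transcript_id, gene_id) pairs, one per mRNA feature.

    ASSUMPTION: mRNA 'ID' is the transcript ID and mRNA 'Name' is the gene ID.
    Using attrs.get("Parent") instead would group by the '.path1' gene feature.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only mRNA rows carry the transcript-level IDs
        attrs = parse_attributes(cols[8])
        tx_id = attrs.get("ID")
        gene_id = attrs.get("Name")
        if tx_id and gene_id:
            yield tx_id, gene_id


def write_tx2gene(gff3_path, out_path):
    """Write a two-column, tab-separated transcript-to-gene table."""
    with open(gff3_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(tx2gene_from_gff3(src))
```

Calling write_tx2gene("annotation.gff3", "tx2gene.tsv") (both paths hypothetical) would then give one transcript/gene pair per line, provided the transcript IDs match the sequence names in the transcriptome FASTA, which I have not checked.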
Thanks in advance for any help.