Hello everyone, I'm working on an RNA-seq data set obtained from a bacterium, and I have followed the pipeline described for HISAT2, StringTie and Ballgown. My problem is how to convert the transcript IDs generated by StringTie (STRG0001 or MSTRG000) to actual gene names. The subsequent differential expression analysis also reports its results with MSTRG or STRG IDs as gene names. I tried to parse the GFF file and match these transcripts to gene IDs, but I have observed that every GFF file is different and the same script doesn't work on other files. I would be really thankful if you could help me out, because without the proper gene names my analysis is incomplete. Is there a way to map gene names or symbols to the transcripts? I feel I'm missing a step; is there a standard method to follow? I would greatly appreciate the help.
If I understand correctly, you don't see GeneID:40027734 (which corresponds to the gene CJI97_002588) in the output of StringTie. RefSeq GFF3 files include several attributes in column 9, not all of which are copied over by StringTie. Using the RefSeq GFF3 as a starting point, you can build a mapping table with the GeneID, RefSeq transcript accession, RefSeq protein accession and gene name, in the format:
40027734 XM_029034609.1 XP_028889851.1 CJI97_002588
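A minimal Python sketch of how such a table could be built, in case it helps. It assumes the usual RefSeq GFF3 attribute names (`Dbxref` with a `GeneID:` entry, `transcript_id`, `protein_id`, `locus_tag`); check that your file actually uses them. The function names and the example lines (including their made-up coordinates) are only for illustration, and if a gene has several transcripts this keeps only the first one it sees.

```python
def parse_attributes(column9):
    """Turn a GFF3 column 9 string (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gene_id_from_dbxref(attrs):
    """Return the number from a 'GeneID:...' entry in Dbxref, or None."""
    for xref in attrs.get("Dbxref", "").split(","):
        if xref.startswith("GeneID:"):
            return xref.split(":", 1)[1]
    return None


def build_mapping(gff_lines):
    """Collect one row per GeneID: transcript, protein and locus tag."""
    wanted = ("transcript_id", "protein_id", "locus_tag")
    table = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        gene_id = gene_id_from_dbxref(attrs)
        if gene_id is None:
            continue
        row = table.setdefault(gene_id, dict.fromkeys(wanted, ""))
        for key in wanted:
            # Keep the first value seen; later features only fill gaps.
            if not row[key] and key in attrs:
                row[key] = attrs[key]
    return table


def format_rows(table):
    """Render the table as tab-separated lines: GeneID, transcript, protein, locus tag."""
    return [
        "\t".join([gid, row["transcript_id"], row["protein_id"], row["locus_tag"]])
        for gid, row in table.items()
    ]


# Illustrative feature lines (coordinates are made up, identifiers are from this thread).
example_gff = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene-CJI97_002588;Dbxref=GeneID:40027734;locus_tag=CJI97_002588",
    "chr1\tRefSeq\tmRNA\t1\t900\t.\t+\t.\tID=rna-XM_029034609.1;Parent=gene-CJI97_002588;Dbxref=GeneID:40027734,Genbank:XM_029034609.1;transcript_id=XM_029034609.1",
    "chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tID=cds-XP_028889851.1;Parent=rna-XM_029034609.1;Dbxref=GeneID:40027734,Genbank:XP_028889851.1;protein_id=XP_028889851.1",
]
print("\n".join(format_rows(build_mapping(example_gff))))
```

For a real file you would pass an open file handle to `build_mapping` instead of the example list. Bacterial RefSeq annotations often have no mRNA features, in which case the transcript column simply stays empty.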
Then, in a post-processing step, add the relevant identifiers to column 9 of the StringTie output. Is this what you are looking for?
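The post-processing step could look roughly like the sketch below. It rests on assumptions you should verify against your own files: that StringTie was run with the reference annotation so that matched transcripts carry a `reference_id "..."` attribute in column 9, and that this ID is the RefSeq transcript accession, possibly with an `rna-` prefix. The appended attribute name `ncbi_gene_id`, the function name and the example lines are hypothetical, chosen only for illustration.

```python
import re

# Assumed form of the StringTie attribute that names the matched reference transcript.
REFERENCE_ID = re.compile(r'reference_id "([^"]+)"')


def annotate_line(line, mapping):
    """Append GeneID and locus tag to column 9 of one GTF line.

    mapping: dict of transcript accession -> (GeneID, locus tag).
    Lines without a known reference transcript are returned unchanged.
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9:
        return line
    match = REFERENCE_ID.search(cols[8])
    if match is None:
        return line  # novel transcript, nothing to look up
    ref = match.group(1)
    # Assumption: RefSeq GFF3 IDs may carry an "rna-" prefix.
    accession = ref[4:] if ref.startswith("rna-") else ref
    info = mapping.get(accession)
    if info is None:
        return line
    gene_id, locus_tag = info
    attrs = cols[8].rstrip()
    if not attrs.endswith(";"):
        attrs += ";"
    cols[8] = f'{attrs} ncbi_gene_id "{gene_id}"; locus_tag "{locus_tag}";'
    return "\t".join(cols)


# Illustrative input: one matched and one novel transcript (made-up coordinates).
mapping = {"XM_029034609.1": ("40027734", "CJI97_002588")}
matched = ('chr1\tStringTie\ttranscript\t1\t900\t1000\t+\t.\t'
           'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; reference_id "rna-XM_029034609.1";')
novel = ('chr1\tStringTie\ttranscript\t2000\t2500\t1000\t+\t.\t'
         'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";')
for gtf_line in (matched, novel):
    print(annotate_line(gtf_line, mapping))
```

Only lines whose reference transcript is in the table are changed, so novel MSTRG transcripts stay as StringTie wrote them.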