I carried out transcript assembly on my RNA-seq data using StringTie. The reads were mapped to a bacterial reference genome with bowtie2, and the per-sample StringTie results were then combined into count matrices using the "prepDE.py3" script in Python 3. This generated two output files, the gene_count_matrix and the transcript_count_matrix, which contain count data for genes and transcripts, respectively.
However, I encountered two issues:
- All gene IDs in the gene_count_matrix begin with the prefix "MSTRG". I have seen this issue raised before, but I am still unsure whether these IDs will affect the downstream analysis.
- All transcript IDs in the transcript_count_matrix begin with the prefix "GOHBADNI". This prefix looks custom, but I ran the whole analysis myself in my Mac terminal and on Linux, so I'm not sure where it came from.
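To see exactly which ID prefixes each matrix contains (and whether any rows use a different scheme), a quick tally of the first column can help. This is a minimal sketch, assuming the matrices are comma-separated with one header row and the IDs in the first column (as prepDE.py3 writes them); the function name `id_prefixes` is hypothetical.

```python
import csv
import re
from collections import Counter


def id_prefixes(matrix_path):
    """Tally the leading alphanumeric prefix of every ID in the first column
    of a count matrix (assumed: CSV with a single header row)."""
    prefixes = Counter()
    with open(matrix_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, e.g. "gene_id,sample1,..."
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # Keep the run of letters/digits before the first separator ('.', '_', '|', ...)
            match = re.match(r"[A-Za-z0-9]+", row[0])
            prefixes[match.group(0) if match else row[0]] += 1
    return prefixes
```

For example, `id_prefixes("gene_count_matrix.csv").most_common(5)` lists the most frequent prefixes; the file name here is the assumed default output name and may differ in your run.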
The IDs in the gene_count_matrix and the transcript_count_matrix don't match, and I'm unsure whether this could affect the downstream differential expression analysis (Ballgown, DESeq2, edgeR).
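One way to check how the two sets of IDs relate is to read the GTF that StringTie wrote, since each `transcript` line in a GTF carries both a `gene_id` and a `transcript_id` attribute. The sketch below builds a transcript-to-gene lookup from such a file. It assumes a standard tab-separated GTF with `key "value";` attributes; the function name `transcript_to_gene` and the GTF path you pass in are hypothetical.

```python
import re

# Matches attribute pairs such as: gene_id "MSTRG.1"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def transcript_to_gene(gtf_path):
    """Return a dict mapping transcript_id -> gene_id, taken from the
    'transcript' feature lines of a GTF file."""
    mapping = {}
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip comment/header lines
            fields = line.rstrip("\n").split("\t")
            # Column 3 is the feature type; column 9 holds the attributes
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            attrs = dict(ATTR_RE.findall(fields[8]))
            tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
            if tx and gene:
                mapping[tx] = gene
    return mapping
```

If every transcript ID in the transcript matrix appears as a key here, and the gene IDs it maps to match those in the gene matrix, then the two files are linked despite the different prefixes. If the gene matrix IDs carry an extra `|`-separated suffix, comparing only the part before the `|` may be necessary.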
I would greatly appreciate any help in resolving these issues.