Hi everyone,
I used StringTie to perform a transcript assembly and differential expression analysis on my bacterial RNA-seq data.
I merged my transcript assemblies and then ran StringTie's "prepDE.py3" script (Python 3) to build count matrices. This generated two output files, gene_count_matrix and transcript_count_matrix, which contain the gene-level and transcript-level counts, respectively. However, even though I supplied a GTF annotation file for my reference genome, most of the genes were labeled only with "MSTRG" IDs and had no gene names.
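To put a number on "most of the genes", a minimal Python sketch can count how many rows of the gene count matrix carry an MSTRG ID and how many carry some other ID. It assumes the matrix is a comma-separated file whose first column holds the gene ID and whose first row is a header. The function name `summarize_gene_ids` is hypothetical, not part of StringTie.

```python
import csv
import io


def summarize_gene_ids(csv_text):
    """Return (mstrg_rows, other_rows) for a prepDE-style count matrix.

    Assumes: comma-separated text, a header row first, gene IDs in column 0.
    A row is counted as MSTRG if its ID starts with "MSTRG".
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (gene_id,sample1,...)
    mstrg = other = 0
    for row in reader:
        if not row:  # ignore blank lines
            continue
        if row[0].startswith("MSTRG"):
            mstrg += 1
        else:
            other += 1
    return mstrg, other
```

Usage, assuming the matrix was written to prepDE's usual file name `gene_count_matrix.csv`: `with open("gene_count_matrix.csv") as fh: print(summarize_gene_ids(fh.read()))`.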
- This issue has been raised before (links below); however, I can neither ignore these genes nor find a good alternative way to solve it:
  - How to deal with MSTRG tag without relevant gene name?
  - How to avoid MSTRG from StringTie
  - Gene names in Ballgown differential expression analysis
- My pipeline was HISAT2 > StringTie > prepDE. Could the problem lie in the alignment step, and can I reuse these files in further steps? I also tried a different annotation file for my reference genome (GFF instead of GTF); it increased the number of known genes, but the number of genes flagged as MSTRG is still high.
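One way to see whether MSTRG loci correspond to annotated genes is to read the merged GTF directly. The Python sketch below assumes the merge was run with the reference annotation, so that transcripts overlapping a reference transcript carry a `ref_gene_id` attribute in column 9; treat that as an assumption to verify against your own merged file. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn 'gene_id "X"; transcript_id "Y";' into {'gene_id': 'X', ...}."""
    return dict(ATTR_RE.findall(field))


def mstrg_to_ref(gtf_lines):
    """Map StringTie gene IDs (e.g. MSTRG.12) to reference gene IDs.

    Only 'transcript' features that carry a ref_gene_id attribute
    (assumed present for reference-matching transcripts) are used.
    """
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records hold the ref_* attributes
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        ref = attrs.get("ref_gene_id")
        if gene_id and ref:
            mapping[gene_id].add(ref)
    return dict(mapping)
```

MSTRG IDs that never appear in the returned dictionary would be loci with no matching reference transcript in the merged file. Returning a set per ID keeps the case visible where one MSTRG locus spans several reference genes.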
Thanks in advance for your help.