Question: How to deal with MSTRG tag without relevant gene name?
3 months ago by
stcatpang10 wrote:

Hi, I used the HISAT2-StringTie pipeline to process RNA-seq data and got results with MSTRG tags. Some of them had a gene name, which made later functional annotation convenient. But about 1/3 of my data consisted of rows with only an MSTRG tag, like this:

chr6    StringTie   transcript  72101340    72101890    1000    -   .   gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";
chr6    StringTie   exon    72101340    72101890    1000    -   .   gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1"; exon_number "1";

Are there any suggestions on how to deal with them?

Thanks! Aoi

rna-seq
modified 3 months ago by Biostar ♦♦ 20 • written 3 months ago by stcatpang10

MSTRG IDs are the default names that StringTie assigns when merging transcript GTFs. The naming convention for MSTRG IDs is explained by the StringTie developers. According to the manual, you can change the default prefix of the transcript names when using stringtie --merge. Was a reference GTF provided while merging?

-l <label>  name prefix for output transcripts (default: MSTRG)
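
As a rough sketch of how the merge step might be invoked with a reference annotation and a custom prefix, the Python snippet below only assembles the command line; it does not run StringTie. The file names (mergelist.txt, reference.gtf, merged.gtf) and the NOVEL label are hypothetical placeholders, not values from this thread.

```python
import shlex


def build_merge_command(gtf_list, reference_gtf, out_gtf, label="MSTRG"):
    """Assemble a `stringtie --merge` command line as a list of arguments.

    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "stringtie", "--merge",
        "-G", reference_gtf,  # reference annotation used to guide the merge
        "-l", label,          # name prefix for output transcripts
        "-o", out_gtf,        # merged GTF to write
        gtf_list,             # text file listing one per-sample GTF per line
    ]


# Placeholder file names and label, chosen only for illustration.
cmd = build_merge_command("mergelist.txt", "reference.gtf", "merged.gtf", label="NOVEL")
print(shlex.join(cmd))
```

If StringTie is on the PATH, the resulting list could be handed to subprocess.run(cmd, check=True). Changing the prefix only renames the IDs; it does not make them comparable across separate merges.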

However, these tags (MSTRG) are not useful in comparing across samples.

copy/pasted from Dev suggestion:

"you cannot rely on MSTRG.gene# identifiers but instead I'd suggest converting those gene IDs into locations on the genome (or some common reference annotation gene IDs/symbols, though such will not be available for "novel" genes)."
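
One way to act on that suggestion is to replace each MSTRG gene ID with its genomic location. The sketch below assumes a standard tab-delimited GTF with a `gene_id` attribute on every transcript record; the function name `gene_loci` is made up for illustration, and the example records are the two rows from the question.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gene_loci(gtf_lines):
    """Map each gene_id to (chrom, start, end, strand) spanning all its transcripts."""
    loci = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only transcript records are needed to get the span of a gene.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        gene_id = dict(ATTR_RE.findall(fields[8])).get("gene_id")
        if gene_id is None:
            continue
        if gene_id in loci:
            # Extend the locus to cover every transcript of the gene.
            _, prev_start, prev_end, _ = loci[gene_id]
            start, end = min(prev_start, start), max(prev_end, end)
        loci[gene_id] = (chrom, start, end, strand)
    return loci


example = [
    'chr6\tStringTie\ttranscript\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";',
    'chr6\tStringTie\texon\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1"; exon_number "1";',
]

for gene_id, (chrom, start, end, strand) in gene_loci(example).items():
    print(f"{gene_id}\t{chrom}:{start}-{end}({strand})")
```

A location string such as chr6:72101340-72101890(-) stays meaningful across runs, whereas the MSTRG number can change whenever the merge is repeated.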

TL;DR: By default, StringTie uses the MSTRG prefix if no label is given.

modified 3 months ago • written 3 months ago by cpad01124.1k

I really appreciate your reply. I used the GTF file from Ensembl. The transcript listed above has location information but no reference annotation gene ID. So is it appropriate to drop such transcripts and keep only those with gene symbols for further analysis? Thanks!

written 3 months ago by stcatpang10

It depends on the end goal of the study. If you are interested only in standard transcripts/genes (i.e., Ensembl, all or targeted), it is okay to exclude MSTRG transcripts/genes from downstream analysis. But do not throw those genes/transcripts away. Try to analyze these coordinates with care. They might be partial and/or novel transcripts/genes, or they may be available in other databases.
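
To keep both groups rather than discarding one, the merged GTF can be split into genes that carry a name and genes that do not. This is only a sketch: it assumes that reference-matched transcripts have a `gene_name` attribute (as the question describes) and that the file is tab-delimited. The function name `split_by_annotation`, the `GENE_A` symbol, and the `MSTRG.1` record are hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def split_by_annotation(gtf_lines, name_key="gene_name"):
    """Return (annotated, novel) sets of gene_ids based on transcript records."""
    annotated, seen = set(), set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        seen.add(gene_id)
        if name_key in attrs:
            annotated.add(gene_id)
    # A gene with at least one named transcript counts as annotated.
    return annotated, seen - annotated


records = [
    # Hypothetical reference-matched transcript with a made-up gene symbol.
    'chr1\tStringTie\ttranscript\t1000\t5000\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; gene_name "GENE_A";',
    # Transcript from the question, with no gene name.
    'chr6\tStringTie\ttranscript\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";',
]

annotated, novel = split_by_annotation(records)
print("annotated:", sorted(annotated))
print("novel:", sorted(novel))
```

The annotated set can go into the standard downstream analysis, while the novel set can be kept aside and examined by coordinates later.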

written 3 months ago by cpad01124.1k