Question: How to deal with MSTRG tag without relevant gene name?
2.5 years ago by
stcatpang50 wrote:

Hi~ I used the HISAT2-StringTie pipeline to process RNA-seq data and got results with MSTRG tags. Some of them had a gene name, which was convenient for functional annotation afterwards. But about 1/3 of my data consisted of rows with only an MSTRG tag, like this:

chr6    StringTie   transcript  72101340    72101890    1000    -   .   gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";
chr6    StringTie   exon    72101340    72101890    1000    -   .   gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1"; exon_number "1";

Are there any suggestions on how to deal with them?

Thanks! Aoi

rna-seq • 6.8k views
modified 2.5 years ago by Biostar • written 2.5 years ago by stcatpang50

MSTRG IDs are the default names StringTie assigns when merging transcript GTFs. The naming convention for MSTRG is explained here by the StringTie devs. According to the manual, you can change the default name prefix of the transcripts when using the stringtie --merge option. Was a reference GTF provided while merging?

-l <label>  name prefix for output transcripts (default: MSTRG)
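As a rough sketch of how the label and a reference annotation could be passed together, the snippet below only assembles a `stringtie --merge` command line and prints it; it does not run StringTie. The file names (`mergelist.txt`, `reference.gtf`, `merged.gtf`), the `NOVEL` label and the helper `build_merge_command` are placeholders made up for illustration.

```python
import shlex


def build_merge_command(gtf_list, reference_gtf=None, label="MSTRG", output="merged.gtf"):
    """Assemble (but do not run) a `stringtie --merge` command as an argument list."""
    cmd = ["stringtie", "--merge"]
    if reference_gtf:
        # -G supplies the reference annotation used to guide the merge
        cmd += ["-G", reference_gtf]
    # -l sets the name prefix for output transcripts, -o the output GTF
    cmd += ["-l", label, "-o", output, gtf_list]
    return cmd


# Hypothetical file names and label, for illustration only
cmd = build_merge_command("mergelist.txt", reference_gtf="reference.gtf", label="NOVEL")
print(shlex.join(cmd))
```

Note that changing the label only changes the prefix of the IDs; it does not add gene names to transcripts that lack them.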

However, these tags (MSTRG) are not useful in comparing across samples.

copy/pasted from Dev suggestion:

"you cannot rely on MSTRG.gene# identifiers but instead I'd suggest converting those gene IDs into locations on the genome (or some common reference annotation gene IDs/symbols, though such will not be available for "novel" genes)."
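One way to follow that suggestion is to turn each gene ID into a location string built from its transcript coordinates. The sketch below is a minimal example that assumes a standard 9-column, tab-separated GTF; the helper names `gene_locations` and `location_key` and the `chrom:start-end(strand)` format are illustrative choices, not anything StringTie itself provides.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gene_locations(gtf_lines):
    """Map each gene_id to (chrom, start, end, strand) spanning all its transcripts."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        gene_id = dict(ATTR_RE.findall(fields[8])).get("gene_id")
        if gene_id is None:
            continue
        if gene_id in spans:
            # Widen the span to cover every transcript of the gene
            _, old_start, old_end, _ = spans[gene_id]
            start, end = min(old_start, start), max(old_end, end)
        spans[gene_id] = (chrom, start, end, strand)
    return spans


def location_key(span):
    """Format a span as chrom:start-end(strand)."""
    chrom, start, end, strand = span
    return f"{chrom}:{start}-{end}({strand})"


# The two records from the question, with tabs written out explicitly
example = [
    'chr6\tStringTie\ttranscript\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";',
    'chr6\tStringTie\texon\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1"; exon_number "1";',
]
locations = gene_locations(example)
print(location_key(locations["MSTRG.58117"]))  # chr6:72101340-72101890(-)
```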

TL;DR: By default, StringTie uses the MSTRG prefix if no label is given.

modified 2.5 years ago • written 2.5 years ago by cpad011213k

I really appreciate your reply. I used the GTF file from Ensembl. The transcript listed above has location information but no reference annotation gene ID. So is it appropriate to drop such transcripts and keep only those with gene symbols for further analysis? Thanks!

written 2.5 years ago by stcatpang50

It depends on the end goal of the study. If you are interested only in standard transcripts/genes (i.e. Ensembl, all or targeted), it is okay to exclude MSTRG transcripts/genes from downstream analysis. But do not throw those genes/transcripts away. Try to analyze these coordinates with care. They might be partial and/or novel transcripts/genes, or they may be available in other databases.
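To set the unnamed genes aside without discarding them, something like the following could split a merged GTF into genes that carry a reference name and genes that do not. It is only a sketch: it assumes the name appears in a `gene_name` or `ref_gene_name` attribute on transcript records, which should be checked against your own file, and the first example record (`GENE_A`, `ENSG_EXAMPLE`) is invented for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

# Assumed attribute keys that may hold a reference gene name; verify in your GTF
NAME_KEYS = ("gene_name", "ref_gene_name")


def split_by_gene_name(gtf_lines):
    """Return (named, unnamed): gene_id -> sorted names, and a sorted list of gene_ids without any name."""
    names = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        found = names.setdefault(gene_id, set())
        for key in NAME_KEYS:
            if key in attrs:
                found.add(attrs[key])
    named = {gene: sorted(found) for gene, found in names.items() if found}
    unnamed = sorted(gene for gene, found in names.items() if not found)
    return named, unnamed


records = [
    # Hypothetical record with a reference gene name
    'chr1\tStringTie\ttranscript\t1000\t2000\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; gene_name "GENE_A"; ref_gene_id "ENSG_EXAMPLE";',
    # Record from the question, with no gene name
    'chr6\tStringTie\ttranscript\t72101340\t72101890\t1000\t-\t.\t'
    'gene_id "MSTRG.58117"; transcript_id "MSTRG.58117.1";',
]
named, unnamed = split_by_gene_name(records)
print(named)    # {'MSTRG.1': ['GENE_A']}
print(unnamed)  # ['MSTRG.58117']
```

The unnamed list can then be kept in a separate table and examined by coordinates later, rather than being dropped.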

written 2.5 years ago by cpad011213k

Hello everyone, I have the same problem (in my case Cuffdiff gives gene IDs, but the specific gene names are missing). I used the reference.gtf file at every step. I also tried to find the gene names using their chromosome locus coordinates, but found no results; I ran BLAST as well. I could not get any information from the databases. Please advise what steps I should take to find the gene names. I need gene names for further downstream analysis.

written 2.1 years ago by bruseq40

Hi, divya~ I think you can check whether the reference.gtf matches your data. If no sequence gets a specific gene name, one possible reason is that the reference.gtf and the genome used for your Bowtie index are different (hg38 and hg19, for example).
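A quick sanity check along these lines is to compare the sequence names and coordinates in the GTF against the contig lengths of the genome that was indexed. The sketch below assumes you have a FASTA index (`.fai`, as written by `samtools faidx`) for that genome; the contig length and GTF records used here are toy values, not real hg19/hg38 numbers. Passing this check does not prove the assemblies match, but a failure is a strong hint that they do not.

```python
def read_fai(fai_lines):
    """Parse .fai lines (name<TAB>length<TAB>...) into {contig: length}."""
    lengths = {}
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def check_gtf_against_genome(gtf_lines, contig_lengths):
    """Return (sorted contigs absent from the genome, count of features ending past the contig end)."""
    missing = set()
    out_of_range = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue
        chrom, end = fields[0], int(fields[4])
        if chrom not in contig_lengths:
            # e.g. "6" in the GTF but "chr6" in the genome
            missing.add(chrom)
        elif end > contig_lengths[chrom]:
            # Feature extends beyond the contig: likely a different assembly
            out_of_range += 1
    return sorted(missing), out_of_range


# Toy data for illustration only
fai = ["chr6\t100000\t6\t60\t61"]
gtf = [
    "6\tsource\tgene\t100\t200\t.\t+\t.\t",         # naming mismatch
    "chr6\tsource\tgene\t500\t900\t.\t+\t.\t",      # consistent
    "chr6\tsource\tgene\t99000\t150000\t.\t-\t.\t",  # past contig end
]
missing, out_of_range = check_gtf_against_genome(gtf, read_fai(fai))
print(missing, out_of_range)  # ['6'] 1
```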

written 2.1 years ago by stcatpang50