Question: Can I ignore these MSTRG genes in downstream analysis (
22 months ago, Fawzi Yassine wrote:


I am using RNA-seq to find genes differentially expressed between 2 conditions, with StringTie for transcript assembly and quantification. To use StringTie with DESeq2 I followed the instructions, which produce gene_count_matrix.csv. This file has gene IDs. Some of them are IDs like NM_000144, which are convenient for downstream analysis. But other rows have an MSTRG tag. Can I ignore these MSTRG genes in downstream analysis (enrichment analysis at pantherdb.org)? If not, how can I get the corresponding gene symbols? Regards,

rna-seq deseq2 stringtie • 812 views
modified 3 months ago by kristoffer.vittingseerup • written 22 months ago by Fawzi Yassine

Check this out: "How to deal with MSTRG tag without relevant gene name?"

written 22 months ago by lakhujanivijay

I did not understand this reply from the link you provided: "If you are interested only standard transcripts/genes (i.e Ensembl, all or targeted), it is okay to exclude MSTRG transcripts/genes for downstream analysis. But do not throw away those genes/transcripts."
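One way to read the quoted advice is: set the MSTRG rows aside for the enrichment step, but keep them in a separate table instead of deleting them. Here is a minimal Python sketch of that split. The sample rows are made up, and the assumption that StringTie-assigned loci can be recognised by the "MSTRG." prefix in the first column of gene_count_matrix.csv is mine; adjust it to your file.

```python
import csv
import io

# Made-up stand-in for gene_count_matrix.csv; IDs and counts are illustrative only.
SAMPLE_CSV = """gene_id,sample1,sample2
NM_000144,120,95
MSTRG.1021,33,41
MSTRG.77,880,910
"""


def split_by_mstrg(handle):
    """Return (annotated_rows, mstrg_rows) from a gene count CSV."""
    annotated, mstrg = [], []
    for row in csv.DictReader(handle):
        # Assumption: StringTie-assigned loci carry the "MSTRG." prefix.
        target = mstrg if row["gene_id"].startswith("MSTRG.") else annotated
        target.append(row)
    return annotated, mstrg


annotated, mstrg = split_by_mstrg(io.StringIO(SAMPLE_CSV))
print("for enrichment:", [r["gene_id"] for r in annotated])
print("kept aside:", [r["gene_id"] for r in mstrg])
```

With a real file you would pass `open("gene_count_matrix.csv", newline="")` instead of the in-memory sample. The MSTRG list is kept so it can be annotated or inspected later.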

written 22 months ago by Fawzi Yassine

If you work with human or mouse (probably the most well-annotated organisms when it comes to genomics), why do you use StringTie at all? There are comprehensive annotations from GENCODE/Ensembl or RefSeq that you can quantify against. Transcript assembly is probably only beneficial if you look for new transcripts, but not in a standard analysis. Also keep in mind that transcript assembly probably requires quite some sequencing depth and read length, so why the effort for a standard DE analysis? I would simply quantify with salmon against the GENCODE transcriptome and then proceed with tximport and DESeq2. You would probably need to verify new transcripts from StringTie anyway to show that they are reliable and not artifacts, so save yourself the trouble.
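To make the "quantify transcripts, then summarise to genes" idea concrete, here is a rough Python sketch of the summing step only. It is not tximport (which is an R package and also tracks transcript lengths for offsets); it just adds up per-transcript read estimates per gene. The transcript names, gene names and numbers are made up, and the `tx2gene` dictionary is a hypothetical stand-in for a real transcript-to-gene table. The column names follow salmon's quant.sf header.

```python
import csv
import io
from collections import defaultdict

# Made-up quant.sf-style table (tab-separated); values are illustrative only.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx_a1\t1500\t1300\t12.0\t10.5\n"
    "tx_a2\t900\t700\t5.0\t4.5\n"
    "tx_b1\t2000\t1800\t8.0\t7.0\n"
)

# Hypothetical transcript-to-gene mapping.
TX2GENE = {"tx_a1": "GENE_A", "tx_a2": "GENE_A", "tx_b1": "GENE_B"}


def gene_level_counts(quant_handle, tx2gene):
    """Sum estimated read counts of all transcripts belonging to each gene."""
    totals = defaultdict(float)
    for row in csv.DictReader(quant_handle, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript missing from the mapping; skipped in this sketch
        totals[gene] += float(row["NumReads"])
    return dict(totals)


gene_counts = gene_level_counts(io.StringIO(QUANT_SF), TX2GENE)
print(gene_counts)
```

Because every transcript in a reference annotation already belongs to a named gene, no MSTRG-style IDs arise in this route.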

written 22 months ago by ATpoint

ATpoint, I have always liked your replies, but not this one. I have already done the assembly using StringTie (on AWS). Moreover, I promised my would-be employer to use StringTie. I am only getting 167 proper gene IDs out of the 4077 significantly different genes. The rest have MSTRG tags in their IDs.

written 22 months ago by Fawzi Yassine

Well, you don't have to like a reply, of course, but then why do you ask for help? :)

written 22 months ago by WouterDeCoster

ATpoint is a professional person, so he will rightly think that I am complimenting him in that reply, especially since I asked him another question.

written 22 months ago by Fawzi Yassine
3 months ago, kristoffer.vittingseerup (European Union) wrote:

StringTie annotation can have 2 problems: 1) Unassigned gene_name in a single gene: it is a novel transcript in a known gene. 2) A cluster of genes (multiple gene_names/gene_ids) which are joined together by StringTie because they overlap in genomic space. Lastly, you can find novel genes, which will also have no corresponding annotation.
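To see which of these cases a given MSTRG ID falls into, you can count the distinct reference gene names attached to its transcripts in the assembled GTF. The following is a minimal Python sketch, not what IsoformSwitchAnalyzeR does internally. The GTF lines are made up, and the attribute keys it looks for (`gene_name`, `ref_gene_name`) are an assumption that may differ between StringTie versions and modes, so check your own file.

```python
import io
import re
from collections import defaultdict

# Made-up GTF transcript lines; attribute keys are an assumption (see above).
SAMPLE_GTF = "\n".join([
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_A.1"; ref_gene_name "GENEA";',
    'chr1\tStringTie\ttranscript\t150\t900\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.2";',
    'chr2\tStringTie\ttranscript\t10\t500\t.\t-\t.\tgene_id "MSTRG.2"; transcript_id "NM_B.1"; ref_gene_name "GENEB";',
    'chr2\tStringTie\ttranscript\t400\t990\t.\t-\t.\tgene_id "MSTRG.2"; transcript_id "NM_C.1"; ref_gene_name "GENEC";',
    'chr3\tStringTie\ttranscript\t5\t700\t.\t+\t.\tgene_id "MSTRG.3"; transcript_id "MSTRG.3.1";',
])

ATTR = re.compile(r'(\S+) "([^"]*)"')


def classify(handle):
    """Map each gene_id to (category, sorted reference gene names)."""
    names = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "gene_id" not in attrs:
            continue
        found = names[attrs["gene_id"]]  # registers the gene even if unnamed
        for key in ("ref_gene_name", "gene_name"):
            if key in attrs:
                found.add(attrs[key])
    result = {}
    for gene_id, found in names.items():
        if not found:
            category = "no_annotation"      # candidate novel gene
        elif len(found) == 1:
            category = "single_known_gene"  # problem 1: name can be propagated
        else:
            category = "merged_cluster"     # problem 2: several genes joined
        result[gene_id] = (category, sorted(found))
    return result


result = classify(io.StringIO(SAMPLE_GTF))
for gene_id, (category, found) in sorted(result.items()):
    print(gene_id, category, found)
```

For IDs in the single_known_gene category the one name found is a reasonable symbol to carry into enrichment analysis; merged clusters need more care because the counts mix several genes.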

From my experience with StringTie data there are typically tens of thousands of missing gene_names, and ~50% of the missing gene_names are due to problems 1 and 2. To solve this I have just released an update to the R package IsoformSwitchAnalyzeR (available in >1.11.6) which can fix problems 1 and 2 for most genes. You simply use the importRdata() function, which will fix the isoform annotation that is fixable and clean up the rest of the annotation. From the resulting switchAnalyzeRList object you can analyse isoform switches with predicted functional consequences with IsoformSwitchAnalyzeR, or use extractGeneExpression() to get a gene count matrix for DE analysis with other tools.

Hope this helps.


