Question: gene IDs in stringtie output
2.6 years ago by
blooming.daisy33390 wrote:

Dear All,

I'm using StringTie to assemble transcripts, supplying my genome annotation file with the -G flag. However, StringTie assigns its own IDs, such as MSTRG.1 and MSTRG.2 for genes and MSTRG.1.1 and MSTRG.2.1 for transcripts, despite the genome annotation file, and I am unable to get the same gene IDs as in the annotation. I need those IDs for subsequent functional analysis. Can anyone suggest how to get the same IDs in the StringTie output as in the genome annotation file?

Thanks in anticipation.

rna-seq • 2.1k views
modified 4 months ago by kristoffer.vittingseerup • written 2.6 years ago by blooming.daisy333

Hello blooming.daisy333,

Don't forget to follow up on your threads. Please give some feedback to the answers/comments on your last questions:

fin swimmer

written 2.6 years ago by finswimmer

Dear finswimmer, I really appreciate your kind and quick help, and I'm extremely sorry for the delay, but I'm still working on those questions; they are interconnected in my analysis. I will surely comment as I have before. Please give me some time. Also, for some posts that solved my problem, I could not see any upvote/accept button to click on. That's why they are not marked.

written 2.6 years ago by blooming.daisy333

One way is to intersect the coordinates of each MSTRG entry with the known transcriptome GTF. @blooming.daisy333

written 2.5 years ago by cpad0112
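
A minimal sketch of the coordinate-intersection idea, assuming both the StringTie output and the reference annotation are standard tab-separated GTF with a `gene_id` attribute. The function names `parse_genes` and `map_ids` and the example records are hypothetical, made up for illustration; this is not part of StringTie. Overlap is tested per gene span on the same chromosome and strand, so unstranded (`.`) assembled genes will not match stranded reference genes in this simplified version.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')

def parse_genes(gtf_lines):
    """Collapse GTF records into one (chrom, strand, start, end) span per gene_id."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gid, chrom, strand = match.group(1), fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if gid in spans:
            # Extend the span to cover every record seen for this gene.
            _, _, old_start, old_end = spans[gid]
            start, end = min(old_start, start), max(old_end, end)
        spans[gid] = (chrom, strand, start, end)
    return spans

def map_ids(assembled, reference):
    """Map each assembled gene_id to the sorted reference gene_ids it overlaps."""
    by_locus = defaultdict(list)
    for rid, (chrom, strand, start, end) in reference.items():
        by_locus[(chrom, strand)].append((start, end, rid))
    mapping = {}
    for gid, (chrom, strand, start, end) in assembled.items():
        hits = [rid for s, e, rid in by_locus[(chrom, strand)]
                if s <= end and start <= e]  # closed-interval overlap test
        mapping[gid] = sorted(hits)
    return mapping

# Tiny in-memory example; real use would pass open file handles instead.
reference = parse_genes([
    "\t".join(["chr1", "ref", "exon", "100", "200", ".", "+", ".", 'gene_id "GeneA";']),
])
assembled = parse_genes([
    "\t".join(["chr1", "StringTie", "exon", "150", "250", ".", "+", ".", 'gene_id "MSTRG.1";']),
])
print(map_ids(assembled, reference))
```

The linear scan per gene is fine for a sketch; for whole-genome annotations an interval index or a dedicated tool for interval intersection would scale better.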

Hello, I am having the same issue: I am getting MSTRG IDs instead of gene names. Were you able to solve this? If yes, please let me know how you did it.

Many thanks

written 2.4 years ago by arshad129240

Please don't ask questions in the space reserved for answers; use the ADD COMMENT button instead.

written 2.4 years ago by h.mon

Sorry about that. I am new and didn't realize this.

written 2.3 years ago by arshad129240
4 months ago by
European Union
kristoffer.vittingseerup3.5k wrote:

The missing gene_names from StringTie can originate from 3 different sources:
1) It is a novel transcript in a known gene.
2) It is a novel transcript in a cluster of genes (multiple gene_names) which are joined together by StringTie/Cufflinks because of their overlap.
3) It is a novel gene, meaning no genomic overlap with any feature in the reference you are using.
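
As a rough illustration of the three cases, they can be told apart by counting how many distinct reference genes a StringTie gene overlaps. The sketch below assumes such an overlap table has already been computed elsewhere; `classify_missing` and its labels are hypothetical names chosen for illustration, and plain overlap counting is a simplification, not how IsoformSwitchAnalyzeR does it.

```python
def classify_missing(overlaps):
    """Label each StringTie gene by the number of reference genes it overlaps.

    overlaps: dict mapping a StringTie gene_id to an iterable of reference gene ids.
    """
    labels = {}
    for gid, ref_genes in overlaps.items():
        n = len(set(ref_genes))  # count distinct reference genes only
        if n == 0:
            labels[gid] = "novel gene"                      # case 3
        elif n == 1:
            labels[gid] = "novel transcript in known gene"  # case 1
        else:
            labels[gid] = "merged gene cluster"             # case 2
    return labels

# Hypothetical overlap table for three assembled genes.
example = {"MSTRG.1": ["GeneA"], "MSTRG.2": ["GeneB", "GeneC"], "MSTRG.3": []}
print(classify_missing(example))
```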

From my experience with StringTie data there are typically tens of thousands of missing gene_names, and ~50% of them are due to problems 1 and 2. To solve this I have just released an update to the R package IsoformSwitchAnalyzeR (available in >1.11.6) which can fix problems 1 and 2 for most genes. You simply use the importRdata() function, which fixes the isoform annotation that is fixable and cleans up the rest of the annotation. From the resulting switchAnalyzeRList object you can analyse isoform switches with predicted functional consequences with IsoformSwitchAnalyzeR, or use extractGeneExpression() to get a gene count matrix for DE analysis with other tools.

Hope this helps.
