Question: Issue with StringTie when generating a gene_count matrix for DESeq
8 months ago, pixie@bioinfo wrote:


I have posted this issue previously on a number of forums but could not find a solution. I am interested in a gene-level analysis only (not in novel genes/isoforms).

I ran StringTie three times, with the following commands:

stringtie -p 4 -G transcripts_exon_for_analysis.gtf  -o test_out.gtf accepted_hits.bam

stringtie --merge -p 4 -G transcripts_exon_for_analysis.gtf -o rice_merged.gtf mergelist.txt

stringtie -e -B -p 4 -G rice_merged.gtf -o ballgown/root_rep1/root_rep1.gtf root1_rep1.bam
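The three commands above form the usual per-sample assemble, merge, re-estimate workflow. As a sketch for several samples, they can be wrapped in a bash function; the function name `run_pipeline`, the `<sample>.bam` input naming and the per-sample output names are hypothetical, and only the StringTie options already used above (`-p`, `-G`, `-o`, `--merge`, `-e`, `-B`) appear.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_pipeline <reference.gtf> <sample> [<sample> ...]
# Assumes each sample's alignments are in <sample>.bam (illustrative naming).
run_pipeline() {
  local ref="$1"; shift
  local s
  # 1) Assemble each sample, guided by the reference annotation
  for s in "$@"; do
    stringtie -p 4 -G "$ref" -o "$s.gtf" "$s.bam" || return 1
  done
  # 2) Merge: mergelist.txt lists one per-sample GTF per line
  printf '%s.gtf\n' "$@" > mergelist.txt
  stringtie --merge -p 4 -G "$ref" -o rice_merged.gtf mergelist.txt || return 1
  # 3) Re-estimate abundances on the merged transcript set (-e)
  #    and write Ballgown tables (-B) into one directory per sample
  for s in "$@"; do
    mkdir -p "ballgown/$s"
    stringtie -e -B -p 4 -G rice_merged.gtf -o "ballgown/$s/$s.gtf" "$s.bam" || return 1
  done
}
```

Step 3 quantifies against the merged GTF, so the gene IDs it reports come from that file. If only annotated genes matter, running step 3 directly with the reference GTF (with `-e`, which limits estimation to the given reference transcripts) may be worth checking against the StringTie manual.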

After this, I generated the gene_count matrix, which is the input for DESeq. Most of the IDs in the matrix are StringTie IDs, and the majority of the genes in the annotation file are not picked up. Is there a way I can map the IDs back to my annotation file?
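One way to map the IDs back is to read them out of the merged GTF. This sketch assumes, as in typical `stringtie --merge` output, that transcripts matching the reference carry a `ref_gene_id` attribute alongside the StringTie `gene_id`; the helper name `mstrg_to_ref` is hypothetical. It prints one `StringTie gene ID <tab> reference gene ID` pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mstrg_to_ref <merged.gtf>
# Assumption: known transcripts have a ref_gene_id "..." attribute.
mstrg_to_ref() {
  awk -F'\t' '
    $3 == "transcript" {
      gid = ""; rid = ""
      n = split($9, attrs, ";")          # attributes are "; "-separated
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")         # kv[1] = key, kv[2] = "value"
        val = kv[2]; gsub(/"/, "", val)  # strip the quotes
        if (kv[1] == "gene_id") gid = val
        else if (kv[1] == "ref_gene_id") rid = val
      }
      # Novel transcripts have no ref_gene_id and are skipped
      if (gid != "" && rid != "") print gid "\t" rid
    }' "$1" | sort -u
}
```

Usage: `mstrg_to_ref rice_merged.gtf > id_map.tsv`. A single StringTie gene can overlap more than one reference gene, in which case it appears on several lines; such cases are worth checking before renaming rows of the count matrix.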

Also, can I use featureCounts (Rsubread R package) to get a gene count matrix in case StringTie doesn't work, since I am not interested in novel genes/isoforms anyway?
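featureCounts exists both as the `featureCounts()` function in the Bioconductor package Rsubread and as a command-line program in Subread. A minimal command-line sketch: count reads per `gene_id` over the exons of the reference GTF only, so no StringTie IDs can appear, then keep the `Geneid` column plus one count column per BAM. The helper name and output file are hypothetical; paired-end libraries need extra options (e.g. `-p`; check the documentation for your Subread version).

```shell
#!/usr/bin/env bash
# Hypothetical helper: featurecounts_matrix <reference.gtf> <out.txt> <bam> [<bam> ...]
featurecounts_matrix() {
  local gtf="$1" out="$2"; shift 2
  # -t exon / -g gene_id are the GTF defaults, written out for clarity
  featureCounts -T 4 -t exon -g gene_id -a "$gtf" -o "$out" "$@" || return 1
  # Output: line 1 is a "# Program:..." comment, line 2 the header
  # (Geneid Chr Start End Strand Length <bam>...). Keep Geneid + counts.
  tail -n +2 "$out" | cut -f1,7-
}
```

Usage: `featurecounts_matrix transcripts_exon_for_analysis.gtf counts.txt root1_rep1.bam > gene_counts.tsv`.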

8 months ago, toralmanvar wrote:

The GTF file you get from StringTie contains both known and novel transcripts. If the first column contains a StringTie ID instead of a transcript ID, the entry represents a novel transcript or isoform, which is not of interest to you. So after generating the gene_count matrix, you can remove those rows.
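A sketch of that filtering step, assuming the matrix is a CSV with one header row and IDs in the first column, and that StringTie IDs use the default merge prefix `MSTRG` (set by `-l` in `stringtie --merge`). The helper name is hypothetical; pass a different prefix as the second argument if yours differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop_stringtie_ids <matrix.csv> [prefix]
drop_stringtie_ids() {
  local prefix="${2:-MSTRG}"
  # Keep the header and every row whose ID does not start with "<prefix>."
  awk -F',' -v p="$prefix" 'NR == 1 || index($1, p ".") != 1' "$1"
}
```

`index()` is used instead of a regex so that the dot in the prefix needs no escaping.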

However, StringTie GTF entries with a transcript ID (i.e. known transcripts) also give you the gene symbol, which you can use to match them to genes in the genome annotation file with a simple script or shell command.
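A sketch of such a script, under stated assumptions: the count matrix is a CSV with one header row and transcript IDs in column 1, known transcripts keep their reference `transcript_id`, and the reference GTF lines carry both `transcript_id` and `gene_id`. The hypothetical `gene_level_counts` helper sums the counts of all known transcripts of each reference gene and drops everything else, giving a gene-level matrix keyed by annotation IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gene_level_counts <reference.gtf> <transcript_counts.csv>
gene_level_counts() {
  local gtf="$1" csv="$2"
  # Header: keep the sample names, rename the ID column to gene_id
  head -n 1 "$csv" | awk -F',' 'BEGIN { OFS = "," } { $1 = "gene_id"; print }'
  awk -F'\t' '
    FNR == NR {                      # first file: the GTF (tab-separated)
      tid = ""; gid = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")     # kv[1] = key, kv[2] = "value"
        val = kv[2]; gsub(/"/, "", val)
        if (kv[1] == "transcript_id") tid = val
        else if (kv[1] == "gene_id") gid = val
      }
      if (tid != "" && gid != "") gene[tid] = gid
      next
    }
    FNR == 1 { next }                # second file (CSV): skip its header
    ($1 in gene) {                   # known transcript: add counts to its gene
      g = gene[$1]; seen[g] = 1
      for (j = 2; j <= NF; j++) sum[g, j] += $j
      if (NF > ncol) ncol = NF
    }
    END {
      for (g in seen) {
        line = g
        for (j = 2; j <= ncol; j++) line = line "," (sum[g, j] + 0)
        print line
      }
    }' "$gtf" FS=',' "$csv" | sort
}
```

The `FS=','` operand switches the field separator to commas before awk starts reading the CSV, so one awk call handles both the tab-separated GTF and the comma-separated matrix. Usage: `gene_level_counts transcripts_exon_for_analysis.gtf counts.csv > gene_counts.csv`.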
