Question

'gene_name' is missing in StringTie output file 't_data.ctab'

2

5.8 years ago

Mithil Gaikwad ▴ 50

Hello, I followed the New Tuxedo protocol, in which StringTie performs the quantification step, using:

stringtie -e -B -p 8 -G merged_gtf -o SRRXXX.gtf SRRXXX.bam

This produces the following output files:

e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  SRRXXX.gtf  t_data.ctab

The columns of t_data.ctab are used to build the count data for DESeq2. I tried to import t_data.ctab for DESeq2 by following the tximport manual:

   tx2gene <- tmp[, c("t_name", "gene_name")]

but my t_data.ctab contains '.' in the 'gene_name' column, which makes it unusable for building the count data, so I can't proceed with the differential gene expression analysis. My question is: can I use the 'gene_id' column from t_data.ctab instead of 'gene_name'? Or should I switch to a different quantification tool altogether, and if so, which tool would be a better choice than StringTie?

RNA-Seq StringTie New Tuxedo protocol DESeq2 • 2.4k views

score 0 · Answer 1 · 2018-11-24

5.8 years ago

Viswanathan • 0

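To extract read count information for DESeq2, you can use the prepDE.py script provided by the StringTie authors, which builds gene-level and transcript-level count matrices from the StringTie output.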

score 0 · Answer 2 · 2020-06-09

The problem may come from the fact that you are using non-human data. I am working with Arabidopsis and I also noticed that the gene_name column is empty in the t_data.ctab file from StringTie. I thought it was a bug, but if you look at this Ballgown documentation, you can see the description "gene_name: HUGO gene name for the transcript, if known". HUGO annotation is restricted to human gene nomenclature, so the field stays empty for other organisms. Hopefully this will save time for someone else.
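
On the original question of using gene_id in place of gene_name, here is a minimal sketch of one way to build a transcript-to-gene table that falls back to gene_id wherever gene_name is '.'. It assumes the usual tab-separated t_data.ctab layout with a header row containing t_name, gene_id and gene_name. The file names example_t_data.ctab and example_tx2gene.tsv and the two data rows are made up for illustration and are not from the thread.

```shell
# Hypothetical two-transcript sample in the t_data.ctab column layout
# (tab-separated, header row first). Replace with your real t_data.ctab.
printf 't_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\tgene_id\tgene_name\tcov\tFPKM\n' > example_t_data.ctab
printf '1\tChr1\t+\t100\t900\tTX.1\t3\t600\tMSTRG.1\t.\t12.5\t3.2\n' >> example_t_data.ctab
printf '2\tChr1\t-\t1500\t2800\tTX.2\t4\t900\tMSTRG.2\tGENE_A\t8.1\t2.0\n' >> example_t_data.ctab

# Build a two-column table: transcript name, gene identifier.
# Columns are located by header name, so their position does not matter.
awk 'BEGIN { FS = OFS = "\t" }
NR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i   # map header name -> column index
    print "t_name", "gene"
    next
}
{
    gene = $(col["gene_name"])
    # Fall back to gene_id when gene_name is missing ("." or empty).
    if (gene == "." || gene == "") gene = $(col["gene_id"])
    print $(col["t_name"]), gene
}' example_t_data.ctab > example_tx2gene.tsv

cat example_tx2gene.tsv
```

The resulting table has the same shape as the tx2gene data frame in the question (transcript first, gene second) and could be read into R for tximport. Note that when the annotation comes from a merged GTF, gene_id values are typically StringTie-assigned MSTRG identifiers rather than reference gene IDs, so mixing them with gene names in one column may need a further mapping step.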