'gene_name' is missing in StringTie output file 't_data.ctab'
3.6 years ago

Hello, I used the New Tuxedo protocol, in which StringTie handles the quantification step:

stringtie -e -B -p 8 -G merged_gtf -o SRRXXX.gtf SRRXXX.bam


This gives the following output files:

e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  SRRXXX.gtf  t_data.ctab


The columns of t_data.ctab are used to build the count data for DESeq2. I tried to import t_data.ctab for DESeq2 following the tximport manual, which builds the transcript-to-gene mapping with:

   tx2gene <- tmp[, c("t_name", "gene_name")]


But my t_data.ctab contains '.' in the 'gene_name' column, which makes it unusable for building the count data, so I can't proceed with my differential gene expression analysis. My question: can I use the 'gene_id' column instead of 'gene_name' from t_data.ctab? Or should I switch quantification tools altogether, and if so, which tool would be a better choice than StringTie?

RNA-Seq StringTie New Tuxedo protocol DESeq2 • 1.7k views
3.6 years ago

To extract read count information, you can use the prepDE.py script provided by the StringTie authors.
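
For context, here is a minimal sketch of the arithmetic that, as far as I understand from the StringTie manual, the authors' script relies on: a per-transcript read count is estimated as coverage times transcript length divided by read length. The function name `estimate_read_count` is hypothetical, rounding up is a choice made in this sketch, and the default of 75 is only what I believe the script assumes, so set it to your actual read length.

```python
import math


def estimate_read_count(coverage, transcript_length, read_length=75):
    """Estimate the number of reads for one transcript.

    Hypothetical helper, not part of StringTie. It applies
    coverage * transcript_length / read_length and rounds up
    (the rounding is an assumption of this sketch).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * transcript_length / read_length)


# Example: coverage 10.0 over a 1500 bp transcript with 75 bp reads.
print(estimate_read_count(10.0, 1500, 75))
```

This is only meant to show where the counts come from; for a real analysis, the authors' script (or tximport) is the safer route.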

2.0 years ago
Johan Zicola ▴ 60

The problem may come from the fact that you are using non-human data. I work with Arabidopsis and also noticed the empty gene_name column in the t_data.ctab file from StringTie. I thought it was a bug, but the Ballgown documentation describes the field as "gene_name: HUGO gene name for the transcript, if known", and HUGO annotation is restricted to human gene nomenclature. Hopefully this saves someone else some time.
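
If gene_name is '.' for your organism, one possible workaround is to build the transcript-to-gene table from gene_id instead. Below is a minimal sketch, assuming t_data.ctab is tab-separated with a header row containing t_name, gene_id and gene_name. The function name `build_tx2gene` is hypothetical, and whether gene_id is an acceptable grouping key for your analysis is an assumption you should check against your merged GTF.

```python
import csv


def build_tx2gene(lines, missing=(".", "")):
    """Return (t_name, gene) pairs from the lines of a t_data.ctab file.

    Hypothetical helper: uses gene_name when present and falls back to
    gene_id when gene_name is '.' or empty (assumed placeholder values).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    pairs = []
    for row in reader:
        gene = row["gene_name"]
        if gene in missing:
            gene = row["gene_id"]  # fallback key, an assumption of this sketch
        pairs.append((row["t_name"], gene))
    return pairs


# Usage (file path is illustrative):
# with open("t_data.ctab", newline="") as handle:
#     tx2gene = build_tx2gene(handle)
```

Mixing names and IDs in one column may not suit every case; using gene_id for every row is another option and keeps the identifiers uniform.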