
Why does the count_matrix.csv generated by prepDE.py3 show unidentified IDs?



15 months ago

Pegasus ▴ 100

Hi everyone,

I carried out a transcript assembly on my RNA-seq data using StringTie. The reads were mapped to a bacterial reference genome with bowtie2, and the per-sample StringTie results were combined into count matrices with the prepDE.py3 script (Python 3). This generated two output files, gene_count_matrix.csv and transcript_count_matrix.csv, which contain count data for genes and transcripts, respectively.

However, I ran into the following issues:

All gene IDs in gene_count_matrix start with "MSTRG". I have seen this raised before, but I am still unsure whether these labels will affect the downstream analysis.


All transcript IDs in transcript_count_matrix start with "GOHBADNI". This prefix looks custom-made, but I ran the whole analysis on my own data from my Mac terminal and on Linux, so I am not sure where it came from.

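A quick way to see which ID prefixes occur in each matrix, and how many rows carry each one, is to tally the leading letters of the first column. This is a minimal sketch, assuming the prepDE.py3 layout of one header row followed by one row per gene or transcript with the ID in the first column; `id_prefixes` is a hypothetical helper, not part of StringTie or prepDE.py3.

```python
import csv
import re
from collections import Counter


def id_prefixes(lines):
    """Tally the leading alphabetic prefix of each row ID in a count matrix.

    Hypothetical helper. Assumes a header row, then one row per feature
    with the ID in the first column (the layout prepDE.py3 writes).
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (e.g. "gene_id,sample1,...")
    counts = Counter()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Keep only the leading letters, e.g. "MSTRG.12" -> "MSTRG"
        match = re.match(r"[A-Za-z]+", row[0])
        counts[match.group(0) if match else "<none>"] += 1
    return counts
```

For example, `with open("gene_count_matrix.csv", newline="") as fh: print(id_prefixes(fh))` prints each prefix with its row count; running it on both files shows whether each matrix uses a single prefix throughout or a mixture.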

Finally, the IDs in gene_count_matrix and transcript_count_matrix do not match, and I am unsure whether this could affect the downstream differential expression analysis (Ballgown, DESeq2, edgeR).
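To put a number on the mismatch, the ID sets of the two matrices can be compared directly. This is a minimal sketch; `read_ids` and `compare_ids` are hypothetical helper names, and the code assumes both files keep the ID in the first column under a single header row. Because one file is keyed by gene and the other by transcript, some difference between the two sets is expected by construction; the summary only shows how large it is.

```python
import csv


def read_ids(lines):
    """Return the set of row IDs (first column) from a count-matrix CSV."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def compare_ids(gene_lines, transcript_lines):
    """Summarise how the gene-matrix and transcript-matrix ID sets overlap."""
    genes = read_ids(gene_lines)
    transcripts = read_ids(transcript_lines)
    return {
        "gene_ids": len(genes),
        "transcript_ids": len(transcripts),
        "shared": len(genes & transcripts),
        # A few sorted examples from each side, for eyeballing
        "only_in_gene_matrix": sorted(genes - transcripts)[:5],
        "only_in_transcript_matrix": sorted(transcripts - genes)[:5],
    }
```

Both arguments can be open file handles, e.g. the results of `open("gene_count_matrix.csv", newline="")` and `open("transcript_count_matrix.csv", newline="")`.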

I would greatly appreciate any help in resolving these issues.

