Why do the count_matrix.csv files generated by "prepDE.py3" show unidentified IDs?
7 weeks ago
Pegasus ▴ 80

Hi everyone,

I carried out a transcript assembly on my RNA-seq data using StringTie. The reads were mapped to a bacterial reference genome using bowtie2, and the count matrices were generated with the "prepDE.py3" script in Python 3. This produced two output files: the gene_count_matrix and the transcript_count_matrix, which contain count data for genes and transcripts, respectively.

However, I encountered two issues:

  1. All gene IDs in the gene_count_matrix are labeled as "MSTRG". I am unsure if this labeling will affect the downstream analysis, as this issue has been raised before.

  2. All transcript IDs in the transcript_count_matrix are labeled as "GOHBADNI". This ID name appears to be customized. However, I analyzed my own data using my Mac terminal and Linux, so I'm not sure where this ID name came from.

The IDs in the gene_count_matrix and transcript_count_matrix files do not match, and I am unsure whether this could affect the downstream differential gene expression analysis (Ballgown, DESeq2, edgeR).
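
To make the mismatch concrete, here is a minimal sketch of how the ID prefixes in the first column of each matrix could be tallied. The helper name `id_prefix_counts`, the sample rows, and the ID format after the prefix are made up for illustration; the rule that a prefix ends at the first '.', '_' or '|' is also an assumption, not something prepDE.py3 guarantees.

```python
import csv
import io
import re
from collections import Counter


def id_prefix_counts(handle):
    """Count ID prefixes in the first column of a count-matrix CSV.

    The prefix is taken as everything before the first '.', '_' or '|'.
    This splitting rule is an assumption about how the IDs are structured.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        prefix = re.split(r"[._|]", row[0], maxsplit=1)[0]
        counts[prefix] += 1
    return counts


# Made-up rows standing in for the two matrices (not real data).
gene_csv = "gene_id,sample1,sample2\nMSTRG.1,10,12\nMSTRG.2,0,3\n"
tx_csv = (
    "transcript_id,sample1,sample2\n"
    "GOHBADNI_00001,10,12\n"
    "GOHBADNI_00002,0,3\n"
)

gene_prefixes = id_prefix_counts(io.StringIO(gene_csv))
tx_prefixes = id_prefix_counts(io.StringIO(tx_csv))
print(gene_prefixes)
print(tx_prefixes)
```

On the real files, the same helper could be called on an opened file, for example `with open("gene_count_matrix.csv", newline="") as fh: id_prefix_counts(fh)`, to confirm whether every row carries the same prefix.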

I would greatly appreciate any help in resolving these issues.

Stringtie RNA-seq • 147 views
