Question

Differential gene expression analysis between species using methods involving read alignment to transcriptome

0

5 hours ago

arsala521 ▴ 60

Hi, my question is about RNA-seq data analysis, specifically differential gene expression analysis between species.

I have RNA-seq reads for a tissue type from human and chimpanzee and I need differentially expressed genes (DEGs). These are the three sequential steps I am following:

Using kallisto: to pseudoalign reads to the transcriptome and get transcript counts

Using tximport: to convert transcript counts to gene counts

Using DESeq2: to get DEGs based on gene counts

I am using the latest release of the transcriptome file (cdna.fa) for both human and chimpanzee, but the human cdna.fa file contains ~200,000 transcripts while the chimpanzee cdna.fa file contains ~50,000. I think this is because the human genome annotation is more complete. My question is whether this difference will lead to higher gene counts for human and thus affect the determination of DEGs. I ask because tximport (like other transcript-to-gene count converters) sums the counts of all of a gene’s transcripts to get the gene count.
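To make the summing step concrete, here is a minimal Python sketch of transcript-to-gene aggregation. It only illustrates the idea and is not tximport's actual code; the function `sum_to_gene`, the transcript and gene IDs, and the counts are all made up for the example.

```python
from collections import defaultdict

def sum_to_gene(tx_counts, tx2gene):
    """Sum estimated transcript counts into one count per gene.

    tx_counts: hypothetical dict mapping transcript ID -> estimated count
    tx2gene:   hypothetical dict mapping transcript ID -> gene ID
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)

# Made-up example: geneA has two annotated transcripts, geneB has one.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
tx_counts = {"tx1": 12.5, "tx2": 7.5, "tx3": 40.0}

gene_counts = sum_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

The gene count is just the total of whatever was assigned to its transcripts, so what matters is how reads are split among transcripts upstream, not how many transcripts a gene has.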

I think the difference in the number of transcripts between human and chimpanzee won’t lead to higher gene counts for human and won’t affect which genes are called as DEGs. As far as I understand how kallisto works, when a read pseudoaligns to more than one transcript, kallisto distributes that read’s count among those transcripts rather than giving a full count to each of them. But I want to double-check.
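The reasoning above can be checked with a toy example. The sketch below is a simplified expectation-maximization loop in Python, assuming all transcripts have equal length (kallisto's real model also accounts for effective transcript lengths, which is ignored here). The function `em_counts` and the read compatibility sets are hypothetical. Each read contributes a total of exactly one count, split among the transcripts it is compatible with.

```python
def em_counts(compat, n_transcripts, n_iter=50):
    """Toy EM: split each read fractionally among its compatible transcripts.

    compat: list of tuples, one per read, holding the indices of the
            transcripts that read is compatible with.
    Simplifying assumption: equal transcript lengths.
    """
    abundance = [1.0 / n_transcripts] * n_transcripts
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        for tx_ids in compat:
            total = sum(abundance[t] for t in tx_ids)
            for t in tx_ids:
                # The fractions for one read always add up to 1.
                counts[t] += abundance[t] / total
        n_reads = sum(counts)
        abundance = [c / n_reads for c in counts]
    return counts

# The same 100 reads from one gene, under two made-up annotations.
# Richer annotation: the gene has 4 transcripts.
rich = [(0, 1, 2, 3)] * 60 + [(0, 1)] * 30 + [(2,)] * 10
# Sparser annotation: the gene has a single transcript.
sparse = [(0,)] * 100

rich_gene_count = sum(em_counts(rich, 4))
sparse_gene_count = sum(em_counts(sparse, 1))
print(rich_gene_count, sparse_gene_count)
```

In this toy setting both annotations give the gene about 100 counts, because the extra transcripts only change how the reads are divided, not how many reads there are.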

RNA-seq kallisto tximp DEGs • 52 views

updated 54 minutes ago by dsull ★ 7.7k • written 5 hours ago by arsala521 ▴ 60

score 0 · Answer 1 · 2025-10-30

No, you won’t get higher gene counts just because one annotation contains more transcripts than the other (and even if you did, it wouldn’t matter, because differential expression analysis usually involves normalizing the data, e.g. by sequencing depth, anyway). And yes, each read’s count is distributed among the transcripts it is compatible with.
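As a rough illustration of the normalization point, here is a small Python sketch using counts per million as the depth scaling. This is a simpler stand-in for the median-of-ratios size factors that DESeq2 estimates, and the function `counts_per_million`, the gene names, and the counts are made up. It shows that a uniform inflation of every gene count in one sample disappears after scaling by the sample total.

```python
def counts_per_million(gene_counts):
    """Scale gene counts by the sample's total count (simple depth normalization)."""
    total = sum(gene_counts.values())
    return {gene: count * 1e6 / total for gene, count in gene_counts.items()}

# Hypothetical gene counts for one chimpanzee sample.
chimp = {"GENE_A": 500.0, "GENE_B": 1500.0, "GENE_C": 3000.0}
# Hypothetical human sample in which every gene count is inflated 2x.
human = {gene: 2.0 * count for gene, count in chimp.items()}

chimp_cpm = counts_per_million(chimp)
human_cpm = counts_per_million(human)

for gene in chimp:
    print(gene, chimp_cpm[gene], human_cpm[gene])
```

After scaling, the two samples have the same value for every gene, so a global shift of this kind would not by itself produce DEGs.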