Question

Comparative transcriptomics between species

5 months ago

Calidris • 0

Hey all,

I am new to transcriptomics, so I have some questions. I am currently working on a study whose goal is to compare transcriptomes across 5 species. I mapped all RNA-seq reads to reference genomes (a different one for each species) using HISAT2, then ran StringTie, and got read counts using Salmon. I am having a hard time understanding what the best approach is for finding differentially expressed genes between species when there are 5 different reference genomes.

What I've done so far is create FASTA files from the BAM and GTF files, run TransDecoder, and then use the filtered longest ORFs to run OrthoFinder together with a model organism. I've now created a matrix of all the single-copy orthologues and the corresponding read counts for every individual. So my question is whether this is actually an appropriate way to analyze the data and, if so, how I could go about normalizing read counts so that the samples are comparable.

Thank you!

Transcriptomics rna-seq • 409 views

updated 5 months ago by cfos4698 ★ 1.1k • written 5 months ago by Calidris • 0

score 0 · Answer 1 · 2023-11-08

I faced a similar problem in a previous study of mine. One difference was that we looked for DE genes between conditions within each species, and then checked whether the same genes were DE between conditions in multiple species. From the paper:

We aimed to compare the expression of the same genes in all species, but comparing species separated by such a great evolutionary distance raises issues of orthology. For example, the presence of unique genes in any lineage, or duplication of genes in any lineage(s), complicates such a comparison. Therefore, we searched all transcriptomes for groups of orthologous genes (“orthogroups”) using OrthoFinder v2.3.12 (Emms and Kelly 2015). For the purpose of orthology searching, we selected the transcript with the longest predicted peptide sequence for each gene. In total, we identified 48,684 orthogroups, including 5,591 orthologues that were single-copy in all eight species. This number of single-copy orthologues is fewer than would be expected from a genomic dataset because transcriptomic datasets only contain genes that are expressed in the target tissue at the time of tissue sampling.
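As a rough illustration of the single-copy filtering step, here is a minimal Python sketch. It assumes a tab-separated per-orthogroup gene count table with one column per species plus `Orthogroup` and `Total` columns (the shape I would expect from OrthoFinder's gene count output, but check your own files). The species names, orthogroup IDs and counts are made up, and this is not the code used in the paper.

```python
import csv
import io

# Toy stand-in for a per-orthogroup gene count table (hypothetical IDs and counts).
GENE_COUNTS_TSV = """Orthogroup\tsp_A\tsp_B\tsp_C\tTotal
OG0000001\t1\t1\t1\t3
OG0000002\t2\t1\t1\t4
OG0000003\t1\t0\t1\t2
OG0000004\t1\t1\t1\t3
"""


def single_copy_orthogroups(handle, skip_columns=("Orthogroup", "Total")):
    """Return IDs of orthogroups with exactly one gene in every species column."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Every column that is not the ID or the row total is treated as a species.
    species = [c for c in reader.fieldnames if c not in skip_columns]
    keep = []
    for row in reader:
        if all(int(row[s]) == 1 for s in species):
            keep.append(row["Orthogroup"])
    return keep


single_copy = single_copy_orthogroups(io.StringIO(GENE_COUNTS_TSV))
print(single_copy)
```

In the toy table, OG0000002 is dropped because of a duplication in one species and OG0000003 because the gene is missing from another, which are exactly the two cases that complicate the comparison.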

Using the output of the OrthoFinder analysis, we mapped differentially expressed genes to their corresponding orthogroups. We then classed an orthogroup as differentially expressed within a species if any of its constituent genes were differentially expressed. This approach does not assume that all paralogues of a gene within a species will be differentially expressed, reflecting knowledge that expression profiles vary between gene copies (Kegel and Ryan 2019). We visualized patterns in our expression data using principal components analysis of variance-stabilization normalized expression data for all orthogroups for which all species have at least one gene represented.
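The "any constituent gene is DE" rule can be sketched in a few lines of Python. Everything here is hypothetical (the gene IDs, the orthogroup names, and the `gene_to_og` / `de_genes` structures), and it only illustrates the logic rather than reproducing the paper's pipeline.

```python
# Hypothetical inputs: per-species gene -> orthogroup maps and per-species DE gene sets.
gene_to_og = {
    "sp_A": {"A_g1": "OG1", "A_g2": "OG1", "A_g3": "OG2"},
    "sp_B": {"B_g1": "OG1", "B_g2": "OG2", "B_g3": "OG3"},
}
de_genes = {
    "sp_A": {"A_g2"},
    "sp_B": {"B_g1", "B_g3"},
}


def de_orthogroups(gene_to_og, de_genes):
    """Call an orthogroup DE in a species if any of its genes is DE there."""
    # Start every species with an empty set so species without DE genes still count.
    result = {species: set() for species in gene_to_og}
    for species, mapping in gene_to_og.items():
        for gene in de_genes.get(species, ()):
            og = mapping.get(gene)
            if og is not None:  # genes not assigned to any orthogroup are skipped
                result[species].add(og)
    return result


de_ogs = de_orthogroups(gene_to_og, de_genes)
# Orthogroups that are DE in every species considered.
shared = set.intersection(*de_ogs.values())
print(de_ogs, shared)
```

Note that OG1 is called DE in sp_A even though only one of its two paralogues (A_g2) is DE, which is the behaviour described above.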

https://doi.org/10.1093/molbev/msac077

There is no one recommended way to address these cross-species DE analyses AFAIK, so it's a brave new world. I suppose you could feed your matrix of counts for single-copy orthologous genes into (e.g.) DESeq2 along with transcript lengths and normalise as per the standard workflow. Supplying lengths matters here because the same orthologue can have a different transcript length in each species, which affects raw counts independently of expression. You could also try working with summed read counts within orthogroups (like we tried), but there are obviously many assumptions involved.
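To make the two ideas concrete, here is a small Python sketch of (a) summing gene counts within orthogroups and (b) a plain TPM-style length and depth scaling computed over the shared orthologue set. This is not DESeq2's normalisation (DESeq2 is an R package with its own size-factor method). It only shows why species-specific lengths change the picture. All IDs, counts and lengths are invented, and which length to use for a summed orthogroup is an open choice that this sketch does not settle.

```python
def sum_counts_by_orthogroup(gene_counts, gene_to_og):
    """Sum read counts of all genes assigned to the same orthogroup (one sample)."""
    totals = {}
    for gene, count in gene_counts.items():
        og = gene_to_og.get(gene)
        if og is None:  # gene not placed in any orthogroup
            continue
        totals[og] = totals.get(og, 0) + count
    return totals


def tpm(counts, lengths_bp):
    """TPM-style scaling: reads per kilobase, rescaled to sum to one million."""
    rates = {k: counts[k] / (lengths_bp[k] / 1000) for k in counts}
    total = sum(rates.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: r / total * 1e6 for k, r in rates.items()}


# (a) Hypothetical sample: two paralogues collapse into OG1.
og_counts = sum_counts_by_orthogroup(
    {"g1": 10, "g2": 5, "g3": 7},
    {"g1": "OG1", "g2": "OG1", "g3": "OG2"},
)

# (b) Hypothetical single-copy orthologues with species-specific lengths.
# OG2 has three times the reads but is also three times longer in this species.
sample_tpm = tpm({"OG1": 100, "OG2": 300}, {"OG1": 1000, "OG2": 3000})
print(og_counts, sample_tpm)
```

In (b) the two orthologues end up with equal scaled values despite a threefold difference in raw counts, which is the kind of length effect you would want accounted for before comparing species.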