Question: RNA-Seq analysis across different species
4.8 years ago by
Belgium, Brussels
Nicolas Rosewick8.6k wrote:


I have to analyze RNA-Seq data from multiple species (human, bovine, ...) and different cell types, to detect differentially expressed genes (DEGs) and to select the DEGs that are common between species. I have two ideas in mind:

1. Align each species' reads to its own genome, and count the number of reads per gene using that species' annotation (e.g. Ensembl). Then use DESeq2 to assess differential expression.

2. Or align each sample against a common transcriptome (typically the human transcriptome, so that post-hoc analyses such as GO enrichment analysis can be used).

What do you think? Any advice?


rna-seq species • 4.1k views
modified 8 months ago by Biostar ♦♦ 20 • written 4.8 years ago by Nicolas Rosewick8.6k

Not sure your second strategy would work. I recently did some simulations to see how many RNA-seq reads from fly would map to the mouse mm10 reference. Answer: less than 1% of fly reads will map to mouse.

written 4.8 years ago by Ryan Dale4.9k

I'd align each species to its own genome and map genes between species through orthology. Without establishing orthology you can't be sure you're comparing the same genes, and if you're not comparing the same genes you're not calculating relevant DE values. You could be comparing two genes with different functions.
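A minimal sketch of the orthology step, assuming you already have a table of ortholog pairs (for example exported from an orthology resource such as Ensembl Compara). The gene IDs, the `one_to_one` and `rename_to_reference` helpers, and the fold-change values are all made up for illustration. The idea is to keep only one-to-one orthologs and re-key the bovine results to human IDs so the two species can be compared gene by gene.

```python
from collections import Counter

# Hypothetical ortholog pairs (human gene ID, bovine gene ID).
ortholog_pairs = [
    ("HUMAN_G1", "COW_G1"),
    ("HUMAN_G2", "COW_G2"),
    ("HUMAN_G3", "COW_G3a"),  # one-to-many: HUMAN_G3 has two bovine orthologs
    ("HUMAN_G3", "COW_G3b"),
]


def one_to_one(pairs):
    """Return {bovine_id: human_id}, keeping only unambiguous one-to-one pairs."""
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return {b: a for a, b in pairs if left[a] == 1 and right[b] == 1}


def rename_to_reference(values_by_gene, mapping):
    """Re-key a per-gene table to reference-species IDs, dropping unmapped genes."""
    return {mapping[g]: v for g, v in values_by_gene.items() if g in mapping}


cow_to_human = one_to_one(ortholog_pairs)

# Hypothetical bovine log2 fold changes from a within-species DE analysis.
cow_lfc = {"COW_G1": 1.8, "COW_G2": -0.4, "COW_G3a": 2.1, "COW_G9": 0.7}
cow_lfc_as_human = rename_to_reference(cow_lfc, cow_to_human)
print(cow_lfc_as_human)
```

Dropping one-to-many and many-to-many orthologs is a conservative choice made here for simplicity; it loses genes but avoids comparing a gene against the wrong paralog.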

written 4.8 years ago by pld4.8k

Hi NicoBxl,

Did you find a solution for this? I'm also going to do a similar type of analysis.

written 4.4 years ago by ifudontmind_plzz150
4.8 years ago by
Manvendra Singh2.1k
Berlin, Germany
Manvendra Singh2.1k wrote:

This is what I would do for RNA-seq:

1. Map the reads to both genomes (human and bovine).

2. Keep for further analysis only those reads that mapped to both genomes (this removes the bias from insertions and deletions between the genomes and, moreover, gives you orthologous regions from your reads).

3. Now count the reads over gene features (featureCounts) and remove the genes that have low counts in all samples (you will lose a lot of them).

4. Assign the mean of the counts over the different transcripts to their respective gene, and transform it to a log scale.

5. Now you have genes as row names and samples as column names; merge the data from both species into one data frame.

6. Normalize them by their quantiles or by surrogate variables.

7. Calculate the relative expression of each gene across the samples (for each sample, express each gene's value relative to its row mean).

8. Calculate Spearman's correlation between the samples and see which of them form clusters.

9. If they cluster as expected, go on to the differentially expressed genes.
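Steps 6 to 8 can be sketched in plain Python as below. This is only an illustration under stated assumptions: the sample names and log-scale values are invented, step 7 is interpreted as subtracting each gene's mean across samples, and the quantile normalization breaks ties by position rather than averaging them.

```python
from statistics import mean


def quantile_normalize(columns):
    """Step 6: give every sample (one list per sample) the same value distribution."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    ref = [mean(col[i] for col in sorted_cols) for i in range(n_genes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, idx in enumerate(order):
            out[idx] = ref[rank]
        normalized.append(out)
    return normalized


def center_genes(columns):
    """Step 7 (as interpreted here): subtract each gene's mean across samples."""
    n_genes = len(columns[0])
    gene_means = [mean(col[i] for col in columns) for i in range(n_genes)]
    return [[col[i] - gene_means[i] for i in range(n_genes)] for col in columns]


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Step 8: Spearman correlation, i.e. Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical log-scale expression: same four orthologous genes in each sample.
names = ["human_A", "human_B", "cow_A"]
samples = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]

normalized = quantile_normalize(samples)
relative = center_genes(normalized)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], names[j], round(spearman(relative[i], relative[j]), 3))
```

The resulting pairwise correlations would then go into whatever clustering you prefer; with real data you would expect samples to group by cell type rather than by species if the normalization worked.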



written 4.8 years ago by Manvendra Singh2.1k
4.8 years ago by
cyril-cros890 wrote:

I am not an expert on differential expression, but DESeq2 assumes that the expression of the majority of genes stays comparable between samples. This won't always be true in your case...
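To see where that assumption enters, here is a simplified sketch of median-of-ratios size factors, the kind of normalization DESeq2 uses by default. It is not DESeq2's own code; the `size_factors` function and the counts are invented for illustration. Each sample's factor is the median, over genes, of its count divided by the gene's geometric mean across samples, so it is only meaningful if most genes are not changing.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {sample: [raw counts]} with genes in the same order in every sample."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        row = [counts[s][g] for s in samples]
        if min(row) == 0:
            continue  # a zero count makes the geometric mean zero; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


# Hypothetical counts: sample B was sequenced twice as deeply as A,
# and the last gene is genuinely up-regulated in B.
counts = {
    "A": [100, 200, 300, 400, 50],
    "B": [200, 400, 600, 800, 5000],
}
sf = size_factors(counts)
print(sf)
```

Here the single changed gene does not disturb the median, so the ratio of the two factors reflects the depth difference. If more than half of the genes shifted in one direction, as can happen between distant species or very different cell types, the median would absorb that shift as if it were a technical effect.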

For the second proposition, well, I never considered you could align transcripts to the genome of a different species. I don't know if it is possible at all, and would appreciate an answer.

For comparing different cell types, you can try a relative quantification approach: pick a few genes with similar expression across all your species and cell types and use them to normalize the others. I am thinking of RT-qPCR here, which is quite precise but only works on one or two dozen target genes at most per run.
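A rough sketch of that relative quantification idea, assuming linear-scale expression values and a set of reference genes you trust to be stable. The gene names, the values, and the use of a geometric mean of the reference genes are assumptions made for this example, not something prescribed above.

```python
import math


def normalize_to_reference(expr, reference_genes):
    """Divide every gene by the geometric mean of the chosen reference genes.

    expr: {gene: expression on a linear scale, > 0 for the reference genes}
    """
    logs = [math.log(expr[g]) for g in reference_genes]
    ref_level = math.exp(sum(logs) / len(logs))
    return {g: v / ref_level for g, v in expr.items()}


# Hypothetical sample: two reference genes and one target gene.
sample = {"REF1": 100.0, "REF2": 400.0, "TARGET": 50.0}
relative = normalize_to_reference(sample, ["REF1", "REF2"])
print(relative["TARGET"])
```

Applied to every sample, this puts the target genes on a scale relative to the same reference set, which is the part that has to hold across species for the comparison to make sense.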

You may also consider restricting yourself to a few GO terms before analyzing the expression levels, if you have some expectations.


modified 4.8 years ago • written 4.8 years ago by cyril-cros890
4.8 years ago by
United States
Adamc640 wrote:

If possible, do a comparison of the cell types within each species, and then do a meta-analysis of those DE lists across species. That way you're not directly comparing samples from different species.
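One simple way to combine the per-species DE lists is sketched below, assuming each species was analyzed separately and the results were already re-keyed to shared ortholog IDs. The gene IDs, fold changes, adjusted p-values, and the 0.05 cutoff are placeholders. A gene is kept only if it is significant in every species and changes in the same direction.

```python
def significant(results, alpha=0.05):
    """results: {gene: (log2 fold change, adjusted p-value)} -> {gene: +1 or -1}."""
    return {
        gene: (1 if lfc > 0 else -1)
        for gene, (lfc, padj) in results.items()
        if padj < alpha and lfc != 0
    }


def common_degs(per_species, alpha=0.05):
    """Genes significant in all species with a concordant direction of change."""
    sig = [significant(res, alpha) for res in per_species.values()]
    shared = set.intersection(*(set(s) for s in sig))
    return {g: sig[0][g] for g in shared if len({s[g] for s in sig}) == 1}


# Hypothetical within-species results for the same cell-type contrast.
per_species = {
    "human": {"G1": (2.0, 0.001), "G2": (-1.5, 0.01), "G3": (1.0, 0.2), "G4": (1.2, 0.03)},
    "bovine": {"G1": (1.4, 0.004), "G2": (1.1, 0.02), "G3": (0.9, 0.01), "G4": (0.8, 0.04)},
}
print(common_degs(per_species))
```

In this toy input, G2 is dropped because its direction differs between species and G3 because it is not significant in human. More formal meta-analysis methods exist; this intersection is just the most direct version of the idea.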

Alternatively, you can try to get a sense of which genes are expressed at all by using something like the "UPC" functionality of SCAN.UPC (a Bioconductor package). It operates on each sample individually and provides a value on a 0-1 scale indicating the confidence that a given transcript is expressed; you can threshold those values to get something like a present/absent call. I've also used the values to get a sense of which genes with known homology in human and mouse are expressed in certain cell types/tissues.
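The thresholding step could look like the sketch below. It assumes you have already exported the 0-1 values per gene from the R package and keyed them by a shared ortholog ID; the values, the gene names, and the 0.5 cutoff are arbitrary choices for illustration.

```python
def present_calls(upc_values, threshold=0.5):
    """Return the set of genes whose 0-1 expression confidence meets the threshold."""
    return {gene for gene, value in upc_values.items() if value >= threshold}


# Hypothetical per-sample values, keyed by shared ortholog IDs.
human_sample = {"G1": 0.95, "G2": 0.10, "G3": 0.70}
mouse_sample = {"G1": 0.88, "G2": 0.60, "G3": 0.65}

expressed_in_both = present_calls(human_sample) & present_calls(mouse_sample)
print(sorted(expressed_in_both))
```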

written 4.8 years ago by Adamc640

