Question: differential expression analysis of RNASeq data in different species
5.4 years ago
United States
I am fairly new to this sort of analysis and have a question. I am picking up a project that compares expression in two fairly closely related nematode genera. We have RNA-Seq (Illumina) data from the two taxa. When the project began, only one of the worms had a published genome, so TopHat/Cufflinks was run against that genome to analyze the RNA-Seq data for both genera.

It seems, at least to my novice eyes, that using the reference genome of one taxon to analyze both could be problematic. Is TopHat able to correctly map reads from one taxon to the genome of another?

There is now a genome for the second taxon, and I can use that to map its reads with TopHat. But how would I then compare the output for the two taxa to find genes that are differentially expressed? Is it possible to build a reference that combines the genomes of both taxa and somehow pulls out the orthologs found in both?

5.4 years ago
Istvan Albert
University Park, USA
In my opinion the ideal course of action would be to characterize each transcriptome independently and establish groups of genes that appear to be expressed at different levels within each genome.

Then compare these groups of genes with one another, either by the similarity of their sequences or by their known functions.
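As a rough illustration of that comparison step, here is a minimal Python sketch. It assumes you already have, for each species, a set of genes called differentially expressed by an independent analysis, plus a table of one-to-one ortholog pairs obtained from sequence similarity (for example, reciprocal best hits). The gene IDs, the function name `compare_de_via_orthologs`, and the data are all hypothetical and only show the bookkeeping, not a recommended pipeline.

```python
def compare_de_via_orthologs(de_a, de_b, ortholog_pairs):
    """Classify ortholog pairs by the species in which they are called DE.

    de_a, de_b      -- sets of gene IDs called differentially expressed
                       in species A and species B (analyzed independently)
    ortholog_pairs  -- iterable of (gene_in_A, gene_in_B) one-to-one orthologs
    """
    result = {"both": [], "only_a": [], "only_b": []}
    for gene_a, gene_b in ortholog_pairs:
        in_a = gene_a in de_a
        in_b = gene_b in de_b
        if in_a and in_b:
            result["both"].append((gene_a, gene_b))
        elif in_a:
            result["only_a"].append((gene_a, gene_b))
        elif in_b:
            result["only_b"].append((gene_a, gene_b))
        # pairs that are DE in neither species are ignored
    return result


# Hypothetical example data (made-up gene IDs)
de_a = {"A_g1", "A_g2", "A_g5"}
de_b = {"B_g1", "B_g3"}
ortholog_pairs = [
    ("A_g1", "B_g1"),
    ("A_g2", "B_g2"),
    ("A_g3", "B_g3"),
    ("A_g4", "B_g4"),
]

summary = compare_de_via_orthologs(de_a, de_b, ortholog_pairs)
for category, pairs in summary.items():
    print(category, pairs)
```

Genes without an ortholog in the table (such as `A_g5` here) drop out of the comparison, so it is worth tracking how many genes are lost that way.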

The other options are far less attractive, because aligning data from one species to the genome of the other may produce more severe artifacts.

That being said, most of the time success in publishing depends on the strength of the results and your ability to come up with an interesting finding, so if one method does not seem to work, explore the other.

Thank you!  Am I correct in thinking there isn't a gold standard method for this sort of analysis?  I was looking for papers, but didn't see much out there...

Frankly, there is no gold standard even for straight-up RNA-Seq analysis. There are only methods that are acceptable (today) yet do not guarantee finding the ground truth.

Moreover, there can't really be a standard analysis for your case, as the dissimilarity between species can be extremely variable and unexpected. I would try both methods, see what works best, and then make a case for it.


Thanks for your input!

5.4 years ago
Manvendra Singh
Berlin, Germany
For your first question, I think you should follow Istvan Albert's advice.

For cross-taxa mapping, you can:

1. Map the reads from each taxon to both genomes (the two nematode taxa).

2. For further analysis, keep only the reads that mapped to both genomes, and use them to estimate gene expression in their respective species.

3. Consider restricting this to uniquely mapped reads, or to one alignment per read, among the reads that mapped to both genomes.

This way, the bias caused by sequence insertions and deletions between the genomes is removed, and moreover you get orthologous regions from your cross-taxa mapped reads.
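The read filtering described above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the alignments are available as SAM text lines, the aligner writes the optional `NH:i:` tag (number of reported alignments) so that `NH:i:1` marks a uniquely mapped read, and the tiny SAM records and the function name `uniquely_mapped_reads` are made up for the example.

```python
def uniquely_mapped_reads(sam_lines):
    """Return the names of reads that are mapped and carry the tag NH:i:1."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:                  # FLAG bit 0x4: read is unmapped
            continue
        if "NH:i:1" in fields[11:]:     # exactly one reported alignment
            names.add(fields[0])
    return names


# Hypothetical alignments of the same reads against genome A and genome B
sam_a = [
    "@HD\tVN:1.0",
    "read1\t0\tchrI\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "read2\t0\tchrI\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read4\t16\tchrII\t300\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]
sam_b = [
    "@HD\tVN:1.0",
    "read1\t0\tscaf7\t900\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "read2\t0\tscaf2\t150\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read5\t0\tscaf1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]

# Reads uniquely mapped to BOTH genomes are kept for expression estimation
shared = uniquely_mapped_reads(sam_a) & uniquely_mapped_reads(sam_b)
print(sorted(shared))
```

For real data you would stream the alignments from BAM/SAM files rather than in-memory lists, and then count only the retained reads per gene in each species.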

