Question

Compare transcriptome between different species of Leishmania

3.6 years ago

Lissa Cruz Saavedra • 0

I would like to compare transcriptome data between different species of Leishmania that were exposed to the same treatment. I have a reference genome for each of them, but I do not know which pipeline would be correct to follow, especially because the gene names differ between the reference genomes. What would be the correct way to analyze this kind of data?

Thanks so much for your answers!

RNA-Seq gene • 933 views

ADD COMMENT • link updated 3.6 years ago by Dunois ★ 2.5k • written 3.6 years ago by Lissa Cruz Saavedra • 0

Did you also sequence the genomes of these parasites, or just the transcriptomes?

ADD REPLY • link 3.6 years ago by hugo.avila ▴ 490

score 1 · Answer 1 · 2020-09-18

Hmm, I am not an expert in this at all, but I would probably try something like this:

1. Predict genes from the genomes (use something like GeneMark).
2. Pool all predicted genes from all your genomes together into a single fasta file, and cluster them at something like 90% coverage and 90% sequence identity (use MMseqs2 or CD-HIT; basically just find orthologs somehow). Inspect a bunch of the resulting clusters, and keep all clusters that have at least one predicted gene sequence from each of your genomes. Then rename the members of the kept clusters in a systematic manner (so that you can identify your orthologs) and divvy the clusters back up into one species-genes fasta file per species. Now you have a set of fasta files, one per species, each containing gene sequences that have homologous counterparts in the other fasta files. (Note: I suppose you could just blast the genomes against one another, but the clustering method is probably faster.)
3. Map your RNA-seq reads from each species to the sequences in the respective species-genes fasta file using an aligner of your liking. This will yield expression profiles for each of these species' genes. Then you should be able to compare the expression patterns among the various species directly.
The caveat here is that you will not be able to quantify and compare expression patterns for genes that lack orthologs in one or more species. (There are probably ways to work around this, but that's the gist of it.)
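To make the cluster-filtering and renaming step a bit more concrete, here is a rough Python sketch. It assumes the clustering result is a two-column, tab-separated table of representative and member IDs (the kind of table MMseqs2 writes for its clusters), and that every sequence ID was prefixed with a species tag such as `Lbra|` before pooling. The tags, the `OG...` naming scheme, and all function names are made up for illustration; treat it as a starting point, not a tested pipeline.

```python
from collections import defaultdict


def read_clusters(lines):
    """Group member IDs by cluster representative (two tab-separated columns)."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")
        clusters[rep].append(member)
    return clusters


def species_of(seq_id):
    """Assumed convention: IDs look like '<species tag>|<gene id>'."""
    return seq_id.split("|", 1)[0]


def shared_clusters(clusters, species):
    """Keep clusters with at least one member from every species."""
    wanted = set(species)
    kept = {}
    for rep, members in clusters.items():
        by_species = defaultdict(list)
        for member in members:
            by_species[species_of(member)].append(member)
        if wanted <= set(by_species):
            kept[rep] = dict(by_species)
    return kept


def rename_members(kept):
    """Map each original ID to a systematic name (hypothetical scheme)."""
    new_names = {}
    for idx, rep in enumerate(sorted(kept), start=1):
        for sp, members in sorted(kept[rep].items()):
            for k, member in enumerate(sorted(members), start=1):
                new_names[member] = f"OG{idx:05d}_{sp}_{k}"
    return new_names


# Toy input standing in for a real cluster table (made-up IDs).
example = [
    "Lbra|g1\tLbra|g1",
    "Lbra|g1\tLmaj|g7",
    "Lbra|g1\tLinf|g3",
    "Lmaj|g9\tLmaj|g9",
    "Lmaj|g9\tLinf|g4",
]

kept = shared_clusters(read_clusters(example), ["Lbra", "Lmaj", "Linf"])
for old_id, new_id in sorted(rename_members(kept).items()):
    print(old_id, "->", new_id)
```

The second toy cluster is dropped because it has no `Lbra` member, which is exactly the caveat above: genes without a counterpart in every species fall out of the comparison. Clusters with several members from one species (likely paralogs) are kept here and numbered `_1`, `_2`, and so on; whether to keep or discard those is a judgment call.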

I'm sure there are other (better) ways of doing this. I hope folks with more expertise than I have chime in with their suggestions.