The best way to analyze RNA-seq of multiple tree species in the same genus (with and without ref-genome)
3.9 years ago
User 4014 ▴ 40

Dear Biostars,

I am working on a project to compare gene expression between four tree species A, B, C and D. They belong to the same genus, but the phylogeny suggests B and C are from the same clade. A and D are from two different clades. For species A there is a ref genome (ca. 90,000 scaffolds), but de-novo transcriptome assembly is needed for B, C and D. For all species, there are 2 treatments (control, treatment) with 3 biological replicates for each treatment.

I used Trinity to assemble B, C and D individually, and mapping rates with Salmon are good (>95%) for all species. I am wondering whether I should also try genome-guided assembly using the genome of species A. Given that they are from different clades, do you think that would be a problem?

I would also like to assess orthology between transcripts from the four species. One way I can think of is to use edgeR (exact test) to call differentially expressed genes (DEGs) for each species individually and then use OrthoFinder to find ortholog groups for an all-species DEG comparison. The other is to cluster all transcripts/genes (from all species) and bring them into a single DEG analysis, although I do not know how difficult clustering the 600,000+ transcripts generated by Trinity will be or how complicated the analysis will be in R. At this point, I am leaning toward the first option, but I am inexperienced in this kind of analysis. Could you give me some directions on how to implement this appropriately, please?

Thanks and looking forward to hearing your suggestions.

RNA-Seq rna-seq R • 1.2k views
3.9 years ago

There is certainly no harm in trying genome-guided assembly (perhaps to complement the de novo assembly). It might give you additional genes missed in the de novo assembly, and it could also remove potential redundancy compared to the de novo-only approach.

OrthoFinder is a good choice when looking for orthologs. I would do that independently of expression, though: run the OrthoFinder analysis and later on you can "combine" that result with the expression result. For 600K genes you might indeed need some compute power, but nothing super spec-ed (a modest laptop might not cut it, for example).

Considering all 600K genes as a single proteome/genome might not be the best of ideas, as this could seriously mess up your statistics.


Thank you very much for your comment. :) Just to be sure, I should (1) do DGE analysis for each species individually, (2) combine the DGEs from each species and run OrthoFinder, and (3) match those DGEs with the expression results from (1). Am I correct?

Also, do you have any other suggestions on what I should be aware of when comparing transcriptomes from multiple species?


Not quite, I think.

1) Yes, run the DGE analysis in each species separately.
2) Run OrthoFinder using all genes from all species (this gives you groups of genes that are homologous/orthologous/paralogous).
3) Use the OrthoFinder result to link genes from species A to B to C, and so on.
4) Using the information from step 3, assign the DGE results to the different clusters.

The critical step here is to correctly assign genes as orthologs of each other. As gene IDs from the different species will differ, you will need to link them based on sequence content (i.e. the OrthoFinder step).
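To make steps 3 and 4 a bit more concrete, here is a minimal Python sketch of the linking step. It assumes an orthogroup table laid out like OrthoFinder's Orthogroups.tsv (tab-separated, one column per species, genes within a cell separated by commas) and per-species DE tables that have already been reduced to gene ID and log2 fold change. The species names, gene IDs, values and function names below are all made up for illustration, and the gene IDs in your DE tables would need to match the IDs in the sequences you gave to OrthoFinder.

```python
import csv
import io

# Toy orthogroup table in an Orthogroups.tsv-like layout (hypothetical IDs).
ORTHOGROUPS_TSV = (
    "Orthogroup\tA\tB\tC\tD\n"
    "OG0000000\tA_g1, A_g2\tB_g7\tC_g3\tD_g9\n"
    "OG0000001\tA_g5\tB_g8\t\tD_g4\n"
)

# Toy per-species DE calls: {species: {gene_id: log2FC}}, DE genes only.
DEG = {
    "A": {"A_g1": 2.1},
    "B": {"B_g7": 1.8, "B_g8": -1.2},
    "C": {"C_g3": 2.4},
    "D": {},
}


def read_orthogroups(tsv_text):
    """Parse the table into {orthogroup: {species: [gene, ...]}}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]  # header row: Orthogroup, then species names
    groups = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so every species gets a (possibly empty) cell.
        cells = row[1:] + [""] * (len(species) - len(row[1:]))
        groups[row[0]] = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in zip(species, cells)
        }
    return groups


def summarize_deg_by_orthogroup(groups, deg_by_species):
    """For each orthogroup, keep only the member genes that were called DE."""
    summary = {}
    for og, members in groups.items():
        summary[og] = {
            sp: {g: deg_by_species.get(sp, {})[g]
                 for g in genes if g in deg_by_species.get(sp, {})}
            for sp, genes in members.items()
        }
    return summary


def orthogroups_de_in_all(summary, species):
    """Orthogroups with at least one DE gene in every listed species."""
    return sorted(
        og for og, hits in summary.items()
        if all(hits.get(sp) for sp in species)
    )


groups = read_orthogroups(ORTHOGROUPS_TSV)
summary = summarize_deg_by_orthogroup(groups, DEG)
shared_abc = orthogroups_de_in_all(summary, ["A", "B", "C"])
print(shared_abc)
```

With real data you would read the orthogroup file from disk instead of the inline string and fill DEG from each species' edgeR output. Orthogroups can contain several paralogs per species, so how to treat groups where only some copies are DE is a judgment call this sketch leaves open.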


I think I get it now. Thanks so much .. :)


A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted (and you can accept multiple answers if need be). Please go through your previous posts as well and resolve them adequately (if applicable). Thanks.
