I have to analyze 24 transcriptomes (TRM) to compare gene expression under different conditions in an animal whose genome has not been annotated. I am considering multiple assemblies followed by a co-assembly to build the "main" TRM. After quality filtering, my plan is to:
1. Assemble the 24 libraries (which come from different treatments) using X different assemblers (e.g. Trinity, Velvet, ... / multiple k-mer sizes, etc.). This will give me X × 24 assemblies.
2. Merge the X × 24 assemblies with a co-assembly tool (e.g. CD-HIT-EST, CORSET or CAP3). I will then end up with one main transcriptome (Main-TRM) representing the study animal.
3. Perform functional annotation of the Main-TRM against Swiss-Prot, KEGG, GO, etc. using blastx and Blast2GO.
4. Take the unassembled, quality-filtered reads from the 24 libraries (i.e. the reads as they were before step 1, so that the condition variable is retained) and BLAST each library individually against the annotated Main-TRM, which gives me the expression information.
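The bookkeeping behind steps 1 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assembler names, library names, and toy read hits are hypothetical placeholders, and no assembler or BLAST is actually run. It shows how many assemblies would need merging and how per-library best hits could be tallied into a transcript-by-library count table.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholder names, for illustration only.
ASSEMBLERS = ["assembler_A", "assembler_B", "assembler_C"]  # X = 3 here
LIBRARIES = [f"lib{i:02d}" for i in range(1, 25)]            # the 24 libraries


def plan_assemblies(assemblers, libraries):
    """Return one (assembler, library) job per combination: X x 24 in total."""
    return list(product(assemblers, libraries))


def count_matrix(hits_by_library):
    """Tally best-hit transcript IDs per library.

    hits_by_library maps a library name to a list with one transcript ID per
    read (the transcript that read matched best). The result maps
    transcript -> {library: number of reads}.
    """
    table = {}
    for library, hits in hits_by_library.items():
        for transcript, n in Counter(hits).items():
            table.setdefault(transcript, {})[library] = n
    return table


jobs = plan_assemblies(ASSEMBLERS, LIBRARIES)
print(len(jobs))  # 3 assemblers x 24 libraries = 72 assemblies to merge

# Toy best hits for two libraries (made-up data, not real results).
toy_hits = {
    "lib01": ["t1", "t1", "t2"],
    "lib02": ["t1", "t3", "t3", "t3"],
}
counts = count_matrix(toy_hits)
print(counts["t1"])  # {'lib01': 2, 'lib02': 1}
```

Keeping the counts keyed by library is what preserves the condition variable mentioned in step 4; raw counts like these would still need normalization before libraries can be compared.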
What do you think about this approach? Is it theoretically sound? If not, what should I change?
Thanks a lot in advance,