I am doing de novo transcriptome assembly of RNA-Seq data from two closely related diploid species (mammals) in order to identify genetic variation between the two species. To do this, I assume I need to identify pairs of orthologous transcripts between the two assemblies so that I can compare them. What is the best way to do this? Should I simply do all pairwise alignments and pick out the pairs that are each other's best match? Are there existing tools for this?
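The "each other's best match" idea is usually called reciprocal best hits (RBH). A minimal sketch, assuming each assembly has already been searched against the other with BLAST using tabular output (`-outfmt 6`), where column 1 is the query, column 2 the subject and column 12 the bit score. The contig names and scores below are made up for illustration, and ties are resolved by keeping the first hit seen.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject from BLAST -outfmt 6 lines."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Strict '>' keeps the first hit on a tie.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (A contig, B contig) pairs that are each other's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


def row(query: str, subject: str, bits: float) -> str:
    # Hypothetical helper: only columns 1, 2 and 12 matter here; the rest are filler.
    return "\t".join([query, subject, "99.0", "300", "3", "0",
                      "1", "300", "1", "300", "1e-100", str(bits)])


# Toy example with made-up contig names and scores.
a_vs_b = [row("A_c1", "B_c1", 500), row("A_c1", "B_c2", 200),
          row("A_c2", "B_c2", 450), row("A_c3", "B_c1", 300)]
b_vs_a = [row("B_c1", "A_c1", 498), row("B_c2", "A_c1", 210),
          row("B_c2", "A_c2", 440)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here A_c3 is dropped because B_c1 prefers A_c1. The same thing happens to an allelic copy of a contig: when two near-identical contigs compete for the same partner, only one can win, and the other silently disappears from the ortholog list.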
Additionally, how does the presence of heterozygous SNPs affect the strategy? I am using Trinity for the transcriptome assembly, and my understanding is that when a transcript carries a heterozygous SNP, Trinity will report two complete contigs that are identical except at the SNP. For example, if the transcript is "TTTTTTTTTT" and there is a heterozygous A/T at position 6, Trinity would report both "TTTTTTTTTT" and "TTTTTATTTT". This could complicate identifying ortholog pairs with the "mutual best match" strategy described above.
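One way to keep allelic copies from competing in the best-match step is to collapse them into a single representative beforehand, recording where they differ as candidate heterozygous sites. A minimal sketch, assuming the allelic contigs have equal length and differ only by substitutions (real assemblies also produce indel and partial-length variants, which would need an alignment-based clustering step). The `max_mismatches` threshold and the contig names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def mismatch_positions(a: str, b: str) -> List[int]:
    """1-based positions where two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def collapse_alleles(contigs: Dict[str, str],
                     max_mismatches: int = 2) -> Dict[str, Tuple[List[str], List[int]]]:
    """Greedily merge equal-length contigs that differ by few substitutions.

    Returns {representative: (member names, candidate heterozygous positions)}.
    The first contig seen in each group becomes its representative.
    """
    clusters: Dict[str, Tuple[List[str], Set[int]]] = {}
    for name, seq in contigs.items():
        for rep, (members, sites) in clusters.items():
            rep_seq = contigs[rep]
            if len(rep_seq) != len(seq):
                continue  # this sketch only handles substitution-only alleles
            diffs = mismatch_positions(rep_seq, seq)
            if len(diffs) <= max_mismatches:
                members.append(name)
                sites.update(diffs)
                break
        else:
            # No existing group is close enough: start a new one.
            clusters[name] = ([name], set())
    return {rep: (members, sorted(sites)) for rep, (members, sites) in clusters.items()}


# The example from the question, plus one unrelated contig.
contigs = {"c1_a": "TTTTTTTTTT", "c1_b": "TTTTTATTTT", "c2": "GGGGCCCCAA"}
collapsed = collapse_alleles(contigs)
print(collapsed)
```

Only the representatives would then go into the reciprocal best-match search, and the recorded positions keep within-species (heterozygous) variation separate from between-species differences. The pairwise loop is quadratic, so it illustrates the idea rather than scaling to a full assembly.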
Phrap and CAP3 are good choices for assembling contigs. In my view, though, you would not want to mix reads from different species for mRNA assembly and ortholog identification.
I didn't make myself clear. What I meant was to use de novo assembly with Phrap/CAP3 on just ONE of the two species to build a solid reference transcriptome. Then, if the other species is close enough, it may be feasible to map its reads using the first assembly as the reference.
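Once the second species' reads are mapped to the first species' reference, each reference position can be summarized as base counts (a pileup) and sorted into categories that matter for the original goal: positions where species B matches A, fixed differences between the species, and sites heterozygous within B. A minimal sketch, assuming the per-position counts have already been extracted; the `min_depth` and `min_allele_frac` thresholds are arbitrary illustrations. In practice a dedicated variant caller such as bcftools or GATK would do this step, since they also model base and mapping quality.

```python
from typing import Dict


def classify_site(ref_base: str, counts: Dict[str, int],
                  min_depth: int = 10, min_allele_frac: float = 0.2) -> str:
    """Classify one species-A reference position from species-B base counts.

    Thresholds are illustrative assumptions, not recommended values.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "low_coverage"
    # Alleles supported by at least min_allele_frac of the reads.
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_allele_frac)
    if len(alleles) >= 2:
        return "heterozygous_in_B"
    if alleles == [ref_base]:
        return "same_as_A"
    if len(alleles) == 1:
        return "fixed_difference"
    return "ambiguous"


# Made-up pileup: (reference base, species-B base counts) per position.
pileup = [
    ("A", {"A": 30}),
    ("C", {"T": 25, "C": 1}),
    ("G", {"G": 12, "A": 10}),
    ("T", {"T": 4}),
]
calls = [classify_site(ref, counts) for ref, counts in pileup]
print(calls)
```

This only works where species B is close enough for its reads to map, which is the premise stated above; in more divergent regions reads may fail to map and those differences would be missed rather than reported.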
Yes, that is a much clearer approach, and I agree it is worth trying.