I have sequence reads from several tissues of the same animal. I'd like to generate a reference transcriptome, map my reads onto it, and then search for differential expression. There is no genome for this animal, not even one from a closely related species.
The most obvious strategy would be to assemble each tissue de novo, then combine them and remove duplicate sequences. Is there any reason why this would not be the best way?
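To make the combine-and-deduplicate step concrete, here is a minimal Python sketch of what I have in mind. It assumes each tissue's assembly is already loaded as a dictionary of contig ID to sequence. The function names and the `tissue|contig` naming scheme are my own placeholders, not part of any assembler. It only removes exact duplicates (including reverse complements), so a contig that is merely a subsequence of another is kept.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


def merge_assemblies(assemblies):
    """Combine per-tissue assemblies, dropping exact duplicate contigs.

    assemblies: dict mapping tissue name -> dict of contig ID -> sequence
    (hypothetical input layout, assumed for this sketch).
    Returns a dict of "tissue|contig_id" -> sequence.
    """
    seen = set()
    merged = {}
    for tissue, contigs in assemblies.items():
        for contig_id, seq in contigs.items():
            seq = seq.upper()
            # Use a strand-independent key so a contig and its
            # reverse complement count as the same sequence.
            key = min(seq, reverse_complement(seq))
            if key in seen:
                continue
            seen.add(key)
            merged[f"{tissue}|{contig_id}"] = seq
    return merged


example = {
    "liver": {"c1": "ATGC", "c2": "AAAA"},
    "brain": {"c1": "GCAT", "c3": "ATGCC"},  # brain c1 is revcomp of liver c1
}
merged = merge_assemblies(example)
```

In this toy input the brain contig `GCAT` is dropped as the reverse complement of the liver contig `ATGC`, while `ATGCC` survives because it is longer, not identical.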
Does anyone know of a data structure or program that could represent one "gene" together with all of its exon combinations for mapping, so that I could clearly see when reads are mapping to splice variants of the same gene rather than to possibly unrelated contigs? For example, a gene with 3 exons (1, 2, 3) might have two transcripts (isoform A: 1+2, isoform B: 1+2+3). Although the first is a subsequence of the second, I don't want to remove it, since inclusion of the C-terminal exon might be biologically important in one of the tissues. If I then mapped the reads with bowtie, some of them would hit isoform B and some isoform A. Since both belong to the same gene, at some level I would just want to know that, and could possibly disregard the cassette exons.
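The simplest version of what I am picturing is a transcript-to-gene lookup that collapses isoform-level read counts into gene-level counts. Below is a rough Python sketch, assuming per-transcript counts have already been extracted from the alignments and that some grouping of contigs into genes exists. Both inputs and all names here are hypothetical, and the sketch ignores the question of reads that map to several isoforms at once.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, transcript_to_gene):
    """Sum read counts over all isoforms assigned to the same gene.

    transcript_counts: dict of transcript ID -> mapped read count
    transcript_to_gene: dict of transcript ID -> gene ID
    (both are assumed inputs; a transcript missing from the map
    is treated as its own single-isoform gene).
    """
    gene_counts = defaultdict(int)
    for transcript_id, count in transcript_counts.items():
        gene_id = transcript_to_gene.get(transcript_id, transcript_id)
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Isoform A (exons 1+2) and isoform B (exons 1+2+3) of the same gene.
counts = {"isoform_A": 10, "isoform_B": 25, "unrelated_contig": 7}
groups = {"isoform_A": "gene_1", "isoform_B": "gene_1"}
per_gene = gene_level_counts(counts, groups)
```

Here `per_gene` reports 35 reads for `gene_1` and 7 for `unrelated_contig`, while the isoform-level counts remain available if the cassette exon turns out to matter.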