I am working on a comparative transcriptomics project and have a few logistics questions.
I have 4 transcriptomes (paired-end Illumina) that I have been working with since last summer. I completed a combined reference assembly of these four transcriptomes (using ABySS, CD-HIT, scaffolding and a few other steps). I then generated read counts using RSEM and ran a differential expression (DE) analysis using Bioconductor's DESeq package.
A few weeks ago, my lab received the 4kb draft genome for the species that I am working with. It's not the finalized version, and there is still work being done to obtain a better quality genome. My main question is: should I re-run my DE analysis using the draft genome instead of the combined transcriptome assembly? I know that this is typically how DE analysis is done when a reference genome is available. My concern is that, since the genome is still only at the draft stage, the combined reference transcriptome might still be the more accurate reference to use.
My second question is: Is there a way for me to determine which method is more accurate in this case?
My initial thought was to align the combined transcriptome assembly to the draft genome to get an idea of the coverage, i.e. how many of my transcripts align to the genome and how completely. If the coverage is relatively high, then I would continue to use the DE analysis that I have already completed. Does this make sense to do? I have the output from GMAP and have been viewing it in IGV, but I'm not sure how to quantify the results of the alignment.
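To make the "quantify" part concrete, here is a rough sketch of the kind of summary I have in mind. It assumes I re-run GMAP with PSL output (I believe `-f psl` does this, but I have not checked it for my version) and relies only on the standard 21-column PSL layout. The function names, the 0.9 threshold and the idea of counting "well covered" transcripts are my own placeholders, not an established method.

```python
from collections import defaultdict


def best_query_coverage(psl_lines):
    """Map each transcript to the best fraction of its length aligned.

    Expects tab-separated PSL records (21 columns). Header lines and
    anything that does not look like a PSL record are skipped.
    """
    best = defaultdict(float)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        # PSL columns: 0 matches, 1 misMatches, 2 repMatches, 9 qName, 10 qSize
        aligned_bases = int(fields[0]) + int(fields[1]) + int(fields[2])
        q_name = fields[9]
        q_size = int(fields[10])
        if q_size == 0:
            continue
        coverage = aligned_bases / q_size
        # A transcript can align to several places; keep its best alignment
        best[q_name] = max(best[q_name], coverage)
    return dict(best)


def summarize(coverage_by_transcript, n_transcripts, threshold=0.9):
    """Fractions of all transcripts that aligned at all / aligned well.

    n_transcripts is the total number of transcripts in the assembly,
    including those with no alignment. threshold is an arbitrary cutoff.
    """
    aligned = len(coverage_by_transcript)
    well_covered = sum(
        1 for c in coverage_by_transcript.values() if c >= threshold
    )
    return {
        "fraction_aligned": aligned / n_transcripts,
        "fraction_well_covered": well_covered / n_transcripts,
    }
```

I would call it roughly as `summarize(best_query_coverage(open("transcripts_vs_draft.psl")), n_transcripts=N)`, where the file name is hypothetical and N is the number of sequences in my assembly FASTA. Would a summary like this be a reasonable basis for deciding between the two references?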
My last question is annotation-related. My DE analysis was completed on the transcripts, not on predicted genes. My problem is that I'm not sure how to annotate the results of the DE analysis without using something like Blast2GO (which is so slow). Any suggestions would be greatly appreciated!
Sorry for the abundance of questions!
Thanks for the help!