I recently assembled a non-model fungal genome de novo using PacBio and Illumina reads, and now we have new RNA-seq data that needs to be assembled. I figured the best way to assemble the transcriptome would be to use a genome-guided assembler like Trinity, unless you guys recommend a different assembler and/or a different approach. Also, I have the genome as both a masked and an unmasked assembly, so which one would be best to use for assembling the transcriptome?
Thanks in advance! Morgan
There are far better experts in this field than me, but why wouldn't you use the genome that you meticulously assembled as a reference for mapping the RNA-seq reads? IMHO this should be at least as accurate as a transcriptome assembly. You can use the mapping information to generate a new gene model or improve an existing one, for example with AUGUSTUS.
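As a rough sketch of what that mapping step could look like, here is a small Python helper that only builds the command lines for a spliced aligner and prints them. It assumes HISAT2 and samtools, neither of which was named above; the file names, the `build_mapping_commands` helper, and the 5000 bp intron cap are placeholders for illustration, so check them against your own data before running anything.

```python
import shlex


def build_mapping_commands(genome_fasta, reads_1, reads_2, prefix, max_intron=5000):
    """Return argument lists for indexing a genome and mapping paired RNA-seq reads.

    Assumptions (not from the thread): HISAT2 as the spliced aligner, samtools
    for sorting/indexing, and max_intron=5000 as a guess for a fungal genome.
    """
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Build the aligner index from the genome assembly.
        ["hisat2-build", genome_fasta, index],
        # Spliced alignment; --dta reports alignments tailored for
        # downstream transcript assemblers.
        ["hisat2", "--dta", "--max-intronlen", str(max_intron),
         "-x", index, "-1", reads_1, "-2", reads_2, "-S", sam],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
    ]


if __name__ == "__main__":
    # Placeholder file names; nothing is executed, the commands are only printed.
    for cmd in build_mapping_commands("genome.fa", "rna_R1.fastq.gz",
                                      "rna_R2.fastq.gz", "sample1"):
        print(shlex.join(cmd))
```

The sorted BAM is the kind of mapping information that gene predictors and read-counting tools can take as input.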
Okay, let me see if I understand what you are saying. You suggest that instead of using the reference genome to assemble the RNA-seq reads, I would just map the raw reads to the reference genome. If that is the case, then what is the difference between mapping the reads and assembling them? Is there a difference?
Maybe it also makes a difference based on the research question. The transcriptome was sequenced for a separate project, in which we wanted to test the differences in genes expressed over time and over different conditions. So if you want to do gene expression analysis, does it matter if you map the reads vs assemble them?
I hope this makes sense...haha
I think I was able to clear up what I was confused about with the mapping versus assembly approach: since we have a reference genome, there is no need to do an assembly. My only remaining question is whether you are supposed to map the RNA-seq reads to the repeat-masked genome assembly or the unmasked assembly?
By doing both you will learn the differences.
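Before mapping to both, it may help to check how each assembly is masked. As far as I know, hard masking replaces repeats with N, so reads cannot align there, while soft masking only lowercases the repeats and many aligners treat those bases as ordinary sequence (check your aligner's documentation). Here is a minimal sketch that tallies both kinds in a FASTA file; the `masking_summary` helper is made up for illustration.

```python
def masking_summary(fasta_lines):
    """Count soft-masked (lowercase), hard-masked (N/n) and unmasked bases.

    Note: N can also mark assembly gaps, so 'hard_masked' is an upper bound
    on repeat masking rather than an exact figure.
    """
    counts = {"soft_masked": 0, "hard_masked": 0, "unmasked": 0}
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        for base in line:
            if base in "Nn":
                counts["hard_masked"] += 1
            elif base.islower():
                counts["soft_masked"] += 1
            else:
                counts["unmasked"] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle instead,
    # e.g. masking_summary(open("genome.masked.fa")).
    demo = [">contig1", "ACGTacgtNNNN", "acGT"]
    print(masking_summary(demo))
```

If the masked assembly turns out to be soft-masked only, the two mappings may come out nearly identical; if it is hard-masked, expect reads from genes that overlap repeats to go unmapped.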