5.9 years ago by
In my experience, full coverage of the coding region in a single assembled transcript is difficult to achieve. This is part of why I would always prefer direct alignment over de novo assembly when a reference is available. When working with assembled transcripts, I would favor using a partial contig as a proxy for expression of the relevant gene, rather than requiring the full coding region to be present in the assembly.
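As a rough sketch of what I mean by using partial contigs as a proxy: if you already have per-contig read counts and some contig-to-gene assignment (for example from best hits against a related proteome), you can collapse them to gene level. Everything below is hypothetical, including the contig names, gene IDs, counts, and the `gene_level_counts` helper. Summing all contigs assigned to a gene is just one reasonable choice, not a standard method.

```python
from collections import defaultdict


def gene_level_counts(contig_counts, contig_to_gene):
    """Sum read counts over all contigs assigned to the same gene.

    contig_counts:  dict of contig ID -> read count (hypothetical input)
    contig_to_gene: dict of contig ID -> gene ID (hypothetical input)
    Contigs without a gene assignment are skipped.
    """
    totals = defaultdict(int)
    for contig, count in contig_counts.items():
        gene = contig_to_gene.get(contig)
        if gene is None:
            continue  # unassigned contig: no gene to attribute the reads to
        totals[gene] += count
    return dict(totals)


# Made-up example: two partial contigs for GENE_A, one for GENE_B,
# and one contig with no gene assignment.
contig_counts = {"contig_1": 120, "contig_2": 45, "contig_3": 10, "contig_4": 7}
contig_to_gene = {"contig_1": "GENE_A", "contig_2": "GENE_A", "contig_3": "GENE_B"}

gene_counts = gene_level_counts(contig_counts, contig_to_gene)
print(gene_counts)
```

Neither partial contig for GENE_A needs to span the full coding region for the gene to get a usable count.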
Yes, you will have reads from UTRs. As with the coding regions, my guess is that long, high-quality (and not incorrectly stitched) contigs will not necessarily cover the complete UTRs as a contiguous extension of the coding-region transcript.
If it helps, I've collected a list of pointers for a slightly different assembly question in this blog post.