Question

Annotating A Novel Transcriptome Assembly

5

12.0 years ago

Madelaine Gogol 5.3k

Any tips on what to do downstream of a novel transcriptome assembly? I have a few things under way, based mainly on previous work by someone else in my group:

basic stats about the contigs, length, etc.
check for redundancy in contigs
tabulate read counts per contig / create igv tracks
examine coding potential of contigs
blast against nr (or some subset?) to annotate contigs
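
As a rough illustration of the first item, contig length statistics can be computed in a few lines of plain Python. This is only a sketch under the assumption that the Trinity output is an ordinary FASTA file; the function names (`read_fasta`, `n50`, `contig_stats`) are made up for this example and are not part of Trinity.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file or any iterable of lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first word of the header as the contig name.
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def contig_stats(handle):
    """Summarise contig lengths from a FASTA handle."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return {"contigs": 0, "total_bases": 0, "min": 0, "max": 0, "mean": 0.0, "n50": 0}
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Tiny in-memory example; a real run would pass an open file instead.
example = StringIO(">contig1\nACGTACGT\n>contig2\nACGTAC\n")
print(contig_stats(example))
```

For a real assembly, the same function could be called on an open file handle, for instance one opened on a hypothetical `Trinity.fasta`.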

I wanted to see if there are any additional things I should be doing. I assembled with Trinity from paired-end Illumina reads, and I have one sample at this point.

I kind of liked this idea from seqanswers.

transcriptome assembly annotation trinity • 4.7k views

score 4 · Answer 1 · 2012-04-18

4

12.0 years ago

Larry_Parnell 16k

You may want to know the percentage of reads that are singletons or that do not make it into the assembly. I don't find much value in coding potential; I find more in assessing whether each assembled sequence looks full-length or not. For this, you could use three classes: definitely full-length, definitely not full-length, and undetermined. If coding potential really is something you care about, look into upstream ORFs (they likely curtail translation or slow it down).
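
One way the three classes might be approximated is with a longest-ORF heuristic. The sketch below is an assumption of this write-up, not a method given in the answer: it takes the longest stop-free stretch over all six reading frames and calls a contig "full-length" only if that stretch has an in-frame stop upstream, an ATG, and a stop downstream. The names `classify_contig` and `open_segments` and the `min_codons` cutoff of 100 are arbitrary choices for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def open_segments(seq, frame):
    """Yield (codons, has_upstream_stop, has_downstream_stop) for each stop-free run in one frame."""
    current, upstream_stop = [], False
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            yield current, upstream_stop, True
            current, upstream_stop = [], True
        else:
            current.append(codon)
    # The last run reaches the end of the contig without meeting a stop.
    yield current, upstream_stop, False


def classify_contig(seq, min_codons=100):
    """Return 'full-length', 'not full-length', or 'undetermined' (heuristic only)."""
    seq = seq.upper()
    best = ([], False, False)
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in open_segments(strand, frame):
                if len(segment[0]) > len(best[0]):
                    best = segment
    codons, upstream_stop, downstream_stop = best
    if len(codons) < min_codons:
        return "undetermined"          # no convincing open reading frame
    if not downstream_stop:
        return "not full-length"       # coding region runs off the 3' end
    orf_codons = len(codons) - codons.index("ATG") if "ATG" in codons else 0
    if upstream_stop and orf_codons >= min_codons:
        return "full-length"           # stop ... ATG ... stop, all in frame
    if not upstream_stop and orf_codons == 0:
        return "not full-length"       # open at the 5' end with no start codon
    return "undetermined"
```

A contig whose open frame reaches the 5' edge but does contain an ATG lands in "undetermined", since the true start codon could lie upstream of the assembled sequence. Comparing against known proteins from a close relative would be a more reliable check where such references exist.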

Sequence similarity searches for annotation could be run against the genome(s) of the closest species. A set of reference mRNAs or proteins will do for this. Rather than search across all of nr, if you cannot narrow the search to a few close relatives, I'd stick with a subset that focuses on a kingdom or family of interest, such as plants or insects, as the case may be.
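
If the reference proteins are already on disk as FASTA, one simple way to build such a subset is to filter on the organism name. The sketch below assumes NCBI-style deflines that end with the organism in square brackets, e.g. `[Drosophila melanogaster]`, and keeps records whose genus is in a user-supplied list. The function names are hypothetical, and merged nr deflines that list several organisms would need extra handling that this example does not attempt.

```python
import re

# Organism name inside the trailing [...] of an NCBI-style defline.
ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_of(header):
    """Return the text inside the trailing square brackets, or None if absent."""
    match = ORGANISM.search(header)
    return match.group(1) if match else None


def keep_record(header, wanted_genera):
    """True if the first word of the organism name is one of the wanted genera."""
    organism = organism_of(header)
    if organism is None:
        return False
    parts = organism.split()
    return bool(parts) and parts[0].lower() in wanted_genera


def filter_fasta(lines, genera):
    """Yield the FASTA lines of records whose organism genus is in `genera`."""
    wanted = {g.lower() for g in genera}
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = keep_record(line[1:].strip(), wanted)
        if keep:
            yield line
```

The filtered lines can be written to a new file and formatted as a search database in the usual way.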

0

Thanks, those are some good points.

12.0 years ago by Madelaine Gogol 5.3k

0

You're welcome. With regard to full-length vs not, it might be interesting to look at differences in this for multi- vs single-exon genes, but only if you're real curious and like to show lots of data.

12.0 years ago by Larry_Parnell 16k