Annotating A Novel Transcriptome Assembly
12.0 years ago

Any tips on what to do downstream of a novel transcriptome assembly? I have a few steps planned, based mainly on previous work by someone else in my group:

  • basic stats about the contigs, length, etc.
  • check for redundancy in contigs
  • tabulate read counts per contig / create IGV tracks
  • examine coding potential of contigs
  • BLAST against nr (or some subset?) to annotate contigs
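
For the first item on the list, here is a minimal sketch of basic contig statistics, assuming the assembly is a plain FASTA file. The function names (`read_fasta_lengths`, `n50`, `summarize`) are made up for illustration and are not part of Trinity or any other tool.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_id: sequence length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the contig ID is the first word of the header line.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty input


def summarize(lengths: Dict[str, int]) -> Dict[str, float]:
    """Collect a few summary numbers about the contig length distribution."""
    values = list(lengths.values())
    return {
        "n_contigs": len(values),
        "total_bases": sum(values),
        "min_len": min(values) if values else 0,
        "max_len": max(values) if values else 0,
        "mean_len": sum(values) / len(values) if values else 0.0,
        "n50": n50(values),
    }
```

To use it, open the assembly FASTA (the path is whatever your assembler wrote) and call `summarize(read_fasta_lengths(fh))` on the file handle. N50 on a transcriptome is only a rough descriptor, since many short contigs are expected, so it is probably best read alongside the other numbers.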

I wanted to see if there are any additional things I should be doing. I assembled using Trinity; the data are paired-end Illumina reads. I have one sample at this point.

I kind of liked this idea from seqanswers.

transcriptome assembly annotation trinity • 4.7k views
12.0 years ago

You may want to know the percentage of reads that are singletons or that do not make it into the assembly. I don't find much value in coding potential; I find more in assessing whether each contig looks full-length or not. For this, you could use three classes: definitely full-length, definitely not full-length, and undetermined. If coding potential is really something you care about, look into upstream ORFs (they likely curtail translation or slow it down).
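
One way to get at the percentage of reads that do not make it into the assembly is to align the reads back to the contigs and count the unmapped ones. The sketch below assumes the alignments are available as SAM text from whatever aligner you prefer; the function name `unassembled_fraction` is made up for illustration. It relies only on the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4          # SAM FLAG bit: segment unmapped
SECONDARY = 0x100       # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def unassembled_fraction(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (total reads, unmapped reads, unmapped fraction) from SAM text lines.

    Each mate of a pair counts as one read. Secondary and supplementary
    records are skipped so that every read is counted exactly once.
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    fraction = unmapped / total if total else 0.0
    return total, unmapped, fraction
```

This treats "unmapped to the contigs" as a stand-in for "not in the assembly", which is an approximation: the result depends on the aligner and its settings, and multi-mapping reads are counted as mapped.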

Sequence similarity searches for annotation could be run against the genome(s) of the closest species. A set of reference mRNAs or proteins will do for this. Rather than searching all of nr, if you cannot narrow it down to a few close relatives, I'd stick with a subset that covers a kingdom or family of interest, such as plants or insects, as the case may be.
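
Once a search against the chosen subset has been run, the hits still need to be reduced to one annotation per contig. Here is a minimal sketch, assuming BLAST tabular output with the default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The `Hit` type, the `best_hits` function, and the 1e-5 E-value cutoff are illustrative assumptions, not a recommendation from this thread.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query contig from tabular BLAST lines."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[0]
        hit = Hit(
            subject=fields[1],
            pident=float(fields[2]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )
        if hit.evalue > max_evalue:
            continue  # assumed cutoff; adjust to the database size and goal
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best
```

Contigs that are absent from the returned dictionary had no hit passing the cutoff. Taking only the top hit is a simplification; for divergent species it may be worth keeping several hits per contig and checking that they agree.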


Thanks, those are some good points.


You're welcome. With regard to full-length vs. not, it might be interesting to look at how this differs between multi-exon and single-exon genes, but only if you're really curious and like to show lots of data.
