Question: Aligning Short Reads To Homologous Sequences?
7.9 years ago by
Goldbear130 wrote:

I'm working on a de novo transcriptome project for a plant whose genome has not been sequenced. In general, Trinity is working pretty well, but we are seeing some spurious fusion genes (the N-terminus of gene A fused to the C-terminus of gene B). For the questions we are asking, having the correct full-length protein sequence is important.

I was wondering if we could get around this problem by first mapping short reads to homologous protein sequences from Arabidopsis and then feeding each of those subcollections into a contig builder? Is there any software that acts like blastx but works well on short Illumina reads?
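To make the idea concrete, here is a minimal sketch of the binning step, assuming the reads have already been searched against the Arabidopsis proteins and the hits were written in the 12-column tabular format (blastx `-outfmt 6`). The function name `bin_reads_by_best_hit`, the e-value cutoff, and the example read and protein IDs are illustrative assumptions, not part of any existing tool.

```python
import csv
from collections import defaultdict


def bin_reads_by_best_hit(tabular_lines, max_evalue=1e-3):
    """Group read IDs by the protein each read hits best (highest bitscore).

    tabular_lines: iterable of tab-separated lines with the 12 standard
    columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    best = {}  # read_id -> (bitscore, protein_id)
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, protein_id = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to trust (cutoff is an assumption)
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, protein_id)

    bins = defaultdict(list)
    for read_id, (_, protein_id) in best.items():
        bins[protein_id].append(read_id)
    return dict(bins)


# Made-up example hits, for illustration only
example_hits = [
    "read1\tAT1G01010.1\t80.0\t30\t6\t0\t1\t90\t10\t39\t1e-10\t55.0",
    "read1\tAT1G01020.1\t60.0\t30\t12\t0\t1\t90\t5\t34\t1e-4\t40.0",
    "read2\tAT1G01020.1\t75.0\t28\t7\t0\t4\t87\t50\t77\t1e-8\t50.0",
    "read3\tAT1G01010.1\t40.0\t20\t12\t0\t1\t60\t1\t20\t5.0\t20.0",
]
read_bins = bin_reads_by_best_hit(example_hits)
```

Each bin of read IDs could then be pulled out of the FASTQ files and handed to an assembler separately. Reads with no acceptable hit are simply left out of every bin, which matters when the reference is as divergent as it is here.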


short aligner transcriptome • 1.9k views
modified 5.5 years ago by rob234king570 • written 7.9 years ago by Goldbear130

Hi, it would be very useful to know the length of the reads and the divergence between your species and Arabidopsis first.

written 7.9 years ago by Haibao Tang3.0k

Hi, my reads are about 90 bp. Unfortunately, the divergence is pretty large. Our strategy may be to fully sequence one representative species from our clade and then use that as a reference for the other species. Thanks!

written 7.8 years ago by Goldbear130
7.8 years ago by
New York
Vitis2.1k wrote:

There is a blastx-based pipeline called STM (published in Genome Research) that combines de novo contigs with blastx search results against proteins from a closely related species to improve the de novo assembly. It is probably also not hard to put together a similar pipeline in Perl, in which the blastx output can be dissected more closely. However, I think choosing the right reference taxon is a very important factor.
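As a rough illustration of what dissecting the blastx output could look like (sketched in Python rather than Perl, and not taken from STM), the snippet below flags contigs on which two different reference proteins hit essentially non-overlapping stretches, which is the signature of an A-B fusion. The function name `find_fusion_candidates`, the input tuple layout, and both thresholds are assumptions chosen for the example.

```python
from collections import defaultdict


def find_fusion_candidates(hits, min_bitscore=50.0, max_overlap=10):
    """Return {contig_id: [(protein_a, protein_b), ...]} for suspicious contigs.

    hits: iterable of (contig_id, protein_id, qstart, qend, bitscore),
    where qstart/qend are nucleotide coordinates on the contig.
    """
    # Keep only the best-scoring hit for each (contig, protein) pair.
    best = {}
    for contig, protein, qstart, qend, bitscore in hits:
        if bitscore < min_bitscore:
            continue
        # Reverse-frame hits may be reported with qstart > qend, so normalise.
        lo, hi = sorted((qstart, qend))
        key = (contig, protein)
        if key not in best or bitscore > best[key][2]:
            best[key] = (lo, hi, bitscore)

    per_contig = defaultdict(list)
    for (contig, protein), (lo, hi, _) in best.items():
        per_contig[contig].append((lo, hi, protein))

    candidates = {}
    for contig, intervals in per_contig.items():
        intervals.sort()
        pairs = []
        for i in range(len(intervals)):
            for j in range(i + 1, len(intervals)):
                a_lo, a_hi, a_prot = intervals[i]
                b_lo, b_hi, b_prot = intervals[j]
                # Shared span in nucleotides; negative means a gap between hits.
                overlap = min(a_hi, b_hi) - max(a_lo, b_lo) + 1
                if overlap <= max_overlap:
                    pairs.append((a_prot, b_prot))
        if pairs:
            candidates[contig] = pairs
    return candidates


# Made-up hits: c1 looks like a fusion, c2 hits two paralogs over the same
# region, c3 has only one hit above the bitscore threshold.
example = [
    ("c1", "A", 1, 300, 200.0), ("c1", "B", 350, 700, 180.0),
    ("c2", "A", 1, 600, 400.0), ("c2", "A2", 10, 590, 350.0),
    ("c3", "C", 900, 400, 150.0), ("c3", "D", 380, 30, 30.0),
]
fusion_candidates = find_fusion_candidates(example)
```

Paralogs that hit the same region of a contig overlap heavily and are not flagged, while genuine multi-domain proteins can still trigger false positives, so flagged contigs are candidates for inspection rather than confirmed fusions.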


Thanks! This was exactly what I was looking for.

written 7.8 years ago by Goldbear130
5.5 years ago by
UK/Harpenden/Rothamsted Research
rob234king570 wrote:

Trinity has a setting that attempts to address fusion transcripts. Try running the assembly again with --jaccard_clip.
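A minimal sketch of how such a rerun might be assembled from a script, assuming paired-end FASTQ input (which, as far as I know, --jaccard_clip expects). The file names, CPU count, and memory value are placeholders, and the helper `trinity_command` is hypothetical. The memory option has changed between Trinity releases (older versions used --JM), so check `Trinity --help` for your installed version before running.

```python
import shlex


def trinity_command(left, right, output_dir="trinity_out", cpu=4, max_memory="20G"):
    """Build (but do not run) a Trinity command line with --jaccard_clip."""
    return [
        "Trinity",
        "--seqType", "fq",            # FASTQ input
        "--left", left,               # first reads of each pair
        "--right", right,             # second reads of each pair
        "--CPU", str(cpu),
        "--max_memory", max_memory,   # older releases used --JM instead
        "--jaccard_clip",             # try to split fused transcripts
        "--output", output_dir,
    ]


# Placeholder file names, for illustration only
cmd = trinity_command("reads_1.fq", "reads_2.fq")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. Note that --jaccard_clip makes the run considerably slower, so it is worth using only when fusions are actually a problem.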

