Question

What Is the Best Strategy to Use MiSeq 2x300 Reads to Improve De Novo Transcriptome Assembly Based on HiSeq 2x100 Reads?

2

11.0 years ago

lzwright ▴ 150

Hi all!

First, a little background: I have done a de novo transcriptome assembly of marine snail venom duct tissue using Illumina HiSeq reads. I have 289 million paired-end (PE) reads of 100 bp, which I digitally normalized prior to assembly. I used Trinity and Velvet Oases as my assemblers, and have done extensive BLAST searches with the Trinity assembly to identify putative marine snail toxins and to obtain a high-level overview of GO and KEGG terms for the rest of my hits. I am reasonably happy with the results thus far but, as always with de novo assembly, I am looking for ways to be sure my transcripts are valid, as I have no reference transcriptome.

Now, luckily, I have been able to obtain some 2x300 MiSeq data, approximately 20 million reads, generated from the HiSeq library. So I am asking how I can make the best use of these reads to improve my assembly.

My current strategy is the following.

1. Do a separate assembly of the MiSeq reads with Trinity and perhaps VO to compare with the HiSeq assemblies (side question: when reads get longer, should this affect the choice of k-mer values for an assembler like VO?).

2. Map the MiSeq reads to the HiSeq assembly, especially to confirm the transcripts I have identified as putative toxins from the HiSeq assembly (NB: these are short, disulfide-rich peptides; the precursor averages about 100 AA in length).

3. BLAST the raw MiSeq reads against a database of toxins (on the theory that some of these reads will be long enough to cover most, if not all, of some toxin transcripts).
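
A rough back-of-the-envelope check on the read-length points above, as a minimal Python sketch. The helper names are hypothetical, and the k values (31 and 99) and the 50 bp pair overlap are illustrative assumptions, not recommendations. It counts the k-mers a single read contributes at a given k, and compares the coding length of a 100 AA precursor with the read length.

```python
def kmers_per_read(read_len: int, k: int) -> int:
    """Number of overlapping k-mers in one read (0 if k exceeds the read length)."""
    return max(read_len - k + 1, 0)


def cds_length_nt(aa_len: int, include_stop: bool = True) -> int:
    """Coding-sequence length in nucleotides for a peptide of aa_len residues."""
    return 3 * aa_len + (3 if include_stop else 0)


def merged_pair_length(read_len: int, overlap: int) -> int:
    """Length of a fragment reconstructed from two overlapping paired reads."""
    return 2 * read_len - overlap


if __name__ == "__main__":
    # Illustrative k values only (assumption): a short and a long k-mer.
    for read_len in (100, 300):
        for k in (31, 99):
            print(read_len, k, kmers_per_read(read_len, k))

    # Average toxin precursor length taken from the post: about 100 AA.
    print("CDS length (nt):", cds_length_nt(100))
    # Hypothetical 50 bp overlap between the two 300 bp mates.
    print("Merged pair (nt):", merged_pair_length(300, 50))
```

At k = 99 a 100 bp read contributes only 2 k-mers while a 300 bp read contributes 202, which is why longer reads leave room for larger k values. A 100 AA precursor corresponds to 300 nt of coding sequence (303 nt with the stop codon), so a single 300 bp read can span nearly all of it only when it is well positioned, whereas a merged overlapping pair could span it with flanking sequence, depending on the actual insert size of the library.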

I am wondering what other strategies might be useful to pursue, or if the above seems to make the best sense. For example, should I combine HiSeq and MiSeq together prior to assembly? I am open to any thoughts on how I can take advantage of the depth of the HiSeq along with the length of the MiSeq to obtain a better assembly.

miseq hiseq illumina qualitycontrol • 6.7k views

2

My 2 cents: combine the digitally normalized HiSeq and MiSeq reads and rerun the assembly.
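
For readers unfamiliar with digital normalization, here is a toy Python sketch of the median k-mer abundance idea applied to a pooled read set. It is an illustration under stated assumptions, not the tool used in the thread: reads are plain strings, counting is exact (real tools such as khmer's `normalize-by-median.py` use a memory-efficient probabilistic counter), reverse complements and read pairing are ignored, and the function name, `k`, and `cutoff` values are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers, among reads
    kept so far, is below cutoff. Toy version with exact counts."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so it cannot be assessed
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


if __name__ == "__main__":
    # Stand-ins for pooled HiSeq and MiSeq reads (made-up sequences).
    hiseq_like = ["ACGTTGCAAG"] * 5
    miseq_like = ["TTTTGGGGCC"]
    pooled = hiseq_like + miseq_like
    print(normalize_by_median(pooled, k=4, cutoff=3))
```

Because only kept reads update the counts, highly redundant reads are discarded once their k-mers are well represented, while reads from low-coverage transcripts are retained regardless of which run they came from.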

ADD REPLY • link 11.0 years ago by Rm 8.3k

0

This also sounds like a reasonable strategy to me. I'm doing something similar to it now with two separate sequencing runs (one from last year, one from last week). ...and what is VO?

ADD REPLY • link 11.0 years ago by Josh Herr 5.8k

0

VO = Velvet Oases (sorry, got lazy). Are you combining the new and old reads before assembling?

ADD REPLY • link 11.0 years ago by lzwright ▴ 150

0

I'm going back to the original raw sequence files and adding the new data to them. I might try to add on to the old assembly... I'm still running the new assembly with the combined data.

ADD REPLY • link 11.0 years ago by Josh Herr 5.8k

0

Agreed, I think I will in fact try this.

ADD REPLY • link 11.0 years ago by lzwright ▴ 150

0

You should post this as an actual answer to the question below

ADD REPLY • link 11.0 years ago by Chris Fields ★ 2.2k

1

Sorry for not answering the question, but I just wanted to mention that I think doing a transcriptome of a marine snail venom duct is very very cool.

ADD REPLY • link 11.0 years ago by Daniel ★ 4.0k

score 0 · Answer 1 · 2013-10-18

0

11.0 years ago

lzwright ▴ 150

Agreed, I think I will in fact try this.

2

Not an answer, lzwright! :)

ADD REPLY • link 11.0 years ago by Josh Herr 5.8k