Question: What Is Best Strategy To Use Miseq 2X300 Reads To Improve De Novo Transcriptome Assembly Based On Hiseq 2X100 Reads?
lzwright (NYC) wrote, 6.0 years ago:

Hi all!

First, a little background: I have done a de novo transcriptome assembly of marine snail venom duct tissue using Illumina HiSeq reads. I have 289 million paired-end (PE) reads of 100 bp, which I digitally normalized prior to assembly. I used Trinity and Velvet/Oases as my assemblers, and have run extensive BLAST searches with the Trinity assembly to identify putative marine snail toxins and to get a high-level overview of GO and KEGG terms for the rest of my hits. I am reasonably happy with the results so far but, as always with de novo assembly, I am looking for ways to be sure my transcripts are valid, since I have no reference transcriptome.

Now, luckily, I have been able to obtain some 2x300 MiSeq data, approximately 20 million reads, generated from the HiSeq library. So I am asking how I can make the best use of these reads to improve my assembly.

My current strategy is the following:

1. Do a separate assembly of the MiSeq reads with Trinity, and perhaps VO, to compare with the HiSeq assemblies. (Side question: when reads get longer, should this affect your choice of k-mer values for an assembler like VO?)

2. Map the MiSeq reads to the HiSeq assembly, especially to confirm the transcripts I have identified as putative toxins from the HiSeq assembly. (NB: these are short, disulfide-rich peptides; the average length of the precursor structure is 100 AA.)

3. BLAST the raw MiSeq reads against a database of toxins, on the theory that some of these reads will be long enough to cover most, if not all, of some toxin transcripts.
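The theory behind blasting raw MiSeq reads can be checked with simple arithmetic: a 100 AA precursor needs 303 nt of coding sequence including the stop codon, slightly more than one 300 bp read, so only precursors shorter than the average can sit entirely inside a single read. Below is a minimal Python sketch, assuming reads are plain DNA strings and the standard genetic code, that flags reads containing a complete ORF (ATG through stop) in any of the six frames. The function names and the 60-codon threshold are illustrative assumptions, not something from this thread.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complete_orf(seq):
    """Length in codons (start included, stop excluded) of the longest
    ATG...stop ORF in any of the six reading frames; 0 if there is none."""
    best = 0
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            start = None  # position of the first ATG seen since the last stop
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOP_CODONS:
                    if start is not None:
                        best = max(best, (i - start) // 3)
                    start = None
    return best


def covers_full_precursor(read, min_codons=60):
    """True if the read holds a complete ORF of at least min_codons codons.
    The default of 60 is an arbitrary illustrative threshold."""
    return longest_complete_orf(read) >= min_codons
```

Reads that pass a filter like this are candidates for covering a whole precursor on their own; reads that fail can still carry partial toxin sequence, so this would complement the BLAST search rather than replace it.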

I am wondering what other strategies might be useful to pursue, or whether the above makes the most sense. For example, should I combine the HiSeq and MiSeq reads prior to assembly? I am open to any thoughts on how I can take advantage of the depth of the HiSeq data along with the length of the MiSeq reads to obtain a better assembly.
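On the side question about k-mer values and read length: a read of length L yields L - k + 1 k-mers, and the Velvet manual relates k-mer coverage to nucleotide coverage as Ck = C * (L - k + 1) / L. The Python sketch below just works through that arithmetic for 100 bp and 300 bp reads. The k values are arbitrary examples, and the sketch makes no claim about which k is best for this dataset.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers a single read of read_len bases contributes."""
    return max(read_len - k + 1, 0)


def kmer_coverage(nt_coverage, read_len, k):
    """k-mer coverage Ck = C * (L - k + 1) / L for nucleotide coverage C."""
    return nt_coverage * kmers_per_read(read_len, k) / read_len


# Fraction of nucleotide coverage retained as k-mer coverage (C = 1.0).
for k in (25, 31, 61, 91):
    for read_len in (100, 300):
        retained = kmer_coverage(1.0, read_len, k)
        print(f"k={k:>2}  L={read_len:>3}  "
              f"kmers/read={kmers_per_read(read_len, k):>3}  "
              f"retained={retained:.2f}")
```

At k = 91, for instance, a 100 bp read keeps 10% of its nucleotide coverage as k-mer coverage while a 300 bp read keeps 70%, which is one reason longer reads leave more room for larger k values.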


My 2 cents: combine the digitally normalized HiSeq and MiSeq reads and rerun the assembly.
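Mechanically, combining the two read sets comes down to concatenating the FASTQ files mate by mate so that read pairs stay in the same order. Here is a minimal Python sketch, assuming uncompressed four-line FASTQ records with no trailing blank lines. The function names and file names are hypothetical, and the digital normalization step itself is not shown.

```python
from itertools import islice


def iter_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if (len(record) != 4 or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        if len(record[1]) != len(record[3]):
            raise ValueError(f"sequence/quality length mismatch in {record[0]}")
        yield record


def combine_read_sets(sources, sink):
    """Write every record from each source to sink, in order.
    Returns the number of records written."""
    written = 0
    for lines in sources:
        for record in iter_fastq(lines):
            sink.write("\n".join(record) + "\n")
            written += 1
    return written


# Usage with hypothetical file names; run once per mate (R1, then R2) with the
# sources in the same order so that pairs stay in sync:
# with open("hiseq_R1.fq") as a, open("miseq_R1.fq") as b, \
#         open("combined_R1.fq", "w") as out:
#     combine_read_sets([a, b], out)
```

Comparing the record counts returned for the R1 and R2 runs is a cheap check that no pairs were lost before the assembler sees the data.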

written 6.0 years ago by Rm

This also sounds like a reasonable strategy to me. I'm doing something similar now with two separate sequencing runs (one from last year, one from last week). Also, what is VO?

written 6.0 years ago by Josh Herr

VO = Velvet/Oases (sorry, got lazy). Are you combining the new and old reads before assembling?

written 6.0 years ago by lzwright

I'm going back to the original raw sequence files and adding the new data to them. I might also try to add on to the old assembly... I'm still running the new assembly with the combined data.

written 6.0 years ago by Josh Herr

Agreed, I think I will in fact try this.

written 6.0 years ago by lzwright

You should post this as an actual answer to the question below.

written 6.0 years ago by Chris Fields

Sorry for not answering the question, but I just wanted to mention that I think doing a transcriptome of a marine snail venom duct is very, very cool.

written 6.0 years ago by Daniel
lzwright (NYC) wrote, 6.0 years ago:

Agreed, I think I will in fact try this.


Not an answer, lzwright! :)

written 6.0 years ago by Josh Herr