Hi all!
First, a little background: I have done a de novo transcriptome assembly of marine snail venom duct tissue using Illumina HiSeq reads. I have 289 million paired-end (PE) reads of 100 bp that I digitally normalized prior to assembly. I used Trinity and Velvet/Oases as my assemblers, and have run extensive BLAST searches on the Trinity assembly to identify putative marine snail toxins and to obtain a high-level overview of GO and KEGG terms for the rest of my hits. I am reasonably happy with the results thus far but, as always with de novo assembly, I am looking for ways to be sure my transcripts are valid, as I have no reference transcriptome.
Now, luckily, I have been able to obtain some 2x300 MiSeq data, approx. 20 million reads, generated from the HiSeq library. So I am asking how I can make the best use of these to improve my assembly.
My current strategy is the following:
1. Do a separate assembly of the MiSeq reads with Trinity and perhaps VO to compare with the HiSeq assemblies (side question: when reads get longer, should this affect your choice of k-mer values for an assembler like VO?).
2. Map the MiSeq reads to the HiSeq assembly, especially to confirm the transcripts I have identified as putative toxins from the HiSeq assembly (NB: these are short, disulfide-rich peptides; the average precursor length is 100 AA).
3. BLAST the raw MiSeq reads against a database of toxins (on the theory that some of these reads will be long enough to cover most, if not all, of some toxin transcripts).
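As a rough sanity check on the theory in the last step above, here is a minimal back-of-the-envelope sketch. A 100 AA precursor needs about 303 nt of coding sequence (counting the stop codon), so a single 300 bp read can at best cover nearly all of the coding region but not the UTRs, while an overlapping 2x300 pair merged into one fragment could span more. The function names and the 20 bp minimum overlap are assumptions made up for illustration, not settings from any particular read merger.

```python
def cds_length_nt(precursor_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a precursor of the given amino-acid length."""
    return 3 * (precursor_aa + (1 if include_stop else 0))


def max_merged_length(read_len: int, min_overlap: int) -> int:
    """Longest fragment recoverable by merging one overlapping read pair."""
    return 2 * read_len - min_overlap


def max_fraction_covered(span_nt: int, target_nt: int) -> float:
    """Best-case fraction of a target covered by one contiguous span."""
    return min(1.0, span_nt / target_nt)


if __name__ == "__main__":
    cds = cds_length_nt(100)  # average precursor length from the post
    spans = {
        "single 100 bp HiSeq read": 100,
        "single 300 bp MiSeq read": 300,
        # 20 bp minimum overlap is a hypothetical value
        "merged 2x300 MiSeq pair": max_merged_length(300, 20),
    }
    for label, span in spans.items():
        print(f"{label}: up to {max_fraction_covered(span, cds):.0%} of a {cds} nt CDS")
```

These are best-case numbers: they assume the read starts exactly at the start codon and ignore adapter trimming, quality trimming, and UTR length.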
I am wondering what other strategies might be useful to pursue, or whether the above makes the most sense. For example, should I combine the HiSeq and MiSeq reads prior to assembly? I am open to any thoughts on how I can take advantage of the depth of the HiSeq data along with the length of the MiSeq reads to obtain a better assembly.
My 2 cents: combine the digitally normalized HiSeq and MiSeq reads and rerun the assembly.
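For anyone unfamiliar with the idea, here is a minimal sketch of median-based digital normalization applied to a combined read stream. It is only a toy under stated assumptions: it uses an exact in-memory counter instead of a memory-efficient probabilistic one, ignores reverse complements and read pairing, and the `k` and `cutoff` values are placeholders. It also reflects just one reading of the suggestion (pool both read sets, then normalize once); normalizing each set separately and then combining is another. For real data you would use a dedicated tool rather than this.

```python
from collections import Counter
from itertools import chain
from statistics import median
from typing import Iterable, Iterator, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence (len(seq) - k + 1 of them)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads: Iterable[str], k: int = 20,
                        cutoff: int = 20) -> Iterator[str]:
    """Keep a read only if its median k-mer count so far is below the cutoff."""
    counts: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        # Counter returns 0 for unseen k-mers without storing them
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            yield read


if __name__ == "__main__":
    # Hypothetical toy reads standing in for the two data sets
    hiseq_reads = ["ACGTTGCA"] * 5
    miseq_reads = ["TTTTAAAACCCC"]
    kept = list(normalize_by_median(chain(hiseq_reads, miseq_reads),
                                    k=4, cutoff=2))
    print(len(kept), "reads kept")
```

Note that the longer MiSeq reads contribute more k-mers per read (read length minus k plus 1), so in a pooled pass the order in which reads are streamed can affect which ones are kept.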
This also sounds like a reasonable strategy to me. I'm doing something similar to it now with two separate sequencing runs (one from last year, one from last week). ...and what is VO?
VO = Velvet/Oases (sorry, got lazy). Are you combining the new and old reads before assembling?
I'm going back to the original raw sequence files and adding the new data to them. I might try to add on to the old assembly... Still running the new assembly with the combined data.
Agreed, I think I will in fact try this.
You should post this as an actual answer to the question below
Sorry for not answering the question, but I just wanted to mention that I think doing a transcriptome of a marine snail venom duct is very very cool.