What Is Best Strategy To Use Miseq 2X300 Reads To Improve De Novo Transcriptome Assembly Based On Hiseq 2X100 Reads?
11.0 years ago
lzwright ▴ 150

Hi all!

First, a little background: I have done a de novo transcriptome assembly of marine snail venom duct tissue using Illumina HiSeq reads. I have 289 million paired-end (PE) reads of 100 bp, which I digitally normalized prior to assembly. I used Trinity and Velvet Oases as my assemblers, and have done extensive BLAST searches with the Trinity assembly to identify putative marine snail toxins and to get a high-level overview of GO and KEGG terms for the rest of my hits. I am reasonably happy with the results so far but, as always with de novo assembly, I am looking for ways to be sure my transcripts are valid, since I have no reference transcriptome.

Now, luckily, I have been able to obtain some 2x300 MiSeq data, approx. 20 million reads, generated from the HiSeq library. So I am asking how I can make the best use of these to improve my assembly.

My current strategy is the following.

Do a separate assembly of the MiSeq reads with Trinity and perhaps VO to compare with the HiSeq assemblies (side question: when reads get longer, should this affect your choice of k-mer values for an assembler like VO?)

Map MiSeq reads to HiSeq assembly, especially to confirm transcripts I have identified as putative toxins from HiSeq assembly (NB: these are short, disulfide rich peptides, average length of precursor structure 100 AA)

Blast MiSeq raw reads against database of toxins (on the theory that some of these reads are going to be long enough to cover most if not all of some toxin transcripts)
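
One way to get a feel for the third idea before running BLAST is to translate each long read in all six frames and look for stop-free stretches that are rich in cysteine. The following is only a rough sketch of such a pre-filter, not a replacement for a BLAST search against the toxin database. The function names and the `min_len` / `min_cys` thresholds are made up for illustration, and it assumes the standard genetic code and plain A/C/G/T reads.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(read):
    """Yield every non-empty stop-free peptide stretch from all six frames."""
    read = read.upper()
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            for peptide in translate(strand[offset:]).split("*"):
                if peptide:
                    yield peptide


def best_cys_rich_peptide(read, min_len=30, min_cys=4):
    """Return the longest stop-free peptide with >= min_cys cysteines, or None.

    min_len and min_cys are illustrative thresholds, not values from the thread.
    """
    candidates = [
        p for p in six_frame_peptides(read)
        if len(p) >= min_len and p.count("C") >= min_cys
    ]
    return max(candidates, key=len, default=None)
```

Note that a 100 AA precursor needs about 300 nt of coding sequence, so a single 300 bp read can only just span it; overlapping read pairs merged into one longer sequence would be the more likely candidates to pass through a filter like this.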

I am wondering what other strategies might be useful to pursue, or if the above seems to make the best sense. For example, should I combine HiSeq and MiSeq together prior to assembly? I am open to any thoughts on how I can take advantage of the depth of the HiSeq along with the length of the MiSeq to obtain a better assembly.

miseq hiseq illumina qualitycontrol • 6.7k views

My 2 cents: combine the digitally normalized HiSeq and MiSeq reads and rerun the assembly.
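
For anyone unfamiliar with digital normalization, the idea can be sketched roughly as follows: a read is kept only if the median abundance of its k-mers, among the reads kept so far, is still below a coverage cutoff. This is a toy version that uses exact counting in a dictionary; real tools such as khmer use memory-efficient probabilistic counting and handle read pairs and strand, none of which is done here. The function names and the default `k` and `cutoff` values are placeholders for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below cutoff.

    Simplified sketch: exact counts, single-end reads, no reverse complements.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue  # read shorter than k, nothing to estimate coverage from
        # Counter returns 0 for unseen k-mers without inserting them.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

With HiSeq and MiSeq reads pooled, a pass like this would drop reads from already deeply covered transcripts regardless of which instrument they came from, which is why the order in which the reads are fed in can matter.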


This also sounds like a reasonable strategy to me. I'm doing something similar to it now with two separate sequencing runs (one from last year, one from last week). ...and what is VO?


VO = Velvet Oases (sorry, got lazy). Are you combining the new and old reads before assembling?


Going back to the original raw data sequence files and adding the new data to them. I might try to add on to the old assembly... Still running the new assembly with combined data.


Agreed, I think I will in fact try this.


You should post this as an actual answer to the question below


Sorry for not answering the question, but I just wanted to mention that I think doing a transcriptome of a marine snail venom duct is very very cool.

11.0 years ago
lzwright ▴ 150

Agreed, I think I will in fact try this.


Not an answer, lzwright! :)
