Question: Will a Missing Base in 1/8th of the Reads of a Sample Hamper De Novo Transcriptome Assembly?
Dan D wrote (7.1 years ago):

We recently performed a HiSeq 2000 PE-50 run for a customer. She had 15 samples of a species of invertebrate. These samples were each run on four lanes of a flowcell. Each sample had at least 80 million reads, with a mean quality score of no less than 36.7.

However, on cycle 14 of lane 1, for the top surface, there was an instrument malfunction in which communication with the instrument camera was interrupted. As a result, every read from that surface of that lane was reported as "N" at that cycle. Therefore roughly 1/8th of the reads for each sample (one of two surfaces on one of four lanes) have an "N" at cycle 14.

The customer believes that this missing cycle will cause problems with de novo transcriptome assembly. I've done plenty of genome assembly, but I don't have enough experience in transcriptome assembly to confidently disagree. Intuitively it seems that this missing base will be inconsequential to the final assembly.
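The 1/8th figure can be checked directly against the FASTQ files before deciding how much to worry. The following is a minimal sketch, assuming standard four-line FASTQ records with unwrapped sequence lines; the function name `fraction_with_n_at_cycle` and the two in-memory example reads are made up for illustration and are not from the run described here.

```python
import io

def fraction_with_n_at_cycle(fastq_handle, cycle=14):
    """Count reads whose base at the given 1-based cycle is 'N'.

    Assumes four-line FASTQ records (header, sequence, '+', qualities).
    Returns (reads_with_n, total_reads).
    """
    hits = 0
    total = 0
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip()
        total += 1
        if len(seq) >= cycle and seq[cycle - 1] == "N":
            hits += 1
    return hits, total

# Hypothetical 50 bp reads: the first has an N at cycle 14, the second does not.
records = [
    ("@r1", "A" * 13 + "N" + "C" * 36),
    ("@r2", "ACGT" * 12 + "AC"),
]
fastq_text = "".join(f"{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in records)

hits, total = fraction_with_n_at_cycle(io.StringIO(fastq_text))
print(f"{hits} of {total} reads have N at cycle 14")
```

For real data the same function could be given a handle from `gzip.open(path, "rt")` in place of the `io.StringIO` object.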

Tags: assembly, transcriptome
Fabio Marroni wrote (7.1 years ago):

It's difficult to say, but I suspect it will not be a problem. You could check with a simulation: generate perfect reads from a transcript, replace base 14 with N in 1/8th of them, and see what happens to the assembly. I imagine that for well-covered genes this will not be an issue.
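A rough version of that simulation can be sketched without running an assembler at all, by asking how much k-mer coverage of the transcript is lost. Everything here is an assumption made for illustration: the transcript is random sequence, reads are error-free, single-end and forward-strand only, k is set to 25, and an assembler is assumed to discard any k-mer containing an N. It is a proxy for the experiment suggested above, not a substitute for assembling real data.

```python
import random
from collections import Counter

def simulate_reads(transcript, n_reads, read_len, rng):
    """Sample error-free reads uniformly along the transcript."""
    last_start = len(transcript) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [transcript[s:s + read_len] for s in starts]

def mask_cycle(reads, cycle, fraction, rng):
    """Replace the base at a 1-based cycle with N in about `fraction` of reads."""
    masked = []
    for read in reads:
        if rng.random() < fraction:
            read = read[:cycle - 1] + "N" + read[cycle:]
        masked.append(read)
    return masked

def kmer_counts(reads, k):
    """Count k-mers across reads, skipping any k-mer that contains an N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts

def coverage_summary(transcript, reads, k):
    """Return (fraction of transcript k-mers seen at least once, mean k-mer depth)."""
    counts = kmer_counts(reads, k)
    depths = [counts[transcript[i:i + k]] for i in range(len(transcript) - k + 1)]
    covered = sum(1 for d in depths if d > 0) / len(depths)
    return covered, sum(depths) / len(depths)

rng = random.Random(42)  # fixed seed so the run is repeatable
transcript = "".join(rng.choice("ACGT") for _ in range(2000))  # hypothetical transcript

# Low, moderate, and high expression, expressed as read counts (illustrative values).
for n_reads in (50, 200, 2000):
    reads = simulate_reads(transcript, n_reads, 50, rng)
    masked = mask_cycle(reads, 14, 1 / 8, rng)
    clean_cov, clean_depth = coverage_summary(transcript, reads, 25)
    masked_cov, masked_depth = coverage_summary(transcript, masked, 25)
    print(f"reads={n_reads:5d}  covered: {clean_cov:.3f} -> {masked_cov:.3f}  "
          f"mean depth: {clean_depth:.2f} -> {masked_depth:.2f}")
```

Under these assumptions the expected loss is easy to bound by hand. A 50 bp read contains 26 k-mers of length 25, and 14 of them overlap cycle 14, so a masked read keeps 12 usable k-mers. With 1/8th of reads masked, about (1/8) × (14/26) ≈ 6.7% of k-mer observations are lost overall. That matters only where coverage is already marginal, which fits the intuition about well-covered genes.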


Excellent suggestion. Thank you!

(reply from Dan D, 7.1 years ago)