Will A Missing Base In 1/8Th Of The Reads Of A Sample Hamper De Novo Transcriptome Assembly?
9.6 years ago
Dan D 7.3k

We recently performed a HiSeq 2000 PE-50 run for a customer. She had 15 samples of an invertebrate species, and each sample was run on four lanes of a flowcell. Each sample had at least 80 million reads, with a mean quality score of at least 36.7.

However, during cycle 14 on the top surface of lane 1, the instrument malfunctioned: communication with the camera was interrupted, so every base called in that cycle on that lane surface was reported as "N". Because each sample was spread over four lanes, and each lane is imaged on a top and a bottom surface, the failed lane surface holds roughly 1/8th of each sample's reads, and all of those reads have an "N" at cycle 14.
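The 1/8th figure comes from counting lane surfaces. Below is a minimal sketch of that arithmetic, assuming reads are spread evenly across the four lanes and both surfaces, and that cycle 14 falls in the first 50-cycle read of each pair. The function names are hypothetical helpers, not part of any tool.

```python
from fractions import Fraction

LANES_PER_SAMPLE = 4
SURFACES_PER_LANE = 2  # each HiSeq lane is imaged on a top and a bottom surface
READ_LENGTH = 50       # PE-50: 50 cycles per read

def affected_read_fraction(lanes=LANES_PER_SAMPLE, surfaces=SURFACES_PER_LANE):
    """Fraction of a sample's reads from the one failed lane surface,
    assuming reads are distributed evenly over all lane surfaces."""
    return Fraction(1, lanes * surfaces)

def masked_base_fraction(read_length=READ_LENGTH):
    """Fraction of bases (in the read containing cycle 14) that became N:
    one cycle out of read_length, in the affected reads only."""
    return affected_read_fraction() * Fraction(1, read_length)

print(affected_read_fraction())  # 1/8
print(masked_base_fraction())    # 1/400
```

In other words, only about 0.25% of the bases in the affected read set are missing.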

The customer believes that this missing cycle will cause problems with de novo transcriptome assembly. I've done plenty of genome assembly, but I don't have enough experience with transcriptome assembly to disagree confidently. Intuitively, it seems that this missing base should be inconsequential to the final assembly.
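One way to put a number on that intuition: k-mer based assemblers typically cannot use k-mers that contain an N, so the question becomes how many k-mers are lost. The sketch below assumes such k-mers are simply discarded and uses k = 25 only as an example value (it is Trinity's default); the function names are hypothetical.

```python
from fractions import Fraction

def kmers_per_read(read_length, k):
    """Number of overlapping k-mers in one read."""
    return read_length - k + 1

def kmers_covering(position, read_length, k):
    """Number of k-mers in a read that include the given 1-based position."""
    first_start = max(1, position - k + 1)
    last_start = min(position, read_length - k + 1)
    return max(0, last_start - first_start + 1)

def lost_kmer_fraction(position=14, read_length=50, k=25, affected=Fraction(1, 8)):
    """Fraction of k-mers (from the read containing the N) that would be
    discarded, assuming any k-mer containing N is dropped."""
    per_read = Fraction(kmers_covering(position, read_length, k),
                        kmers_per_read(read_length, k))
    return affected * per_read

print(lost_kmer_fraction())                   # 7/104
print(round(float(lost_kmer_fraction()), 3))  # 0.067
```

In an affected read, 14 of its 26 25-mers contain the N, so overall about 6.7% of k-mers are lost. Assuming fragment start positions are roughly random along each transcript, those losses are spread evenly rather than concentrated at one transcript position, so the effect should resemble a modest drop in coverage rather than a systematic gap.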

assembly transcriptome • 1.7k views
9.6 years ago
Fabio Marroni ★ 2.9k

It's difficult to say, but I suspect it will not be a problem. You could test it with a simulation: simulate perfect reads originating from a transcript, replace base 14 with N in 1/8th of the reads, and see what happens. I imagine that for well-covered genes this should not be an issue.
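A minimal sketch of that simulation, using k-mer coverage along the transcript as a stand-in for a real assembly. Assumptions: a random 2,000 bp transcript, error-free single-end 50 bp reads from the forward strand only, k = 25, and k-mers containing N are discarded. A real test would instead feed both read sets to the actual assembler and compare the resulting contigs. All names here are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"

def random_transcript(length, rng):
    """A random sequence standing in for a real transcript."""
    return "".join(rng.choice(BASES) for _ in range(length))

def simulate_reads(transcript, n_reads, read_length, rng):
    """Error-free reads starting at uniformly random positions (forward strand only)."""
    max_start = len(transcript) - read_length
    starts = (rng.randint(0, max_start) for _ in range(n_reads))
    return [transcript[s:s + read_length] for s in starts]

def mask_cycle(reads, cycle, fraction, rng):
    """Replace the base at 1-based `cycle` with N in a random `fraction` of reads."""
    n_masked = int(len(reads) * fraction)
    masked_idx = set(rng.sample(range(len(reads)), n_masked))
    return [read[:cycle - 1] + "N" + read[cycle:] if i in masked_idx else read
            for i, read in enumerate(reads)]

def kmer_counts(reads, k):
    """Count k-mers across all reads, skipping any k-mer that contains N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts

def transcript_kmer_depth(transcript, counts, k):
    """K-mer depth at each transcript position (0 means that k-mer is never seen)."""
    return [counts[transcript[i:i + k]] for i in range(len(transcript) - k + 1)]

if __name__ == "__main__":
    rng = random.Random(42)
    k, read_length = 25, 50
    transcript = random_transcript(2000, rng)
    reads = simulate_reads(transcript, 2000, read_length, rng)
    masked = mask_cycle(reads, cycle=14, fraction=1 / 8, rng=rng)

    for label, read_set in (("original", reads), ("masked", masked)):
        depth = transcript_kmer_depth(transcript, kmer_counts(read_set, k), k)
        gaps = sum(1 for d in depth if d == 0)
        print(f"{label}: mean k-mer depth {sum(depth) / len(depth):.1f}, "
              f"uncovered k-mer positions {gaps}")
```

Lowering the number of reads approximates a poorly covered gene, which is where any effect of the masked cycle should show up first.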


Excellent suggestion. Thank you!

