Question: Problem with demultiplexing Illumina dual-indexed libraries
7 months ago, sikhtechai wrote:


I am having a problem demultiplexing dual-indexed reads from a HiSeq 2000 run. The lane summary looks like this:

[Image: Whole flow cell summary]

Each lane has around 230 million reads, and the sequencing itself went fine. I loaded 3 samples per lane, expecting around 70 million reads per sample, but I get almost half of that. I noticed that the number of undetermined reads is quite high. For example:

[Image: Undetermined read statistics]

The image above shows the statistics for the undetermined reads. They make up roughly 30%, 40%, and sometimes 60% of the reads on the flow cell. We did not allow any mismatches when running bcl2fastq to generate the FASTQ files. But among the undetermined reads there are many cases where the i7 index is fine and the i5 index has one or two mismatches. An example is below:

[Image: Undetermined reads in each lane, with barcode sequences]

I used Illumina CD indices to make the libraries (a.k.a. HT indices; i7: D701, D702, D703, ...; i5: D501, D502, D503, etc.). In this case I do not care about the i5 indices, as they are the same for all 3 samples in a lane. For example, Sample 1: D701-D502, Sample 2: D702-D502, Sample 3: D703-D502. This is how I ran lane 1, and lanes 2, 3 and so on are set up similarly. Therefore, my question is the following:

Is it possible to run bcl2fastq so that it demultiplexes the BCL files based only on index 1 (i7), even though the samples were prepared with dual indexing? Or is there a better way to demultiplex in this situation to recover more reads?

I would really appreciate it if anyone could help.


This is an excellent example of why one should never use the same index for all samples (either in first or second location).

Comment written 7 months ago by genomax
7 months ago, swbarnes2 (United States) wrote:

Just omit index 2 from your sample sheet, and redo the demultiplexing.

The software that calls bases from clusters flips out when the entire flowcell lights up for a single base, which is what happens when every cluster carries the same index 2. That leads to N's in that index. The demultiplexing software is bound and determined to use that awful index 2 if you tell it to, and then the read fails demultiplexing because of the N's.

So just drop that from the sample sheet. bcl2fastq will not mind.
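
As a rough illustration of "dropping index 2 from the sample sheet", here is a minimal Python sketch. It assumes a bcl2fastq2-style CSV sample sheet whose [Data] section has columns named `index2` and `I5_Index_ID`; those column names, the helper `drop_index2`, and the example sequences are assumptions for illustration only (older sample sheet layouts differ), so check them against your own sheet before using anything like this.

```python
import csv
import io

# Assumed names of the i5 columns in the [Data] header (compared case-insensitively).
I5_COLUMNS = {"index2", "i5_index_id"}


def drop_index2(sheet_text: str) -> str:
    """Return the sample sheet text with the i5 columns removed from [Data]."""
    out_lines = []
    in_data = False
    keep = None  # column positions to keep, set when the [Data] header row is seen
    for line in sheet_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section begins; only the [Data] section is rewritten.
            in_data = stripped.lower().startswith("[data]")
            keep = None
            out_lines.append(line)
            continue
        if not in_data or not stripped:
            out_lines.append(line)
            continue
        row = next(csv.reader([line]))
        if keep is None:
            # First row after [Data] is the column header.
            keep = [i for i, name in enumerate(row)
                    if name.strip().lower() not in I5_COLUMNS]
        buf = io.StringIO()
        csv.writer(buf).writerow([row[i] for i in keep if i < len(row)])
        out_lines.append(buf.getvalue().rstrip("\r\n"))
    return "\n".join(out_lines) + "\n"


# Hypothetical sheet: the index sequences are placeholders, not real kit indices.
EXAMPLE = """[Header]
Experiment Name,demo
[Data]
Lane,Sample_ID,index,index2
1,Sample1,AACCGGTT,GGGGGGGG
1,Sample2,CCTTAAGG,GGGGGGGG
"""

if __name__ == "__main__":
    print(drop_index2(EXAMPLE))
```

The sketch only edits the sheet; the demultiplexing itself still has to be rerun with the edited sheet as described above.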

"Basically, we did not allow any mismatch while running the bcl2fastq to generate fastq files."

That's probably too stringent. Most index sets have been designed to be robust to a single error. Take advantage of that and let bcl2fastq run with its default setting of one mismatch. The software will tell you if the indices you have won't support that much flexibility.
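
To see why one mismatch can be safe, it may help to check the pairwise Hamming distances between the i7 indices yourself. The sketch below is not how bcl2fastq is implemented; it only illustrates the general rule that allowing m mismatches stays unambiguous when every pair of indices differs in at least 2m + 1 positions. The index sequences are made-up placeholders (substitute the real D701/D702/D703 sequences from your kit documentation), the function names are hypothetical, and treating an N as an ordinary mismatch is an assumption of this sketch.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(indices) -> int:
    """Largest m such that all indices stay distinguishable with m mismatches."""
    d_min = min(hamming(a, b) for a, b in combinations(indices, 2))
    return (d_min - 1) // 2


def assign_by_i7(read_index: str, samples: dict, mismatches: int = 1):
    """Return the single sample whose i7 is within `mismatches`, else None."""
    hits = [name for name, idx in samples.items()
            if hamming(read_index, idx) <= mismatches]
    return hits[0] if len(hits) == 1 else None


# Placeholder i7 sequences, NOT the real kit indices.
SAMPLES = {
    "Sample1": "AACCGGTT",
    "Sample2": "CCTTAAGG",
    "Sample3": "GGAATTCC",
}

if __name__ == "__main__":
    print(max_safe_mismatches(SAMPLES.values()))
    print(assign_by_i7("AACCGGTN", SAMPLES))  # one N, counted as one mismatch
```

With zero mismatches allowed, a read whose index contains even a single N or miscalled base goes to the undetermined bin, which is consistent with the high undetermined fraction described in the question.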

7 months ago, Gabriel R. (Center for Geogenetik, Københavns Universitet) wrote:

I am biased, but I would recommend my own deML. It is a maximum-likelihood demultiplexer designed to deal with uncertainty and partial information. I still maintain it, so let me know if you have any issues.
