Dual-index barcode demultiplexing issue
4.5 years ago
BioRyder

Hello, below is the list of dual-indexed Illumina libraries we used for an Illumina run. We were not able to demultiplex the data using these barcodes: 99% of the reads were assigned to Undetermined.

Index1 ------ Index2 (case 1)

ATGAGC ---- TGAACCTT

ATTCCT ---- TGAACCTT

CAAAAG ---- TGAACCTT

CAACTA ---- TGAACCTT

CACCGG ---- TGAACCTT

We then tried demultiplexing with the reverse complement of Index2, leaving Index1 unchanged (list below). This worked well: 98% of the reads were assigned to samples. My question is why demultiplexing only worked with the reverse complement of Index2, and not in case 1 (without the reverse complement).

Index1 ------ Index2 (reverse complement)

ATGAGC ---- AAGGTTCA

ATTCCT ---- AAGGTTCA

CAAAAG ---- AAGGTTCA

CAACTA ---- AAGGTTCA

CACCGG ---- AAGGTTCA
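Reverse-complementing is mechanical, so the corrected Index2 column can be generated rather than typed by hand. A minimal Python sketch (the helper name `reverse_complement` is ours, not from any library) that keeps Index1 unchanged and reverse-complements Index2 for the case 1 pairs:

```python
# Map each base to its Watson-Crick complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Index pairs from case 1 (Index2 exactly as originally supplied).
case1 = [
    ("ATGAGC", "TGAACCTT"),
    ("ATTCCT", "TGAACCTT"),
    ("CAAAAG", "TGAACCTT"),
    ("CAACTA", "TGAACCTT"),
    ("CACCGG", "TGAACCTT"),
]

# Keep Index1 (i7) unchanged; reverse-complement Index2 (i5) only.
fixed = [(i7, reverse_complement(i5)) for i7, i5 in case1]

for i7, i5 in fixed:
    print(f"{i7} ---- {i5}")
```

Each printed line matches the reverse-complement list above: TGAACCTT becomes AAGGTTCA.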

Tags: bcl2fastq, demultiplex, dual index, barcode, hiseq4000

Thank you so much, @Devon Ryan. Your answer is clear and correct. I went through the Illumina document you mentioned, and it states the following. Hopefully it will help others in the future. Here is Illumina's conclusion:

Sequencing on the MiniSeq, NextSeq, and HiSeq 3000/4000 systems follow a different dual-indexing workflow than other Illumina systems, which requires the reverse complement of the i5 index adapter sequence.

• If you are creating a sample sheet manually for the MiniSeq, NextSeq, or HiSeq 3000/4000 systems, include the reverse complement of the sequence on your sample sheet.

• If you are using the Illumina Experiment Manager (IEM), BaseSpace Prep tab, or Local Run Manager to record the adapter sequences, the software creates the reverse complement automatically.

4.5 years ago

The orientation of the index2 (i5) sequence depends on the sequencer used. Your original i5 sequence for barcode A501 is appropriate for a HiSeq 2000/2500, MiSeq, or NovaSeq, but needs to be reverse-complemented for a MiniSeq, NextSeq, or HiSeq 3000/4000 (this is documented in a PDF from Illumina). This turns out to be rather annoying if you run a sequencing facility with multiple types of sequencers and people provide the barcodes they use. You then either have to check the orientation manually (annoying) or wait for failures like this and reverse-complement accordingly (this actually works OK most of the time). I ended up writing a Python package for our demultiplexing pipeline that tries to determine the proper orientation of the barcodes in each lane by looking at what's in the BCL files produced by the sequencer.
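The package described above works on the BCL files directly, and its code is not shown here. As a much simpler illustration of the same idea, the sketch below assumes you already have counts of observed i5 reads (for example, tallied from the index field of Undetermined FASTQ headers) and checks which orientation of the supplied i5 sequences accounts for more of them. The function name and the exact-match rule are assumptions; a real tool would likely tolerate mismatches and look at i7 and i5 together.

```python
from collections import Counter

# Map each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def guess_i5_orientation(expected_i5, observed_i5: Counter) -> str:
    """Return 'as_given' or 'reverse_complement', whichever orientation of
    the expected i5 sequences matches more observed reads exactly.
    Ties fall back to 'as_given'."""
    as_given = set(expected_i5)
    rc = {reverse_complement(s) for s in expected_i5}
    hits_as_given = sum(n for seq, n in observed_i5.items() if seq in as_given)
    hits_rc = sum(n for seq, n in observed_i5.items() if seq in rc)
    return "reverse_complement" if hits_rc > hits_as_given else "as_given"
```

With toy counts (not from this run), `guess_i5_orientation(["TGAACCTT"], Counter({"AAGGTTCA": 9800, "TGAACCTT": 3}))` returns `"reverse_complement"`.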

Update: Please see the comment below from John Marshall for an update on the NovaSeq!


According to Illumina's documentation, this has changed for the NovaSeq with the v1.5 chemistry: for the old v1.0 chemistry, the original sequence is correct; but for NovaSeq with the new v1.5 reagent kits (which came out in 2020, long after Devon's answer) we're back to reverse-complementing index2.
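The rule from the answer and this comment can be written down as a lookup table. This is a minimal sketch covering only the instruments and chemistries named in this thread; the platform label strings are hypothetical, and any other instrument should be checked against Illumina's current documentation rather than guessed. It only matters for hand-written sample sheets, since per the Illumina note quoted earlier, IEM, BaseSpace, and Local Run Manager reverse-complement automatically.

```python
# Platforms whose hand-written sample sheet takes i5 exactly as supplied,
# per the answer and the NovaSeq v1.0/v1.5 comment in this thread.
# The label strings themselves are hypothetical.
_I5_AS_GIVEN = {"HiSeq 2000", "HiSeq 2500", "MiSeq", "NovaSeq v1.0"}

# Platforms whose sample sheet needs the reverse complement of i5.
_I5_REVERSE_COMPLEMENT = {
    "MiniSeq",
    "NextSeq",
    "HiSeq 3000",
    "HiSeq 4000",
    "NovaSeq v1.5",
}


def needs_i5_reverse_complement(platform: str) -> bool:
    """True if index2 must be reverse-complemented for this platform."""
    if platform in _I5_REVERSE_COMPLEMENT:
        return True
    if platform in _I5_AS_GIVEN:
        return False
    # Refuse to guess for instruments not covered in this thread.
    raise ValueError(f"unknown platform/chemistry: {platform!r}")
```

Raising on unknown labels is deliberate: silently defaulting to one orientation is exactly the failure mode described in the question.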


In case anyone else has this issue, I created a small script to fix the SampleSheet.csv: https://pypi.org/project/index2rc/
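For reference, here is a minimal sketch of what such a fix involves. It is not the index2rc package's code; it assumes a bcl2fastq/IEM-style sheet whose [Data] section starts with a header row containing an `index2` column, and it reverse-complements every value in that column while passing all other lines through unchanged. The function name is hypothetical.

```python
import csv
import io

# Map each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rc_index2_in_samplesheet(text: str) -> str:
    """Return sample-sheet text with every index2 value in [Data]
    reverse-complemented; all other lines are passed through unchanged."""
    out_lines = []
    in_data = False
    index2_col = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # New section header; only the [Data] section is edited.
            in_data = stripped.lower().startswith("[data]")
            index2_col = None
            out_lines.append(line)
            continue
        if not in_data or not stripped:
            out_lines.append(line)
            continue
        row = next(csv.reader([line]))
        if index2_col is None:
            # First non-empty row of [Data] is the column header.
            index2_col = row.index("index2") if "index2" in row else -1
            out_lines.append(line)
            continue
        if 0 <= index2_col < len(row):
            row[index2_col] = reverse_complement(row[index2_col])
            buf = io.StringIO()
            csv.writer(buf, lineterminator="").writerow(row)
            line = buf.getvalue()
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

The transformation is its own inverse, so running it twice restores the original orientation; apply it once, to a copy of the sheet.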


Hi @Devon, I hope you will see my comment and can help me solve my issue. If you saw my latest question on Biostars, you probably noticed that I succeeded in assigning reads to each sample FASTQ. My problem now is that, although my code runs, I get very few reads in each FASTQ file: about 22 KB for the actual samples and about 1.7 GB for the Undetermined FASTQ file. Could you help me figure out which step is most critical to get right? I get the same result with or without --use-bases-mask (if it is not provided, the information in the RunInfo.xml file is used automatically). I also provided the reverse complement of the second index in the SampleSheet.csv.
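One quick check in this situation is to see which index pairs dominate the Undetermined file. The sketch below assumes bcl2fastq-style FASTQ headers, in which the last colon-separated field of the header comment holds the index read(s), e.g. `1:N:0:ATGAGC+AAGGTTCA`; the function name and its defaults are hypothetical. If the top pairs are expected i7 barcodes combined with an unexpected i5 (or the reverse), that suggests the sample sheet still does not match what was sequenced.

```python
import gzip
from collections import Counter
from itertools import islice


def top_undetermined_indexes(fastq_path: str, max_reads: int = 1_000_000, top: int = 20):
    """Count index sequences in the header lines of a (possibly gzipped) FASTQ.

    Assumes bcl2fastq-style headers where the last colon-separated field of
    the comment is the index, e.g. '@... 1:N:0:ATGAGC+AAGGTTCA'.
    Only the first `max_reads` records are scanned.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as fh:
        # Every 4th line, starting with the first, is a header line.
        for header in islice(fh, 0, max_reads * 4, 4):
            comment = header.rstrip("\n").split(" ", 1)[-1]
            counts[comment.rsplit(":", 1)[-1]] += 1
    return counts.most_common(top)
```

For example, `top_undetermined_indexes("Undetermined_S0_L001_R1_001.fastq.gz")` (an example file name) returns a list of `(index, count)` pairs, most frequent first, which can be compared directly against the sample sheet.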