I need your help with the --use-bases-mask parameter of the bcl2fastq2 tool when demultiplexing data generated by Illumina sequencers. As you know, most sequencing runs are either Paired-End (PE) or Single-End (SE). In addition, the libraries are tagged with either a single index or dual indexes. As per Illumina's definition:
Single and Dual Indexing
The number of index sequences added to samples differs for single-indexed and dual-indexed sequencing.
Single-indexed libraries — Adds up to 48 unique six-base Index 1 (i7) sequences to generate up to 48 uniquely tagged libraries.
Dual-indexed libraries — Adds up to 24 unique eight-base Index 1 (i7) sequences and up to 16 unique eight-base Index 2 (i5) sequences, generating up to 384 uniquely tagged libraries. The IDT for Illumina TruSeq UD Indexes are provided as index pairs and can generate up to 96 uniquely tagged libraries. These indexes add up to 96 unique eight-base Index 1 sequences and up to 96 unique eight-base Index 2 indexes.
During indexed sequencing, the index is sequenced in a separate read, called the Index Read, where a new sequencing primer is annealed. When libraries are dual-indexed, the sequencing run includes two additional reads, called the Index 1 Read and Index 2 Read.
Knowing this, I have two questions:
- Is it acceptable to mix single-index and dual-index libraries on the same flowcell (e.g. HiSeq 4000), given that the sequencer was configured for a dual-index run?
- How can we demultiplex such data, since the file generated by the sequencer (RunInfo.xml) describes a dual-index run? In other words, demultiplexing the dual-index lanes works fine when providing RunInfo.xml, but what should I pass to the --use-bases-mask parameter for the single-index lanes?
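To make the second question concrete, here is a minimal Python sketch of what I have in mind: read the read layout from RunInfo.xml and derive a mask per lane from the index lengths actually used. It assumes the usual Read elements with Number, NumCycles and IsIndexedRead attributes. The XML values and the helper names (read_layout, build_mask) are hypothetical, and masking an unused index read with N* is my assumption, not something I have confirmed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RunInfo.xml fragment for a paired-end, dual-index run.
EXAMPLE_RUNINFO = """<RunInfo>
  <Run Id="example_run">
    <Reads>
      <Read Number="1" NumCycles="151" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="151" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>"""


def read_layout(runinfo_xml):
    """Return [(num_cycles, is_index_read), ...] ordered by read number."""
    root = ET.fromstring(runinfo_xml)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y") for r in reads]


def build_mask(layout, index_lengths):
    """Build a mask string from the layout.

    index_lengths lists the number of index bases used in each index read,
    in order; 0 (or a missing entry) means the index read is ignored.
    """
    parts = []
    lengths = iter(index_lengths)
    for num_cycles, is_index in layout:
        if not is_index:
            parts.append("Y*")            # use the whole sequence read
            continue
        used = next(lengths, 0)
        if used > num_cycles:
            raise ValueError("index is longer than the index read")
        if used == 0:
            parts.append("N*")            # assumption: ignore this index read
        elif used == num_cycles:
            parts.append(f"I{used}")      # index fills the whole read
        else:
            parts.append(f"I{used}N*")    # use first bases, ignore the rest
    return ",".join(parts)


layout = read_layout(EXAMPLE_RUNINFO)
print(build_mask(layout, [8, 8]))  # dual-index lane
print(build_mask(layout, [6, 0]))  # single six-base index on the same run
```

The resulting string would then be passed to --use-bases-mask for the lanes in question, if this approach is valid at all.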
Also, I know that --use-bases-mask accepts the following values for different types of sequencing:
- Single-End sequencing:
Y*,I6N*
- No Index:
Y*,Y* (Thanks to Devon Ryan)
- Single Indexing:
Y*,I6N,Y* (Thanks to Devon Ryan)
- In-read barcode in the first read for some of the samples, but the run was PE dual-index:
I5Y*,N*,N*,Y* (Thanks to igor)
- 10x Genomics Single Cell 3' v1 kit:
Y98,Y14,I8,Y10 (Thanks to igor)
- 10x Genomics Single Cell 3' v1 kit + more standard libraries on the same run:
Y98N*,Y14N*,I8N*,Y10N* (Thanks to igor)
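To check what these masks do cycle by cycle, a small Python sketch like the one below can expand a mask against the NumCycles of each read. It reflects my understanding of the syntax (Y = use as sequence, I = use as index, N = ignore; a number repeats the letter, a bare letter covers one cycle, and * covers the remaining cycles of the read). The function name expand_mask and the cycle counts in the example are hypothetical, and bcl2fastq2 itself may validate masks differently.

```python
import re


def expand_mask(mask, cycles_per_read):
    """Expand a mask into one string per read, one character per cycle."""
    parts = mask.split(",")
    if len(parts) != len(cycles_per_read):
        raise ValueError("mask and run have a different number of reads")

    expanded = []
    for part, cycles in zip(parts, cycles_per_read):
        # Each token is a letter optionally followed by a count or '*'.
        if not re.fullmatch(r"(?:[YINyin](?:\d+|\*)?)+", part):
            raise ValueError(f"cannot parse mask part: {part!r}")
        out = ""
        for letter, count in re.findall(r"([YINyin])(\d+|\*)?", part):
            if count == "*":
                n = cycles - len(out)   # fill the rest of the read
            elif count == "":
                n = 1                   # bare letter covers a single cycle
            else:
                n = int(count)
            out += letter.upper() * n
        if len(out) != cycles:
            raise ValueError(f"{part!r} does not cover exactly {cycles} cycles")
        expanded.append(out)
    return expanded


# Hypothetical short run: 10-cycle reads with two 8-cycle index reads.
for line in expand_mask("I5Y*,N*,N*,Y*", [10, 8, 8, 10]):
    print(line)
```

Expanding each candidate mask this way makes it easier to see which cycles end up as sequence, index, or ignored before launching a full demultiplexing run.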
Also, could you please list any other mask values that are useful in different cases (for future readers)?
Thanks for your time and help. Please upvote this post so other users can find it.