Question: Can Illumina bcl2fastq use only one index for demultiplexing dual index sequencing data?

chen (OpenGene) wrote, 17 months ago:

Hi,

For Illumina sequencing data with dual indexes (151 read1 + 8 index1 + 8 index2 + 151 read2), the conventional demultiplexing method is to set both index1 and index2 for each sample.

However, for some data (e.g. with a UMI in index2), only index1 is fixed and index2 is random, so there is no way to set both index1 and index2 in the sample sheet.

In such a case, is it possible to demultiplex using index1 only? It seems bcl2fastq doesn't support such a setting. Does anyone have experience with this?

Tags: index, bcl2fastq, demultiplexing
(last modified 14 months ago by Gabriel R.)

Answer: genomax (United States) wrote, 17 months ago:

bcl2fastq handles UMIs that are part of Read 1 or Read 2. I am not sure how you are getting them into index 2.

A couple of possibilities come to mind.

  1. You could use a bases mask such as --use-bases-mask Y*,I8,n*,Y*. This would demultiplex the data based on index 1 but still retain the sequence of index 2 in the read headers. You can then parse the index sequences in the headers and create a new SampleSheet.csv to re-demultiplex the original data, or use another tool for a second round of demultiplexing on the data from round 1.

  2. You could leave the data undemultiplexed, creating separate FASTQ files for the index reads (--create-fastq-for-index-reads). Then demultiplex afterwards using reads 2 and 3 (the two index reads).

Will the same random index 2 be shared by more than one index 1?
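For option 2, the second-pass demultiplexing might look roughly like this Python sketch. It assumes the index reads were written to their own FASTQ files, that the read file and the index-1 file list records in the same order, and that samples is a hypothetical dict mapping each index 1 sequence to a sample name. The one-mismatch tolerance mirrors what I understand to be bcl2fastq's default for --barcode-mismatches; ambiguous matches are left unassigned.

```python
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ: trailing partial record")


def match_index(seq: str, samples: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is within max_mismatches of seq.

    Returns None when nothing matches or when more than one index matches.
    """
    hits = [
        name
        for index, name in samples.items()
        if len(index) == len(seq)
        and sum(a != b for a, b in zip(index, seq)) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def assign_by_index1(
    read_lines: Iterable[str],
    index1_lines: Iterable[str],
    samples: Dict[str, str],
    max_mismatches: int = 1,
) -> Iterator[Tuple[Optional[str], List[str]]]:
    """Walk a read FASTQ and its index-1 FASTQ in lockstep, yielding (sample, record)."""
    pairs = zip_longest(fastq_records(read_lines), fastq_records(index1_lines))
    for read, index in pairs:
        if read is None or index is None:
            raise ValueError("read and index FASTQs have different record counts")
        # Read names (text before the first space) must agree between the two files.
        if read[0].split()[0] != index[0].split()[0]:
            raise ValueError(f"records out of sync: {read[0].strip()} vs {index[0].strip()}")
        yield match_index(index[1].strip(), samples, max_mismatches), read
```

Records whose sample comes back as None would go to an Undetermined file. Index 2 is never consulted, so it stays available as a UMI.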


Yes, different samples with different index 1 sequences can have the same random index 2.

Currently I demultiplex all the data into Undetermined and then split the FASTQ files by index 1, but this is time-consuming.

I may try to modify the bcl2fastq source code to support index-1-only demultiplexing of dual-index data.

(reply by chen, 17 months ago)
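The split-by-index-1 step described above can be done in a single streaming pass, so each record is read once regardless of how many samples there are. A minimal Python sketch, assuming Undetermined headers of the form "@<read name> 1:N:0:INDEX1+INDEX2" and exact matching on index 1; outputs (index 1 sequence to open file) is a hypothetical mapping you would build from your own sample list.

```python
import io
from typing import IO, Dict, Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ: trailing partial record")


def header_index1(header: str) -> str:
    """Index 1 from a bcl2fastq header such as '@name 1:N:0:ACGTACGT+TTGGCCAA'."""
    comment = header.rstrip("\n").split(" ", 1)[1]
    return comment.split(":", 3)[3].split("+", 1)[0]


def split_by_index1(
    lines: Iterable[str], outputs: Dict[str, IO[str]], undetermined: IO[str]
) -> Dict[str, int]:
    """Write each record to the file for its index 1; return per-file record counts."""
    counts: Dict[str, int] = {}
    for record in fastq_records(lines):
        index1 = header_index1(record[0])
        out = outputs.get(index1, undetermined)
        key = index1 if index1 in outputs else "Undetermined"
        out.writelines(record)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # In-memory demo; real runs would use gzip-compressed files instead.
    demo = "@r1 1:N:0:AAAA+CCCC\nACGT\n+\nIIII\n"
    s1, und = io.StringIO(), io.StringIO()
    print(split_by_index1(io.StringIO(demo), {"AAAA": s1}, und))
```

For real data, open the input with gzip.open(path, "rt") and each output with gzip.open(out_path, "wt").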

How many random sequences do you generally expect in index 2 (tens, hundreds or more)? Doing #1 from my answer above may be faster if the number of index 2 sequences is manageable.

(reply by genomax, 17 months ago)

Thousands, or even more.

(reply by chen, 17 months ago)

I think doing #1 is probably going to be the fastest option. One can easily collect the index combinations from the files produced by round 1 of demultiplexing. Since you work with NovaSeq, the data files must be huge.

(reply by genomax, 17 months ago)
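Collecting the index combinations from round 1 could be sketched like this in Python. It assumes the round-1 headers carry both indexes as INDEX1+INDEX2 in the last colon-separated field, and it only writes a minimal [Data] section. The Sample_ID naming and the min_reads cut-off (to drop combinations seen only a few times, presumably from sequencing errors) are hypothetical choices, not bcl2fastq requirements.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_index_pairs(header_lines: Iterable[str]) -> Counter:
    """Count (index1, index2) pairs from bcl2fastq headers ending in ':INDEX1+INDEX2'."""
    pairs: Counter = Counter()
    for header in header_lines:
        field = header.rstrip("\n").split(" ", 1)[1].split(":", 3)[3]
        if "+" in field:  # single-index headers carry nothing to pair
            index1, index2 = field.split("+", 1)
            pairs[(index1, index2)] += 1
    return pairs


def data_section(pairs: Counter, min_reads: int = 1) -> str:
    """Render a SampleSheet [Data] section, one row per pair seen at least min_reads times."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "index", "index2"])
    for (index1, index2), count in sorted(pairs.items()):
        if count >= min_reads:
            # Hypothetical naming scheme: one "sample" per index combination.
            writer.writerow([f"{index1}-{index2}", index1, index2])
    return buf.getvalue()
```

Header lines are every fourth line of a FASTQ file, so count_index_pairs(line for i, line in enumerate(fh) if i % 4 == 0) would feed it from an open file.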

Biologically speaking, how are you even getting the UMI in index read 2?

(reply by Devon Ryan, 17 months ago)

Maybe it is something like this:

Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

(reply by h.mon, 17 months ago)

Yes, with customized primers.

(reply by chen, 17 months ago)

Ah, that'll definitely break Illumina's software.

(reply by Devon Ryan, 17 months ago)

Answer: Gabriel R. (Center for Geogenetik, Københavns Universitet) wrote, 14 months ago:

You could simply use deML: https://grenaud.github.io/deML/

It is a maximum-likelihood demultiplexing algorithm designed to deal with incomplete or noisy data.

Hope this helps.


This is not noisy data but an unusual modification where the UMI is in the second index read.

(reply by genomax, 14 months ago)

It was a general statement rather than a comment about the nature of the OP's data :-) Just do not demultiplex with the second index; use only the first one. That will demultiplex using only the information provided by the first index.

(reply by Gabriel R., 14 months ago)

Answer: h.mon (Brazil) wrote, 17 months ago:

I am not sure this will work, but you can try bcl2fastq with the parameters --create-fastq-for-index-reads and --use-bases-mask Y151,I8,n8,Y151.
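The two masks in this answer follow a simple pattern: Y marks cycles used as a read, I cycles used as an index, and n cycles that are ignored. A small Python helper that builds them, plus a matching command line, might look like this. The run folder, output directory and sample sheet paths are placeholders; besides the two options from this answer, it only uses the standard bcl2fastq v2 options --runfolder-dir, --output-dir and --sample-sheet.

```python
from typing import List


def bases_mask(read1: int, index1: int, index2: int, read2: int, use_index2: bool = False) -> str:
    """Build a --use-bases-mask value for a paired-end, dual-index run.

    Y = use the cycles as a read, I = use them as an index, n = ignore them.
    """
    second = "I" if use_index2 else "n"
    return f"Y{read1},I{index1},{second}{index2},Y{read2}"


def bcl2fastq_argv(runfolder: str, output_dir: str, sample_sheet: str, mask: str) -> List[str]:
    """Argument list for bcl2fastq that also writes FASTQ files for the index reads."""
    return [
        "bcl2fastq",
        "--runfolder-dir", runfolder,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
        "--use-bases-mask", mask,
        "--create-fastq-for-index-reads",
    ]
```

Passing the list to subprocess.run(..., check=True) would actually launch bcl2fastq; building it as a list avoids shell quoting issues with the commas in the mask.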

In the worst case, you will have to run with --create-fastq-for-index-reads and --use-bases-mask Y151,I8,I8,Y151, then pool all reads sharing the same index1 and use index2 as the UMI.
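For the worst-case route, using index 2 as the UMI could look like this in Python: walk a read FASTQ and its index-2 FASTQ in lockstep and append the index-2 sequence to the read name. The underscore separator is an assumption (I believe it matches the default --umi-separator of UMI-tools); change sep for other downstream tools. It also assumes both files list records in the same order, which the code checks against the read names.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ: trailing partial record")


def tag_with_umi(
    read_lines: Iterable[str], index2_lines: Iterable[str], sep: str = "_"
) -> Iterator[List[str]]:
    """Yield read records whose name has the index-2 sequence appended, e.g. '@r1_ACGTACGT'."""
    for read, index in zip_longest(fastq_records(read_lines), fastq_records(index2_lines)):
        if read is None or index is None:
            raise ValueError("read and index FASTQs have different record counts")
        name, _, comment = read[0].rstrip("\n").partition(" ")
        if name != index[0].split()[0]:
            raise ValueError(f"records out of sync: {name} vs {index[0].strip()}")
        umi = index[1].strip()
        header = f"{name}{sep}{umi}" + (f" {comment}" if comment else "")
        yield [header + "\n"] + read[1:]
```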

Answer: petervangalen (United States) wrote, 14 months ago:

You can specify which reads should be used for demultiplexing in RunInfo.xml, which may be more convenient than --use-bases-mask. I had a run with i7 (first index, Read#2) and i5 (second index, Read#3) but I only wanted to use i7 for demultiplexing.

  1. Make a backup copy of RunInfo.xml, which is in the run folder with the SampleSheet etc.

  2. Open RunInfo.xml and change the following:

<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>

to

<Read Number="3" NumCycles="8" IsIndexedRead="N"/>

  3. Update SampleSheet.csv so it has only one barcode column.

  4. Run bcl2fastq as you normally would.

  5. The output was demultiplexed by i7 (first index, Read#2) and contained fastq files for three reads:

…_R1_…fastq.gz for Read#1

…_R2_…fastq.gz for i5 (second index, Read#3) that I didn't want to use for indexing

…_R3_…fastq.gz for Read#4 (it was a paired-end run)
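Steps 1 and 2 can be scripted with Python's standard xml.etree.ElementTree. This is a sketch, assuming the usual RunInfo.xml layout with <Read Number=... IsIndexedRead=...> elements; unset_index_read and patch_run_info are hypothetical helper names. ElementTree drops XML comments and may change whitespace, so it is worth diffing the result against the backup before running bcl2fastq.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def unset_index_read(xml_text: str, read_number: int) -> str:
    """Return RunInfo.xml text with IsIndexedRead="N" on the given <Read Number=...>."""
    root = ET.fromstring(xml_text)
    reads = [r for r in root.iter("Read") if r.get("Number") == str(read_number)]
    if not reads:
        raise ValueError(f'no <Read Number="{read_number}"> in RunInfo.xml')
    for read in reads:
        read.set("IsIndexedRead", "N")
    return ET.tostring(root, encoding="unicode")


def patch_run_info(path: str, read_number: int = 3) -> None:
    """Back up RunInfo.xml (step 1), then mark the given read as non-index (step 2)."""
    backup = path + ".bak"
    if not os.path.exists(backup):  # keep the first, unmodified backup
        shutil.copyfile(path, backup)
    with open(path, encoding="utf-8") as fh:
        patched = unset_index_read(fh.read(), read_number)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0"?>\n' + patched + "\n")
```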

Powered by Biostar version 2.3.0