Question

Demultiplexing reads from three libraries (no index, single index and dual index)


2.8 years ago

almogangel • 0

Hi everyone,

First of all, I am aware that this topic has been discussed before; however, I could not find an answer that fits my needs.

We used a NovaSeq 6000 to sequence multiple samples with different library designs:

The first has dual indices, each 8 bp long.
The second has a single index, also 8 bp long (index 2, i.e. i5).
The third has no indices, only a barcode sequence within the read itself.

The read structure in the RunInfo.xml file is as follows:

<Reads>
<Read Number="1" NumCycles="55" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="4" NumCycles="35" IsIndexedRead="N"/>
</Reads>
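As a quick cross-check of the read structure, here is a minimal Python sketch (standard library only) that parses the `<Reads>` block above and prints the cycle layout, plus the mask string that would keep every cycle. `read_structure` and `full_length_mask` are hypothetical helper names, not part of bcl2fastq; the sketch only looks at the XML shown in the post.

```python
import xml.etree.ElementTree as ET

# The <Reads> block from RunInfo.xml as posted. For a real run folder you could
# instead use ET.parse("RunInfo.xml").getroot(); iter() finds <Read> at any depth.
RUNINFO_READS = """
<Reads>
<Read Number="1" NumCycles="55" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="4" NumCycles="35" IsIndexedRead="N"/>
</Reads>
""".strip()

def read_structure(xml_text):
    """Return (number, cycles, is_index) for every <Read>, sorted by read number."""
    root = ET.fromstring(xml_text)
    return sorted(
        (int(r.get("Number")), int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y")
        for r in root.iter("Read")
    )

def full_length_mask(reads):
    """Mask keeping every cycle: Y<n> for sequence reads, I<n> for index reads."""
    return ",".join(("I" if is_index else "Y") + str(cycles) for _, cycles, is_index in reads)

reads = read_structure(RUNINFO_READS)
for number, cycles, is_index in reads:
    print(f"Read {number}: {cycles} cycles ({'index' if is_index else 'sequence'})")
print(full_length_mask(reads))  # Y55,I8,I8,Y35
```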

I think the only workable approach is to create two sample sheets. The first is for the dual-index samples:

[Header],,,,,,,,
<some header stuff>
,,,,,,,,
[Reads],,,,,,,,
Read1Cycles,55,,,,,,,
Read2Cycles,35,,,,,,,
Index1Cycles,8,,,,,,,
Index2Cycles,8,,,,,,,
,,,,,,,,
[Settings],,,,,,,,
,,,,,,,,
[Data],,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,index2,Sample_Project,Description
sample1_with_dual_indices,,,,,CTCTCTAC,ATAGAGAG,blabla,blabla
sample2_with_dual_indices,,,,,CGAGGCTG,AGAGGATA,blabla,blabla
sample3_with_dual_indices,,,,,CGAGGCTG,CTCCTTAC,blabla,blabla
sample_n_with_dual_indices,,,,,CGAGGCTG,TATGCAGT,blabla,blabla
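Before demultiplexing, it can help to see how far apart the indices in the [Data] section are. The sketch below (hypothetical helper names, standard library only) computes pairwise Hamming distances for the i7 (`index`) and i5 (`index2`) columns. It is only a rough sanity check, not a reimplementation of bcl2fastq's own barcode-collision check. For the rows above it shows that samples 2 to n share the same i7 and are told apart by i5 alone.

```python
import csv
import io
from itertools import combinations

# The [Data] section of the dual-index sample sheet above, verbatim.
DATA_SECTION = """\
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,index2,Sample_Project,Description
sample1_with_dual_indices,,,,,CTCTCTAC,ATAGAGAG,blabla,blabla
sample2_with_dual_indices,,,,,CGAGGCTG,AGAGGATA,blabla,blabla
sample3_with_dual_indices,,,,,CGAGGCTG,CTCCTTAC,blabla,blabla
sample_n_with_dual_indices,,,,,CGAGGCTG,TATGCAGT,blabla,blabla
"""

def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {a} vs {b}")
    return sum(x != y for x, y in zip(a, b))

def pairwise_index_distances(data_csv):
    """Yield (sample_a, sample_b, i7_distance, i5_distance) for every pair of rows."""
    rows = list(csv.DictReader(io.StringIO(data_csv)))
    for a, b in combinations(rows, 2):
        yield (a["Sample_ID"], b["Sample_ID"],
               hamming(a["index"], b["index"]),
               hamming(a["index2"], b["index2"]))

for s1, s2, d7, d5 in pairwise_index_distances(DATA_SECTION):
    note = "  <- same i7, separated by i5 only" if d7 == 0 else ""
    print(f"{s1} vs {s2}: i7 {d7}, i5 {d5}{note}")
```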

I would then use the "Undetermined" files from this run for the samples with no indices.

Finally, I would make a second SampleSheet for the single-index sample, the same as above but using only the index2 column.

Regarding the bcl2fastq arguments, should they be:

--use-bases-mask Y*,I8n*,I8n,Y*

for the dual-index SampleSheet run, and

--use-bases-mask Y*,I8n*,Y*

for the single-index run?
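To see what each mask actually asks for, the sketch below expands a `--use-bases-mask` string against the read lengths from RunInfo.xml. It assumes the usual bcl2fastq reading of the syntax (one comma-separated element per read; `Y`/`I`/`N` for read, index and ignored cycles; a number for a cycle count, `*` for the rest of the read, a bare letter for a single cycle). `expand_mask` is a hypothetical helper, not bcl2fastq's parser, and it additionally requires each element to cover its read exactly. Under those assumptions, `Y*,I8n*,Y*` is rejected because it has three elements for four reads.

```python
import re

# Read lengths from RunInfo.xml above: R1, index 1 (i7), index 2 (i5), R2.
READ_LENGTHS = [55, 8, 8, 35]

ELEMENT = re.compile(r"(?:[YIN](?:\d+|\*)?)+")  # a whole element, e.g. I8N*
TOKEN = re.compile(r"([YIN])(\d+|\*)?")          # one letter plus optional count

def expand_element(element, read_len):
    """Expand one mask element into one letter (Y/I/N) per cycle of the read."""
    element = element.upper()
    if not ELEMENT.fullmatch(element):
        raise ValueError(f"cannot parse mask element {element!r}")
    tokens = TOKEN.findall(element)  # [(letter, count)], count is '', digits or '*'
    stars = sum(count == "*" for _, count in tokens)
    if stars > 1:
        raise ValueError(f"more than one '*' in {element!r}")
    fixed = sum(int(count) if count else 1 for _, count in tokens if count != "*")
    remaining = read_len - fixed
    if remaining < 0 or (stars == 0 and remaining != 0):
        raise ValueError(f"{element!r} does not fit a {read_len}-cycle read")
    return "".join(
        letter * (remaining if count == "*" else int(count) if count else 1)
        for letter, count in tokens
    )

def expand_mask(mask, read_lengths):
    """Expand a comma-separated mask; this sketch expects one element per read."""
    elements = mask.split(",")
    if len(elements) != len(read_lengths):
        raise ValueError(f"{mask!r} has {len(elements)} elements for {len(read_lengths)} reads")
    return [expand_element(e, n) for e, n in zip(elements, read_lengths)]

for mask in ["Y*,I8n*,I8n,Y*", "Y*,I8n*,Y*", "Y*,n*,I8,Y*", "Y*,I8,I8,Y*"]:
    try:
        per_read = expand_mask(mask, READ_LENGTHS)
        summary = [f"{r.count('Y')}Y {r.count('I')}I {r.count('N')}N" for r in per_read]
        print(f"{mask:16} -> {summary}")
    except ValueError as err:
        print(f"{mask:16} -> rejected: {err}")
```

Under the same assumptions, `I8n` on an 8-cycle index read asks for nine cycles, so the sketch also rejects `Y*,I8n*,I8n,Y*`; `I8` (or `I8n*`) states the intended 8 index cycles. It is worth confirming this against the bcl2fastq documentation before relying on it.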

I don't have much confidence in this strategy, but it is the best I could come up with. Please let me know what you think.

Thanks!

demultiplexing bcl2fastq Illumina


score 1 · Accepted Answer · 2021-06-21


2.8 years ago

GenoMax 141k

We used a NovaSeq 6000 to sequence multiple samples with different library designs

Are all samples in the same pool that was sequenced across one or more lanes?

The first has dual indices, each 8 bp long

This variation should be a straightforward demux.

The second has a single index, also 8 bp long (index 2, i.e. i5)

If this means there is no i7 index, then you will likely need to use --use-bases-mask Y*,n*,I8,Y* so that you demultiplex based on the i5 index alone.

The third has no indices, only a barcode sequence within the read itself

These are basically the reads that will end up in the "Undetermined" pool after the two demultiplexing runs above, and you will need to deal with them outside of bcl2fastq. If phiX was spiked into the run, it will also be in this pool. Is there a defined location where you expect the internal barcode to be?
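One way to pick out the reads that were undetermined in both runs is to intersect the read IDs of the two Undetermined files: reads from the dual- and single-index libraries should be assigned in one of the two runs, so what is left in both is the in-line-barcode library (plus any phiX). A minimal Python sketch, assuming both runs were converted from the same BCL files so read names match; `open_fastq`, `read_ids` and `keep_records` are hypothetical helpers, and the demo uses tiny made-up in-memory records in place of real files.

```python
import gzip
import io

def open_fastq(path):
    """Open a FASTQ or FASTQ.gz file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def read_ids(lines):
    """Return the set of read IDs (header text before the first space, without '@')."""
    ids = set()
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():  # every 4th line is a record header
            ids.add(line.split()[0].lstrip("@"))
    return ids

def keep_records(lines, wanted):
    """Yield the 4-line FASTQ records whose read ID is in `wanted`."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            if record[0].split()[0].lstrip("@") in wanted:
                yield "".join(record)
            record = []

# Tiny stand-ins for the R1 Undetermined files of the two runs (made-up IDs).
dual_run_undetermined = "@r1 1:N:0:0\nACGT\n+\nIIII\n@r2 1:N:0:0\nGGCC\n+\nIIII\n"
single_run_undetermined = "@r2 1:N:0:0\nGGCC\n+\nIIII\n@r3 1:N:0:0\nTTAA\n+\nIIII\n"

undetermined_in_both = (read_ids(io.StringIO(dual_run_undetermined))
                        & read_ids(io.StringIO(single_run_undetermined)))
print(sorted(undetermined_in_both))  # ['r2']
for rec in keep_records(io.StringIO(dual_run_undetermined), undetermined_in_both):
    print(rec, end="")
```

For real data you would call `read_ids(open_fastq(path))` on each run's R1 Undetermined file (the exact file name depends on your bcl2fastq options). Holding every read ID of a NovaSeq lane in a Python set can take a lot of memory, so a sorted-ID merge or a dedicated tool may be more practical.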


Are all samples in the same pool that was sequenced across one or more lanes?

All samples are part of the same pool.

Is there a defined location of where you expect the internal barcode to be?

Yes, the internal barcode is the first 7 bp of R1.

2.8 years ago by almogangel

Yes, the internal barcode is the first 7 bp of R1

The answer in the thread "Tools for demultiplexing a large fastq file based on random in-line barcodes" should help you identify the indexes (if they are variable) and then demultiplex that set.

2.8 years ago by GenoMax
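For the in-line barcode library, here is a minimal Python sketch of the two steps suggested in the reply above: first tally the first 7 bp of R1 to see which barcodes are actually present, then split R1 by exact match and trim the barcode off. The barcodes `ACGTACG` and `TTGCAAC` and the three reads are made up for illustration. The sketch allows no mismatches and handles only R1 (R2 mates would be routed by the same read IDs); the dedicated tools in the linked thread handle this more robustly.

```python
import io
from collections import Counter

BARCODE_LEN = 7  # internal barcode = first 7 bp of R1, per the thread

def fastq_records(lines):
    """Yield (header, seq, plus, qual) tuples from well-formed FASTQ text lines."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")

def count_barcodes(records, n=BARCODE_LEN):
    """Tally the first n bases of each read to discover which barcodes occur."""
    return Counter(seq[:n] for _, seq, _, _ in records)

def split_by_barcode(records, barcodes, n=BARCODE_LEN, trim=True):
    """Bin records by exact barcode match; unmatched reads go under the key None."""
    bins = {bc: [] for bc in barcodes}
    bins[None] = []
    for header, seq, plus, qual in records:
        bc = seq[:n]
        if bc in barcodes:
            if trim:  # drop the barcode bases and their qualities
                seq, qual = seq[n:], qual[n:]
            bins[bc].append((header, seq, plus, qual))
        else:
            bins[None].append((header, seq, plus, qual))
    return bins

# Made-up R1 records; in practice iterate over the undetermined R1 file instead.
r1_text = (
    "@r1 1:N:0:0\nACGTACGTTTTT\n+\nIIIIIIIIIIII\n"
    "@r2 1:N:0:0\nACGTACGGGGGG\n+\nIIIIIIIIIIII\n"
    "@r3 1:N:0:0\nTTGCAACAAAAA\n+\nIIIIIIIIIIII\n"
)
records = list(fastq_records(io.StringIO(r1_text)))
print(count_barcodes(records).most_common())
bins = split_by_barcode(records, {"ACGTACG", "TTGCAAC"})
for bc, recs in bins.items():
    print(bc, len(recs))
```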