Question

How to demultiplex single-end dual indexed run - NextSeq 500

13 months ago

Apex92 ▴ 280

Hi all,

I have data from a single-end run with a dual index structure, generated on a NextSeq 500 instrument. I want to demultiplex it with the bcl2fastq tool. What should my SampleSheet.csv and bcl2fastq command look like?

I used a SampleSheet.csv (shared below) with the bcl2fastq command shown below and got about 18% undetermined reads (~14 million out of a total of around 76 million), which I think is a good ratio.

This is the SampleSheet.csv structure. The index column holds the i7 index and the index2 column holds the i5 index.

[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
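
A quick way to sanity-check a sheet like this before running bcl2fastq is to read the [Data] section and confirm that every sample has i7 and i5 indexes of the expected length. The following is only a minimal sketch using the Python standard library; `SHEET` and `read_data_section` are illustrative names, not part of bcl2fastq, and the sheet text is copied from the post above.

```python
import csv
import io

# Sample sheet text copied from the post (in practice, read SampleSheet.csv from disk).
SHEET = """[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
"""


def read_data_section(sheet_text):
    """Return the rows under [Data] as a list of dicts keyed by the column names."""
    lines = sheet_text.splitlines()
    # The first line after the [Data] marker is the column header.
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]"))
    return list(csv.DictReader(io.StringIO("\n".join(lines[start + 1:]))))


rows = read_data_section(SHEET)

# Collect the distinct (i7 length, i5 length) pairs; a single pair means the sheet is consistent.
index_lengths = {(len(row["index"]), len(row["index2"])) for row in rows}
print(index_lengths)
```

For this sheet both samples carry 8 bp i7 and 8 bp i5 indexes, so the set contains a single pair.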

The bcl2fastq command is as below.

bcl2fastq --runfolder-dir --output-dir --sample-sheet  --barcode-mismatches 0

Is the overall approach correct? Should I include the --use-bases-mask option as well?

Thank you for your input.

sequencing bcl2fastq rna-seq • 1.1k views

ADD COMMENT • link 13 months ago by Apex92 ▴ 280

score 3 · Accepted Answer · 2023-03-24

13 months ago

GenoMax 141k

Looks like you did this right.

The only thing you could try is removing --barcode-mismatches 0, which would restore the default of one allowed mismatch in each index. If your indexes do not allow this (which may be the case if they are not far enough apart in terms of Hamming distance), then you already have the best possible result. There is no need to use --use-bases-mask if the number of index cycles matches the length of your indexes.
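
To see how far apart the indexes are, the Hamming distance can be computed directly. This is a minimal sketch, not bcl2fastq's own collision check (whose exact rule is not reproduced here); `hamming` and `min_pair_distance` are illustrative names, and the "distance of at least 2m + 1 to tolerate m mismatches unambiguously" comment is the general coding-theory rule of thumb, stated as an assumption about what "far enough apart" means.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# (i7, i5) index pairs taken from the sample sheet in the question.
samples = {
    "1_mESCs": ("AACCGCGG", "CTAGCGCT"),
    "2_mESCs": ("GGTTATAA", "CTAGCGCT"),
}


def min_pair_distance(index_pairs):
    """Smallest combined (i7 + i5) Hamming distance over all pairs of samples."""
    return min(
        hamming(i7_a, i7_b) + hamming(i5_a, i5_b)
        for (i7_a, i5_a), (i7_b, i5_b) in combinations(index_pairs.values(), 2)
    )


distance = min_pair_distance(samples)
# Rule of thumb (assumption): distance >= 2 * m + 1 keeps m mismatches unambiguous.
print(distance, distance >= 2 * 1 + 1)
```

Here the two samples share the same i5 index, so all of the separation comes from the i7 indexes, which differ at every one of their 8 positions.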

You will have to discard the reads in the "undetermined" pool.

ADD COMMENT • link 13 months ago by GenoMax 141k

Thank you for your input. How can I check whether the length of the index cycles matches the length of my indexes? I am not sure about it and do not want to make a mistake.

Is it not acceptable to use the bcl2fastq with the default options and let the tool identify the indices based on the samplesheet?

ADD REPLY • link 13 months ago by Apex92 ▴ 280

You need to know how this flowcell was run, say 50x8x8x50. In that example the index cycle length (8) matches the indexes in your sample sheet. You can see the number of cycles in the RunInfo.xml file (in the data folder you have).

"Is it not acceptable to use the bcl2fastq with the default options and let the tool identify the indices based on the samplesheet?"

That is how bcl2fastq works. You seem to have changed only that one option, to require perfect matches on the index reads; everything else is default.

ADD REPLY • link 13 months ago by GenoMax 141k

Thank you for your input; it was helpful.

ADD REPLY • link 13 months ago by Apex92 ▴ 280

Dear GenoMax,

This is the header of my `RunInfo.xml` file:

<?xml version="1.0"?>
<RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="4">
  <Run Id="230323_NB501365_0745_AH5TY5BGXT" Number="745">
    <Flowcell>H5TY5BGXT</Flowcell>
    <Instrument>NB501365</Instrument>
    <Date>230323</Date>
    <Reads>
      <Read Number="1" NumCycles="66" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>
    <FlowcellLayout LaneCount="4" SurfaceCount="2" SwathCount="3" TileCount="12" SectionPerLane="3" LanePerSection="2">
      <TileSet TileNamingConvention="FiveDigit">

The length of the indexes is 8 in the experiment, so based on what you mentioned, I think I do not need to use the --use-bases-mask option, right?

ADD REPLY • link 13 months ago by Apex92 ▴ 280

Correct. This is a single-end dual indexed (8 bp each) run.
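
The same check can be scripted. This is a minimal sketch that reads only the `<Reads>` element quoted from the RunInfo.xml excerpt above; `READS_XML` and `read_layout` are illustrative names, and the mask string at the end is just the explicit form of this layout in the notation used by --use-bases-mask, not something bcl2fastq needs to be given here.

```python
import xml.etree.ElementTree as ET

# The <Reads> element from the RunInfo.xml excerpt in this thread.
# For a full file, ET.parse("RunInfo.xml").getroot().find("./Run/Reads") would give the same element.
READS_XML = """<Reads>
  <Read Number="1" NumCycles="66" IsIndexedRead="N" />
  <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
  <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
</Reads>"""


def read_layout(reads_element):
    """Return [(num_cycles, is_index_read), ...] ordered by read number."""
    reads = sorted(reads_element.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y") for r in reads]


layout = read_layout(ET.fromstring(READS_XML))
template_cycles = [n for n, is_index in layout if not is_index]
index_cycles = [n for n, is_index in layout if is_index]

# Explicit mask for this layout: Y = sequence read cycles, I = index read cycles.
mask = ",".join(("I" if is_index else "Y") + str(n) for n, is_index in layout)
print(template_cycles, index_cycles, mask)
```

One non-index read plus two 8-cycle index reads is what identifies the run as single-end and dual indexed; comparing `index_cycles` with the index lengths in the sample sheet is the whole check.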

ADD REPLY • link 13 months ago by GenoMax 141k

Thank you.

ADD REPLY • link 13 months ago by Apex92 ▴ 280