Demultiplexing reads from two libraries (single and dual indices) that were sequenced in the same sequencing run
6.8 years ago
Lina F ▴ 200

Hi all,

A colleague in the lab asked me to demultiplex a recent NextSeq run. She loaded it with samples prepared from two libraries. One library had single indices and one had dual indices. She also prepared two sample sheets for me to use.

I first ran bcl2fastq as follows with the sample sheet for the single-index samples:

bcl2fastq --no-lane-splitting -R $INPUT_DIR -o $OUTPUT_DIR --sample-sheet $SAMPLE_SHEET

This resulted in paired fastq files (R1 and R2) and two large Undetermined files (I assume these contain the sequences that belong to the dual-index experiment).

I then ran bcl2fastq the same way but with the sample sheet for the dual-index samples. This time, however, no per-sample fastq files were produced, and all reads ended up in the Undetermined files.

My questions are as follows:

  1. Is running bcl2fastq twice the best approach to demultiplex this run? Is there a way to combine the sample sheets?
  2. I believe the second bcl2fastq run should have worked. Or is there a different way to indicate dual-index samples to bcl2fastq? I didn't get any error messages, but maybe the sample sheet was malformed since all the reads ended up in the Undetermined file.
  3. I took a look at the headers in the Undetermined file and noticed that the barcodes in the headers are almost the same as the forward indices in the sample sheet. Is it easiest to just manually pull these sequences apart?
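Before pulling anything apart by hand, it may help to tally which barcodes actually occur in the Undetermined headers and compare the most frequent ones against the sample sheet. Below is a minimal sketch, assuming the usual Illumina header layout in which the index sequence is the last colon-separated field. The function name `tally_barcodes` and the example records are made up for illustration.

```python
from collections import Counter


def tally_barcodes(fastq_lines):
    """Count the index sequences found in FASTQ header lines.

    Assumes 4-line FASTQ records and Illumina-style headers where the
    barcode is the last colon-separated field, e.g. '... 1:N:0:ACGT+TTGC'.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        barcode = line.rstrip().rsplit(":", 1)[-1]
        counts[barcode] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@NS500:1:HXXXX:1:11101:1000:1000 1:N:0:ACGTACGT+TTGCAAGC", "ACGT", "+", "FFFF",
    "@NS500:1:HXXXX:1:11101:1000:1001 1:N:0:ACGTACGT+TTGCAAGC", "ACGT", "+", "FFFF",
    "@NS500:1:HXXXX:1:11101:1000:1002 1:N:0:ACGTACGA+TTGCAAGC", "ACGT", "+", "FFFF",
]

for barcode, n in tally_barcodes(example).most_common(10):
    print(barcode, n)
```

For a real file, the same function can be fed an open (possibly gzip-opened, text-mode) file handle instead of the list. If the top barcodes differ from the sample sheet by a consistent pattern, the sheet rather than the reads is the likely culprit.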

Thanks for any advice!

demultiplexing nextseq bcl2fastq • 4.9k views
6.8 years ago
aswathyseb ▴ 30

Yes, you have to run it twice.

To the best of my understanding, based on the version I am using, you may have to specify the cycle information on the command line with --use-bases-mask Y150,I6*,I6*,Y150

See the manual.


Thank you for the feedback!

I tried your suggestion but it did not work. bcl2fastq complained about the asterisk symbol after the 6's.

I took a look at my RunInfo.xml file:

...
    <Reads>
      <Read Number="1" NumCycles="147" IsIndexedRead="N" />
      <Read Number="2" NumCycles="12" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="12" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="147" IsIndexedRead="N" />
    </Reads>
    <FlowcellLayout LaneCount="4" SurfaceCount="2" SwathCount="3" TileCount="12" SectionPerLane="3" LanePerSection="2">
      <TileSet TileNamingConvention="FiveDigit">
        <Tiles>
          <Tile>1_11101</Tile>
          <Tile>1_21101</Tile>
...

Based on this information, and the fact that the indices my colleague used are 8 bases long, I adjusted the command line option you suggested as follows:

--use-bases-mask Y147,I8nnnn,I8nnnn,Y147

Unfortunately, this still put all the reads into the Undetermined output files.
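For reference, the arithmetic behind that mask can be sketched in a few lines: each non-index read uses all of its cycles, and each index read uses the first 8 cycles and masks the rest with `n`. This is only an illustration under those assumptions. The function name `bases_mask` is hypothetical, and the XML below is just the `<Reads>` block from above wrapped in a minimal root element so that it parses.

```python
import xml.etree.ElementTree as ET


def bases_mask(run_info_xml, index_length):
    """Build a --use-bases-mask string from the <Read> elements of RunInfo.xml.

    Index reads keep the first `index_length` cycles and mask the remainder
    with 'n'; all other reads are used in full.
    """
    root = ET.fromstring(run_info_xml)
    parts = []
    for read in root.iter("Read"):
        cycles = int(read.get("NumCycles"))
        if read.get("IsIndexedRead") == "Y":
            if index_length > cycles:
                raise ValueError("index is longer than the index read")
            parts.append("I%d%s" % (index_length, "n" * (cycles - index_length)))
        else:
            parts.append("Y%d" % cycles)
    return ",".join(parts)


# The <Reads> block from this run, wrapped in a minimal root for parsing.
RUN_INFO = """<RunInfo><Reads>
  <Read Number="1" NumCycles="147" IsIndexedRead="N" />
  <Read Number="2" NumCycles="12" IsIndexedRead="Y" />
  <Read Number="3" NumCycles="12" IsIndexedRead="Y" />
  <Read Number="4" NumCycles="147" IsIndexedRead="N" />
</Reads></RunInfo>"""

print(bases_mask(RUN_INFO, 8))  # Y147,I8nnnn,I8nnnn,Y147
```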

I'd appreciate any further information you might have!


Could you post your sample sheet? Seeing it might help to answer. Thanks,


Hi Lina,

I haven't used the latest version of the bcl2fastq software (2.19), so I haven't tested this. But according to the manual, the data section of the sample sheet has the following columns: Lane,Sample_ID,Sample_Name,Sample_Project,Index,Index2.

In your sample sheet the columns are not given in this order. My guess is that when it looks for the indexes in columns 5 and 6 it can't find them in your sample sheet, and hence everything gets placed in the Undetermined file. Try modifying the sample sheet as per the manual and rerunning it.
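A quick way to check the sheet before rerunning is to read the header row of its [Data] section and see whether the index columns are there at all. The sketch below is only an illustration: the helper names `data_columns` and `missing_columns` are hypothetical, the required column list is an assumption taken from the columns named above (compared case-insensitively), and the example sheet is made up.

```python
import csv


def data_columns(sample_sheet_text):
    """Return the column names from the header row of the [Data] section."""
    lines = sample_sheet_text.splitlines()
    for i, line in enumerate(lines):
        # Section markers are often padded with commas, e.g. "[Data],,,,"
        if line.strip().rstrip(",").lower() == "[data]":
            if i + 1 >= len(lines):
                raise ValueError("[Data] section has no header row")
            header = next(csv.reader([lines[i + 1]]))
            return [col.strip() for col in header if col.strip()]
    raise ValueError("no [Data] section found")


def missing_columns(sample_sheet_text, required=("Sample_ID", "index", "index2")):
    """List required columns (assumed names) absent from the [Data] header."""
    present = {col.lower() for col in data_columns(sample_sheet_text)}
    return [col for col in required if col.lower() not in present]


# Made-up sample sheet, for illustration only.
SHEET = """[Header]
IEMFileVersion,4

[Data]
Sample_ID,Sample_Name,index
S1,sampleA,ACGTACGT
"""

print(data_columns(SHEET))
print(missing_columns(SHEET))
```

For a dual-index sheet, a missing second index column would be one explanation for every read landing in Undetermined.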


Thank you for taking a look!

I went back to the lab tech and asked her to take a critical look at the indices. She regenerated the file using the Illumina Experiment Manager, and this time it worked. I believe the file she gave me the first time around contained a mistake...

6.8 years ago
h.mon 35k

Assuming the single- and dual-indexed samples are on separate lanes, you can run bcl2fastq twice, each time excluding the lanes of the wrong index type with the --tiles argument. This way you avoid the large Undetermined files and don't waste CPU time.
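As a rough sketch of how the lane selection could be derived, the snippet below groups lanes by index type from sample sheet rows and builds a value for --tiles. It assumes the sample sheets have a Lane column and that a lane can be selected with a pattern such as `s_1`; both should be checked against the manual for your bcl2fastq version. The function names and example rows are hypothetical.

```python
def lanes_by_index_type(rows):
    """Group lane numbers by whether the row has a second index.

    `rows` are dicts from the sample sheet's data section (for example
    from csv.DictReader); a 'Lane' column is assumed to be present.
    """
    lanes = {"single": set(), "dual": set()}
    for row in rows:
        kind = "dual" if (row.get("index2") or "").strip() else "single"
        lanes[kind].add(int(row["Lane"]))
    return lanes


def tiles_argument(lanes):
    """Build a comma-separated lane selection, e.g. 's_2,s_3'."""
    return ",".join("s_%d" % lane for lane in sorted(lanes))


# Made-up rows, for illustration only.
rows = [
    {"Lane": "1", "Sample_ID": "A", "index": "ACGTACGT", "index2": ""},
    {"Lane": "2", "Sample_ID": "B", "index": "ACGTACGT", "index2": "TTGCAAGC"},
    {"Lane": "3", "Sample_ID": "C", "index": "GGCTAACT", "index2": "TTGCAAGC"},
]

groups = lanes_by_index_type(rows)
print("--tiles", tiles_argument(groups["dual"]))
```

If the same lane shows up in both groups, the libraries were pooled across lanes and restricting by tile will not separate them.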


Thank you for the feedback! I only have the sample sheet -- based on this, how can I determine which tiles to exclude?
