Question: Demultiplexing reads from two libraries (single and dual indices) that were sequenced in the same sequencing run
21 months ago by
Lina F150
Boston, MA
Lina F150 wrote:

Hi all,

A colleague in the lab asked me to demultiplex a recent NextSeq run. She loaded it with samples prepared from two libraries. One library had single indices and one had dual indices. She also prepared two sample sheets for me to use.

I first ran bcl2fastq as follows with the sample sheet for the single-index samples:

bcl2fastq --no-lane-splitting -R $INPUT_DIR -o $OUTPUT_DIR --sample-sheet $SAMPLE_SHEET

This resulted in paired fastq files (R1 and R2) and two large Undetermined files (I assume these contain the sequences that belong to the dual-index experiment).

I then ran bcl2fastq the same way but with the sample sheet for the dual index samples. However, this time there were no separate fastq files for the different samples, and all reads ended up in the Undetermined files.

My questions are as follows:

  1. Is running bcl2fastq twice the best approach to demultiplex this run? Is there a way to combine the sample sheets?
  2. I believe the second bcl2fastq run should have worked. Or is there a different way to indicate dual-index samples to bcl2fastq? I didn't get any error messages, but maybe the sample sheet was malformed since all the reads ended up in the Undetermined file.
  3. I took a look at the headers in the Undetermined file and noticed that the barcodes in the headers are almost the same as the forward indices in the sample sheet. Is it easiest to just manually pull these sequences apart?
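On question 3, before pulling anything apart by hand, it may help to tally which barcodes actually appear in the Undetermined headers and compare the most frequent ones with the sample sheet. Here is a minimal Python sketch, assuming standard Illumina headers in which the barcode is the last colon-separated field. The records and the `tally_barcodes` name are made up for illustration and are not taken from this run.

```python
from collections import Counter
import io

# Made-up FASTQ records for illustration only (hypothetical instrument and barcodes).
EXAMPLE_FASTQ = (
    "@NS500:1:HXXXX:1:11101:1000:1000 1:N:0:ATCACGAT\nACGT\n+\nIIII\n"
    "@NS500:1:HXXXX:1:11101:1000:1001 1:N:0:ATCACGAT\nACGT\n+\nIIII\n"
    "@NS500:1:HXXXX:1:11101:1000:1002 1:N:0:CGATGTAA\nACGT\n+\nIIII\n"
)

def tally_barcodes(fastq_handle):
    """Count the barcode field of every FASTQ header line in the handle."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        # Each record is four lines and the header is the first one. Counting
        # lines is safer than testing for "@", which can also start a quality line.
        if line_number % 4 == 0:
            # Assumed header layout: "... <read>:<filtered>:<control>:<barcode>"
            counts[line.rstrip().rsplit(":", 1)[-1]] += 1
    return counts

counts = tally_barcodes(io.StringIO(EXAMPLE_FASTQ))
for barcode, n in counts.most_common(10):
    print(barcode, n)
```

For a real gzipped file, the handle could come from `gzip.open(path, "rt")` instead of `io.StringIO`.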

Thanks for any advice!

ADD COMMENTlink modified 21 months ago by h.mon24k • written 21 months ago by Lina F150
21 months ago by
aswathyseb30
aswathyseb30 wrote:

Yes, you have to run it twice.

To the best of my understanding, based on the version I am using, you may have to specify the cycle information on the command line as --use-bases-mask Y150,I6*,I6*,Y150

See the manual.

ADD COMMENTlink written 21 months ago by aswathyseb30

Thank you for the feedback!

I tried your suggestion but it did not work. bcl2fastq complained about the asterisk symbol after the 6's.

I took a look at my RunInfo.xml file:

...
    <Reads>
      <Read Number="1" NumCycles="147" IsIndexedRead="N" />
      <Read Number="2" NumCycles="12" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="12" IsIndexedRead="Y" />
      <Read Number="4" NumCycles="147" IsIndexedRead="N" />
    </Reads>
    <FlowcellLayout LaneCount="4" SurfaceCount="2" SwathCount="3" TileCount="12" SectionPerLane="3" LanePerSection="2">
      <TileSet TileNamingConvention="FiveDigit">
        <Tiles>
          <Tile>1_11101</Tile>
          <Tile>1_21101</Tile>
...

Based on this information, and the fact that the indices my colleague used are 8 bases long, I adjusted the command line option you suggested as follows:

--use-bases-mask Y147,I8nnnn,I8nnnn,Y147

Unfortunately, this still put all the reads into the Undetermined output files.
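For reference, the arithmetic behind that mask can be sketched in Python. This is only an illustration of how I derived the string from the Reads block above and the 8-base index length; `bases_mask` is a hypothetical helper name, not part of bcl2fastq.

```python
import xml.etree.ElementTree as ET

# The <Reads> block from the RunInfo.xml excerpt above.
RUNINFO_READS = """<Reads>
  <Read Number="1" NumCycles="147" IsIndexedRead="N" />
  <Read Number="2" NumCycles="12" IsIndexedRead="Y" />
  <Read Number="3" NumCycles="12" IsIndexedRead="Y" />
  <Read Number="4" NumCycles="147" IsIndexedRead="N" />
</Reads>"""

def bases_mask(reads_xml, index_length):
    """Build a --use-bases-mask string: Y for sequence reads, I plus n padding for index reads."""
    parts = []
    for read in ET.fromstring(reads_xml).findall("Read"):
        cycles = int(read.get("NumCycles"))
        if read.get("IsIndexedRead") == "Y":
            if index_length > cycles:
                raise ValueError("index is longer than the index read")
            # Use the first index_length cycles as the index and mask the rest with n.
            parts.append("I%d" % index_length + "n" * (cycles - index_length))
        else:
            parts.append("Y%d" % cycles)
    return ",".join(parts)

mask = bases_mask(RUNINFO_READS, 8)
print(mask)  # Y147,I8nnnn,I8nnnn,Y147
```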

I'd appreciate any further information you might have!

ADD REPLYlink written 21 months ago by Lina F150

Could you post your sample sheet? Seeing it might help to answer. Thanks,

ADD REPLYlink written 21 months ago by aswathyseb30

I put it on Dropbox here: https://www.dropbox.com/s/x2kiuy70ht39u3z/NEBvsKAPA-NEB.csv?dl=0

Thank you!

ADD REPLYlink written 21 months ago by Lina F150

Hi Lina,

I haven't used the latest version of the bcl2fastq software (2.19), so I haven't tested this myself. But according to the manual, the data section of the sample sheet has the following columns: Lane,Sample_ID,Sample_Name,Sample_Project,Index,Index2.

In your sample sheet these are not given in this order. My guess is that when it looks for the indexes in columns 5 and 6 it can't find them in your sample sheet, and hence everything gets placed in the Undetermined file. Try modifying the sample sheet as per the manual and rerunning it.
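A quick way to compare a sheet against that column list is sketched below in Python. The sample sheet text is a made-up example, not the file from Dropbox, the helper names are hypothetical, and the case-insensitive name matching is my assumption. The sketch only reports which listed columns are absent and where the others sit; it does not confirm whether bcl2fastq really depends on the order.

```python
import csv
import io

# Column list quoted from the manual in the reply above.
EXPECTED = ["Lane", "Sample_ID", "Sample_Name", "Sample_Project", "Index", "Index2"]

# Hypothetical sample sheet for illustration only.
SHEET = """[Header]
IEMFileVersion,4
[Data]
Sample_ID,Sample_Name,I7_Index_ID,index,Sample_Project
S1,S1,N701,TAAGGCGA,demo
"""

def data_header(sample_sheet_text):
    """Return the column names on the line that follows the [Data] marker."""
    rows = csv.reader(io.StringIO(sample_sheet_text))
    for row in rows:
        if row and row[0].strip() == "[Data]":
            return [name.strip() for name in next(rows)]
    raise ValueError("no [Data] section found")

def missing_columns(header, expected=EXPECTED):
    """List expected columns absent from the header (names compared case-insensitively)."""
    present = {name.lower() for name in header}
    return [name for name in expected if name.lower() not in present]

header = data_header(SHEET)
for position, name in enumerate(header, start=1):
    print(position, name)
print("missing:", missing_columns(header))
```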

ADD REPLYlink written 21 months ago by aswathyseb30

Thank you for taking a look!

I went back to the lab tech and asked her to take a critical look at the indices. She regenerated the file using the Illumina Experiment Manager and now it worked. I believe she messed up the file she gave me the first time around...

ADD REPLYlink written 21 months ago by Lina F150
21 months ago by
h.mon24k
Brazil
h.mon24k wrote:

Assuming the single- and dual-indexed samples are on separate lanes, you can run bcl2fastq twice, each time excluding the lanes of the other index type with the --tiles argument. This way you avoid the large Undetermined files and don't waste CPU time.
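As a rough sketch of what the two invocations could look like, the Python below assembles one command per library. The lane assignment (single-index on lanes 1 and 2, dual-index on lanes 3 and 4), the paths, and the sample sheet file names are all hypothetical. The `s_<lane>` patterns follow my reading of the --tiles examples in the manual, so please check them against your bcl2fastq version. The other options are the ones from the original command.

```python
import shlex

def bcl2fastq_command(input_dir, output_dir, sample_sheet, lanes):
    """Assemble a bcl2fastq call restricted to the given lanes via --tiles."""
    # Assumed pattern form: one "s_<lane>" entry per lane, comma-separated.
    tiles = ",".join("s_%d" % lane for lane in lanes)
    return [
        "bcl2fastq", "--no-lane-splitting",
        "-R", input_dir,
        "-o", output_dir,
        "--sample-sheet", sample_sheet,
        "--tiles", tiles,
    ]

# Hypothetical layout: which library sits on which lanes is an assumption.
single_cmd = bcl2fastq_command("run_dir", "out_single", "single_index.csv", [1, 2])
dual_cmd = bcl2fastq_command("run_dir", "out_dual", "dual_index.csv", [3, 4])

for cmd in (single_cmd, dual_cmd):
    # Print the command for inspection rather than executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` once the lane layout has been confirmed.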

ADD COMMENTlink modified 21 months ago • written 21 months ago by h.mon24k

Thank you for the feedback! I only have the sample sheet -- based on this, how can I determine which tiles to exclude?

ADD REPLYlink written 21 months ago by Lina F150