bcl2fastq troubleshooting: all reads dumped to "Undetermined"
4 months ago
MaxF ▴ 120

Hi everyone,

Another lab ran a single-end sequencing run on a NextSeq for us, but now they can't properly demultiplex the data. I'm trying to see if I can figure it out.

I ran bcl2fastq (newest version) on the files, but all reads are dumped to Undetermined_S0_L001_R1_001.fastq.gz.

I've got a SampleSheet.csv file that should contain the relevant barcodes. bcl2fastq finds it, but it clearly isn't being applied. The sample sheet has a single "AdapterRead1" field, and each sample has an "Index" and an "Index2" field. What I'm trying to figure out is what my reads SHOULD look like. For instance, should they be in the format:

AdapterRead1 + Index(x) + actual sequence

When I manually grep through the reads, I find that a significant minority of them do contain the "AdapterRead1" sequence (sometimes at the beginning of the read, sometimes not). Some of those are also in the format I describe, where one of the Indexes follows AdapterRead1, but most reads don't have AdapterRead1 at all.
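
The grep check can be made more systematic by recording where AdapterRead1 starts in each read. This is a minimal Python sketch; the function name adapter_offsets and the example reads are hypothetical. On Illumina instruments the index sequences are normally read in separate index cycles rather than as part of Read 1, so an adapter hit inside Read 1 may simply mark read-through of a short insert; treat that as a hypothesis to test, not a conclusion.

```python
from collections import Counter

ADAPTER = "CTGTCTCTTATACACATCT"  # AdapterRead1 value from the sample sheet

def adapter_offsets(reads, adapter=ADAPTER):
    """Count reads by the 0-based position where the adapter starts (None = absent)."""
    counts = Counter()
    for seq in reads:
        pos = seq.find(adapter)
        counts[pos if pos >= 0 else None] += 1
    return counts

# Hypothetical reads, for illustration only
reads = [
    "ACGTACGT" + ADAPTER + "GGGG",  # adapter after an 8 bp insert
    ADAPTER + "ACGT",               # adapter at the very start
    "ACGTACGTACGT",                 # no adapter
]
print(adapter_offsets(reads))
```

If the offsets are spread over many positions, the adapter is unlikely to be a fixed prefix in front of a barcode.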

When I look in the DemuxSummaryF1L1.txt file, I just see this at the bottom:

### Most Popular Unknown Index Sequences
### Columns: Index_Sequence Hit_Count
unknown 539084000

Since I'm not that familiar with what the sequences should look like, or how this software should behave, I am just not sure how/where to start troubleshooting.
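
As a starting point, the unknown-index section of the DemuxSummary file can be pulled out with a few lines of Python. This is a sketch that assumes the section has the layout shown above (a header line, a column line, then whitespace-separated rows); the function name unknown_indexes is hypothetical. A literal "unknown" entry in place of base calls would be consistent with no index cycles having been read at all, but that interpretation is an assumption.

```python
def unknown_indexes(text):
    """Return (index_sequence, hit_count) pairs listed after the 'Unknown Index' header."""
    rows, in_section = [], False
    for line in text.splitlines():
        if line.startswith("### Most Popular Unknown Index Sequences"):
            in_section = True
        elif in_section and line.strip() and not line.startswith("#"):
            seq, count = line.split()
            rows.append((seq, int(count)))
    return rows

# Tail of DemuxSummaryF1L1.txt as quoted in the post
tail = """### Most Popular Unknown Index Sequences
### Columns: Index_Sequence Hit_Count
unknown 539084000
"""
rows = unknown_indexes(tail)
print(rows)
# True when no row contains real base calls
print(all(seq == "unknown" for seq, _ in rows))
```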

demultiplex illumina sequencing • 1.0k views

There should be a Stats.json file in the Stats folder after demultiplexing. MultiQC parses Stats.json and displays the most frequent (top 20 or so) index sequences found among the undetermined reads. Hopefully that will give you some clues.
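
If MultiQC is not at hand, the same information can be read straight from Stats.json. The sketch below assumes the file has a top-level "UnknownBarcodes" list whose entries carry a "Lane" number and a "Barcodes" mapping of sequence to count; that layout and the path in the comment are assumptions worth checking against your own file, and the example data are made up.

```python
import json

def top_unknown_barcodes(stats, n=20):
    """Return the n most frequent unknown barcodes per lane as {lane: [(seq, count), ...]}."""
    result = {}
    for entry in stats.get("UnknownBarcodes", []):
        ranked = sorted(entry["Barcodes"].items(), key=lambda kv: kv[1], reverse=True)
        result[entry["Lane"]] = ranked[:n]
    return result

# Hypothetical fragment; in practice something like:
#   with open("Stats/Stats.json") as fh: stats = json.load(fh)   # path is an assumption
example = json.loads(
    '{"UnknownBarcodes": [{"Lane": 1, "Barcodes": '
    '{"GGGGGGGGGG+AGATCTCGGT": 500, "ATACCAACGC+CGCAGCAATT": 1200}}]}'
)
print(top_unknown_barcodes(example, n=5))
```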


I'm not sure what I'm looking for...but it doesn't look good?

Run ID - Lane                      Mb Total Yield  M Total Clusters  % bases Q30  Mean Quality  % Perfect Index
231026_VH01116_49_AACK72GM5 - L1   0.0             0.0               NA%          NA            NA%

All the other stats in the MultiQC report just show that everything is undetermined.


When I look in the DemuxSummaryF1L1.txt file, I just see this at the bottom:

### Most Popular Unknown Index Sequences
### Columns: Index_Sequence Hit_Count
unknown 539084000

That does not make sense. If the run was set up correctly, then even if you had the wrong indexes listed in the SampleSheet, the indexes that the sequencer sees should show up in this file.

Can you show us what the RunInfo.xml file contains for the index setup? An example is below:

      <Read NumCycles="251" Number="1" IsIndexedRead="N" />
      <Read NumCycles="8" Number="2" IsIndexedRead="Y" />
      <Read NumCycles="8" Number="3" IsIndexedRead="Y" />
      <Read NumCycles="251" Number="4" IsIndexedRead="N" />
        <Reads>
                <Read Number="1" NumCycles="118" IsIndexedRead="N" IsReverseComplement="N"/>
                <Read Number="2" NumCycles="10" IsIndexedRead="Y" IsReverseComplement="N"/>
                <Read Number="3" NumCycles="10" IsIndexedRead="Y" IsReverseComplement="Y"/>
        </Reads>
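
The read layout can also be extracted programmatically, which is handy when comparing several runs. This is a minimal sketch using Python's standard xml.etree module on the Reads block posted above; the function name read_layout is hypothetical. The summary string uses the same Y (sequence read) and I (index read) notation as OverrideCycles settings. Note that read 3 is flagged IsReverseComplement="Y", which may matter when comparing the second index against the sample sheet.

```python
import xml.etree.ElementTree as ET

RUNINFO_READS = """<Reads>
  <Read Number="1" NumCycles="118" IsIndexedRead="N" IsReverseComplement="N"/>
  <Read Number="2" NumCycles="10" IsIndexedRead="Y" IsReverseComplement="N"/>
  <Read Number="3" NumCycles="10" IsIndexedRead="Y" IsReverseComplement="Y"/>
</Reads>"""

def read_layout(xml_text):
    """Return (number, cycles, is_index, is_revcomp) for each <Read> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            int(r.get("Number")),
            int(r.get("NumCycles")),
            r.get("IsIndexedRead") == "Y",
            r.get("IsReverseComplement") == "Y",
        )
        for r in root.iter("Read")
    ]

layout = read_layout(RUNINFO_READS)
# Summarize the cycles: Y = sequence read, I = index read
print(";".join(("I" if is_index else "Y") + str(cycles) for _, cycles, is_index, _ in layout))
# prints Y118;I10;I10
```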

Since your run has two indexes, you need to provide a sample sheet that contains both indexes. There is an example here.


I think I do have two?

Here's a piece of the sample sheet:

[BCLConvert_Settings]
SoftwareVersion         3.8.4
NoLaneSplitting         TRUE
AdapterRead1            CTGTCTCTTATACACATCT
OverrideCycles          Y118;I10;I10
FastqCompressionFormat  gzip

[BCLConvert_Data]
Sample_ID  Index       Index2
Ben1       ATACCAACGC  AATTGCTGCG
Ben2       AGGATGTGCT  TTACAATTCC
Ben3       CACGGAACAA  AACCTAGCAC
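
One thing that can be checked mechanically is which section headers the sheet uses. The sketch below rests on the assumption that bcl2fastq 2.x reads its sample rows from a [Data] section, whereas [BCLConvert_Settings] and [BCLConvert_Data] are section names from the BCL Convert (v2) sample sheet format. The function names, the verdict strings, and the comma-separated example are illustrative only.

```python
import re

def sheet_sections(text):
    """Return the bracketed section names found in a sample sheet, in order."""
    return re.findall(r"^\[([^\]]+)\]", text, flags=re.MULTILINE)

def guess_sheet_flavor(text):
    """Rough, heuristic guess at the sample sheet flavor from its section headers."""
    names = sheet_sections(text)
    if any(n.startswith("BCLConvert_") for n in names):
        return "BCL Convert (v2) style"
    if "Data" in names:
        return "bcl2fastq (v1) style"
    return "unrecognized"

# Abbreviated version of the sheet above, written as CSV for illustration
sheet = """[BCLConvert_Settings]
SoftwareVersion,3.8.4
OverrideCycles,Y118;I10;I10

[BCLConvert_Data]
Sample_ID,Index,Index2
Ben1,ATACCAACGC,AATTGCTGCG
"""
print(sheet_sections(sheet))
print(guess_sheet_flavor(sheet))
```

If the sheet turns out to be in the BCL Convert flavor, it may be worth checking whether the tool being run actually recognizes those sections.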

Can you show a read from the Undetermined reads file?

zmore Undetermined....fastq.gz | head -4
------> Undetermined_S0_L001_R1_001.fastq.gz <------
@VH01116:49:AACK72GM5:1:1101:18174:1133 1:N:0:0
GGCTGACCATAGGGCATGAGGGCGTGGGGAGTCNGGTGTGGGTTGGGGTGTGGTGGGGTGTTAGCTGGGGGGTGTTTGTGGGGGGGCGCAGGTGGGTGGGCTGGCTTGTTCTCAGGCA
+
CCCCCCCCCCCCCCCC;CCC;CCCCCC;CCC;C#CCCC;CCCC;CCCCCCCCCCC;CCCC;;-C;CCCCCCC-CCCCCCCC;CC-C-C;;CC-C-C--;-;C-;C--;C-CC;CCC;-

Normally the index shows up in the FASTQ header (LINK). As you can see, in your case it does not. Something does not make sense here: your RunInfo.xml file shows that the run was indeed set up with indexes, but the data here say otherwise.

I don't think the SampleSheet you show above is working. What version of bcl2fastq are you using? Have you looked at the log file for the bcl2fastq run?
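
To check this across many reads rather than by eye, the header can be split into its fields. This is a minimal sketch assuming the standard Illumina header layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index); the function names are hypothetical.

```python
def parse_fastq_header(header):
    """Split an Illumina FASTQ header line into named fields."""
    name, comment = header.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    return {
        "instrument": instrument,
        "run": run,
        "flowcell": flowcell,
        "lane": int(lane),
        "read": int(read),
        "index": index,
    }

def has_index_bases(index):
    """True if the last header field looks like base calls rather than a bare number."""
    return bool(index) and set(index) <= set("ACGTN+")

# Header of the read posted above
header = "@VH01116:49:AACK72GM5:1:1101:18174:1133 1:N:0:0"
fields = parse_fastq_header(header)
print(fields["index"], has_index_bases(fields["index"]))
```

For the read shown in this thread the last field is "0", so no index bases were written to the header.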
