Question

bcl2fastq creates only Undetermined FASTQ files

5.4 years ago

e.rempel ★ 1.1k

Hi everyone,

I have a question about using bcl2fastq for an Agilent panel. In its guide, the manufacturer suggests using the parameter "--use-bases-mask" with the specification Y*,I8,Y10,Y*.

In this case, bcl2fastq throws the error:

            UseBasesMask formatting error. A mask must be specified for each read. Number of reads: 3.
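The error comes from a simple count rule: bcl2fastq expects one mask segment per read listed in RunInfo.xml. Below is a minimal Python sketch of that check, assuming the usual `<Read Number=… NumCycles=… IsIndexedRead=…>` elements. The embedded RunInfo.xml fragment (built from the 151/8/151 layout in this post) and the helper names are hypothetical, and this is not bcl2fastq's own code:

```python
import xml.etree.ElementTree as ET

# Hypothetical RunInfo.xml fragment mirroring the run in this post:
# Read 1 = 151 cycles, index 1 = 8 cycles, Read 2 = 151 cycles.
RUNINFO = """<?xml version="1.0"?>
<RunInfo>
  <Run Id="example_run" Number="1">
    <Reads>
      <Read Number="1" NumCycles="151" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="151" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>"""


def read_structure(runinfo_xml: str) -> list[tuple[int, bool]]:
    """Return (NumCycles, is_index_read) for each <Read>, ordered by Number."""
    root = ET.fromstring(runinfo_xml)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y") for r in reads]


def mask_matches_read_count(mask: str, reads: list[tuple[int, bool]]) -> bool:
    """True when the comma-separated mask has exactly one segment per read."""
    return len(mask.split(",")) == len(reads)


if __name__ == "__main__":
    reads = read_structure(RUNINFO)
    for mask in ("Y*,I8,Y10,Y*", "Y*,I8,Y*"):
        verdict = "OK" if mask_matches_read_count(mask, reads) else "segment count mismatch"
        print(mask, verdict)
```

Against a three-read layout the four-segment Agilent mask fails this check, while the three-segment mask passes.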

Indeed, according to RunInfo.xml there are 3 reads (with lengths 151, 8 (barcode), and 151, respectively). If we specify three segments in the bcl2fastq call, e.g.,

            …  --use-bases-mask Y*,I8,Y*

then there is no error message and the program creates FASTQ files. Unfortunately, they are all “Undetermined” (we get the same output if we don’t specify --use-bases-mask at all). IMHO this means that bcl2fastq had problems extracting the barcode indices from the SampleSheet.csv data. We edited SampleSheet.csv following the suggestion in the above-mentioned guide: “ … [clear] the content in the “I5_index_ID” and “index2” columns”.
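Getting only Undetermined output means that no read's index 1 matched a SampleSheet.csv index within the allowed number of mismatches (bcl2fastq's `--barcode-mismatches`, default 1). Below is a simplified Python sketch of that assignment rule. The sample names and index sequences are hypothetical, and real bcl2fastq also handles dual indexes, lanes and index-collision checks, all of which are omitted here:

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed_index: str, samples: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the first sample whose index is within max_mismatches, else 'Undetermined'."""
    for name, index in samples.items():
        if len(index) == len(observed_index) and hamming(observed_index, index) <= max_mismatches:
            return name
    return "Undetermined"


if __name__ == "__main__":
    # Hypothetical sample sheet: sample name -> index 1 sequence.
    sheet = {"sample_A": "ACGTACGT", "sample_B": "TTGGCCAA"}
    for seen in ("ACGTACGA", "TTGGCCAA", "GGGGGGGG"):
        print(seen, "->", assign_sample(seen, sheet))
```

If every observed index behaves like the last one in this demo, every read ends up in the Undetermined files.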

Any suggestions?

software error sequencing • 6.3k views

written 5.4 years ago by e.rempel ★ 1.1k • updated 5.4 years ago by GenoMax 145k

Hi e.rempel, you can't demultiplex data you do not have. If you did not sequence a 10 bp molecular barcode, Y10 cannot work.

Maybe an example from your SampleSheet.csv would help in understanding your problem. I don't understand what you mean by "ALL fastq files Undetermined". Usually you should get 2 Undetermined....fastq.gz files in /Data/Intensities/BaseCalls: one for the forward and one for the reverse reads, collecting all reads whose indexes are not found in your SampleSheet.csv. The FASTQs you want should be in /Data/Intensities/BaseCalls/<sample_project>/<sample_id>.

If you have problems with demultiplexing, looking at the HTML report (index.html, by default under /Data/Intensities/BaseCalls/Reports/html) usually helps with debugging. In /Data/Intensities/BaseCalls/Stats/DemuxSummary...txt (after "Most Popular Unknown Index Sequences") you can see even more indexes found in your data that could not be mapped to a sample.
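If the Stats files are not at hand, a similar "most popular unknown index" list can be pulled from the Undetermined FASTQ read headers. Below is a Python sketch, assuming bcl2fastq-style headers whose last colon-separated field is the observed index (e.g. `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> 1:N:0:ACGTACGT`). The demo headers are made up, and the file path in the comment is a placeholder:

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def fastq_headers(path: str) -> Iterator[str]:
    """Yield the header line of every record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # records are 4 lines: header, sequence, '+', qualities
                yield line.rstrip("\n")


def index_from_header(header: str) -> str:
    """Return the last ':'-separated field of the header comment, e.g. 'ACGTACGT'."""
    comment = header.split(" ", 1)[-1]
    return comment.rsplit(":", 1)[-1].strip()


def top_indexes(headers: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count index sequences across headers and return the n most common."""
    return Counter(index_from_header(h) for h in headers).most_common(n)


if __name__ == "__main__":
    # Made-up headers; for real data use e.g.
    # top_indexes(fastq_headers("path/to/Undetermined_R1.fastq.gz"), n=20)
    demo = [
        "@M00001:1:FC1:1:1101:1000:2000 1:N:0:GGGGGGGG",
        "@M00001:1:FC1:1:1101:1001:2000 1:N:0:GGGGGGGG",
        "@M00001:1:FC1:1:1101:1002:2000 1:N:0:ACGTACGA",
    ]
    print(top_indexes(demo, n=5))
```

The header format is an assumption; check a few lines of your own Undetermined file before relying on the last field.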

5.4 years ago by crisime ▴ 290

Answer · 2019-04-05

I think an error was made in the way these samples were run.

If these are indeed HaloPlex libraries, then you should have run them with dual (2-D) indexing. The HaloPlex method needs index 2 recovered as a separate file. The NNNNNNN shown in place of index 2 in SampleSheet.csv just means that the sequence there is variable (but it still has to be sequenced as index 2, which seems to be completely missing from this run).

Demultiplexing is done using index 1 alone, and a separate file for index 2 is created as part of bcl2fastq demultiplexing. You end up with an odd-looking set of files: R1 --> Read 1, R2 --> Index 2, R3 --> Read 2.
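The R1/R2/R3 naming follows from the mask: each `Y` segment becomes the next numbered read file, while `I` segments are used as indexes for demultiplexing. Below is a simplified Python sketch, assuming a HaloPlex-style run with four reads of 151, 8, 10 and 151 cycles (the 10-cycle index 2 length is taken from `Y10` and is otherwise hypothetical). It handles only single-letter segments such as `Y*`, `I8`, `Y10` or `N*`, and it does not parse compound bcl2fastq masks like `Y150N1`:

```python
import re

# One letter (Y = write as read, I = index, N = ignore) plus a cycle count or '*'.
SEGMENT = re.compile(r"^([YIN])(\d+|\*)$")


def resolve_mask(mask: str, read_cycles: list[int]) -> list[tuple[str, int]]:
    """Pair each mask segment with its read and report what it turns into."""
    segments = mask.upper().split(",")
    if len(segments) != len(read_cycles):
        raise ValueError(f"{len(segments)} mask segments for {len(read_cycles)} reads")
    outputs = []
    read_number = 0
    for segment, cycles in zip(segments, read_cycles):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported mask segment {segment!r}")
        kind, length = match.groups()
        used = cycles if length == "*" else int(length)
        if used > cycles:
            raise ValueError(f"{segment} needs {used} cycles but the read has {cycles}")
        if kind == "Y":
            read_number += 1
            outputs.append((f"R{read_number}", used))
        elif kind == "I":
            outputs.append(("index", used))
        # 'N' segments are dropped entirely
    return outputs


if __name__ == "__main__":
    print(resolve_mask("Y*,I8,Y10,Y*", [151, 8, 10, 151]))
```

With four reads, the Agilent mask yields R1 (151), an 8-cycle index, R2 (10) and R3 (151). With the three-read layout from the question, the same mask raises a segment-count error.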

If these are NOT HaloPlex samples, then I would start by using the code in this answer (Demultiplexing reads with index present in the labels) to see what the sequencer actually read as index 1 (as opposed to what you provided in SampleSheet.csv), and we can go from there.
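Once the most frequent observed index 1 sequences are known (from the Stats report or the Undetermined headers), comparing them with SampleSheet.csv shows whether the sheet is wrong or the expected index was never read. Below is a Python sketch, not the code from the linked answer. All sequences are hypothetical, and reverse complements are checked only because orientation mix-ups are a frequent sample-sheet error:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def diagnose(sheet_indexes: list[str], observed: list[str]) -> dict[str, str]:
    """Classify each sample-sheet index against the observed index sequences."""
    seen = {o.upper() for o in observed}
    report = {}
    for index in sheet_indexes:
        idx = index.upper()
        if idx in seen:
            report[index] = "observed as written"
        elif revcomp(idx) in seen:
            report[index] = "observed only as reverse complement"
        elif all(len(o) != len(idx) for o in seen):
            report[index] = "length differs from every observed index"
        else:
            report[index] = "not observed"
    return report


if __name__ == "__main__":
    sheet = ["ACGTACGG", "TTAGGCAC"]         # hypothetical SampleSheet.csv index column
    top_observed = ["CCGTACGT", "GGGGGGGG"]  # hypothetical most frequent unknown indexes
    for index, verdict in diagnose(sheet, top_observed).items():
        print(index, verdict)
```

A sheet index that is never observed, in either orientation, points back to the run setup rather than to the sample sheet.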