Hi all,
I have data from a single-end run with a dual-index structure, generated on a NextSeq 500 instrument. I want to demultiplex it with the bcl2fastq tool. What should my SampleSheet.csv structure and bcl2fastq script look like?
I used a SampleSheet structure (shared below) with a bcl2fastq command (given below) and got only 18% of reads undetermined, which I think is a good ratio (the total number of reads was around 76 million and ~14 million reads were undetermined).
The SampleSheet.csv structure is below. The index column is for the i7 index and the index2 column is for the i5 index.
[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
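As a quick sanity check on a sheet like the one above, the index lengths can be read out of the [Data] section with a few lines of Python. This is only a sketch: the function name `index_lengths` is made up for illustration, and it assumes the column names `Sample_ID`, `index` and `index2` exactly as they appear in this sheet.

```python
import csv
import io

# Sample sheet text copied from this post.
SHEET = """[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
"""

def index_lengths(sample_sheet_text):
    """Return {Sample_ID: (len(index), len(index2))} from the [Data] section.

    Hypothetical helper, not part of bcl2fastq.
    """
    lines = sample_sheet_text.splitlines()
    # Locate the [Data] marker; the line may carry trailing commas.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("[Data]"))
    # The line after [Data] is the column header row.
    reader = csv.DictReader(io.StringIO("\n".join(lines[start + 1:])))
    lengths = {}
    for row in reader:
        if not row.get("Sample_ID"):
            continue  # skip blank rows
        lengths[row["Sample_ID"]] = (len(row["index"]),
                                     len(row.get("index2") or ""))
    return lengths

print(index_lengths(SHEET))
```

For this sheet both samples come out as 8 bases for index and 8 for index2, which is the number to compare against the index cycles of the run.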
The bcl2fastq script is as below.
bcl2fastq --runfolder-dir --output-dir --sample-sheet  --barcode-mismatches 0
Is the overall approach correct? Should I include the --use-bases-mask option as well?
Thank you for your input.
Thank you for your input. How can I check whether the length of the index cycles matches the length of my indexes? I am not sure about it and do not want to make a mistake.
Is it not acceptable to use bcl2fastq with the default options and let the tool identify the indices based on the sample sheet?
You must know that this flowcell was run as, say, 50x8x8x50. The length of the index cycles here (8) matches your indexes in the sample sheet. You can see the number of cycles in the RunInfo.xml file (in the data folder you have).
That is how bcl2fastq works. You seem to have changed only that one option, to require perfect matches on index reads; everything else is default.
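The RunInfo.xml check mentioned above can be sketched in Python. The XML fragment below is a made-up example, not the poster's file: the run ID and the 76-cycle read length are invented, and it assumes the usual layout where each `Read` element carries `Number`, `NumCycles` and `IsIndexedRead` attributes. The helper names `read_layout` and `index_cycles` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical RunInfo.xml fragment for illustration only; values are invented.
EXAMPLE = """<RunInfo>
  <Run Id="example_run" Number="1">
    <Reads>
      <Read Number="1" NumCycles="76" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>
  </Run>
</RunInfo>"""

def read_layout(xml_text):
    """Return [(num_cycles, is_index_read), ...] in read order."""
    root = ET.fromstring(xml_text)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y")
            for r in reads]

def index_cycles(xml_text):
    """Return the cycle counts of the index reads only."""
    return [cycles for cycles, is_index in read_layout(xml_text) if is_index]

print(read_layout(EXAMPLE))
print(index_cycles(EXAMPLE))
```

To run it on a real file, read the text with `open("RunInfo.xml").read()` and pass it to `index_cycles`. If the numbers it returns equal the lengths of index and index2 in the sample sheet, the index cycles and the sample sheet indexes agree.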
Thank you for your input; it was helpful.
Dear GenoMax,
This is the header of my `RunInfo.xml` file:
The length of the indexes is 8 in the experiment and, based on what you mentioned, I think I do not need to use the `bases-mask` option, right?
Correct. This is a single-end dual indexed (8 bp each) run.
Thank you.