I have data from a single-end run with a dual index structure generated by the NextSeq 500 instrument. I want to do the demultiplexing with the bcl2fastq tool. What should my SampleSheet.csv structure and bcl2fastq script look like?
I used a SampleSheet structure (shared below) with a bcl2fastq command (mentioned below), and only about 18% of reads were undetermined, which I think is a good ratio (the total number of reads was around 76 million and ~14 million reads were undetermined).
SampleSheet.csv structure. The index column is for index i7 and the index2 column is for index i5.
[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
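As a quick sanity check on a sheet like this one, the sketch below reads the [Data] section and reports the index lengths and whether any i7/i5 pair is repeated. It is only an illustration: the helper name `read_data_section` is made up here, and the sheet text is pasted in as a string rather than read from a real file.

```python
import csv
import io

# Sample sheet content copied from the post, one CSV row per line.
SAMPLE_SHEET = """\
[Header],,,
[Reads],,,
[Settings],,,
adapter,,,
,,,
[Data],,,
Sample_ID,Sample_Name,Description,index,index2
1_mESCs,1_mESCs,,AACCGCGG,CTAGCGCT
2_mESCs,2_mESCs,,GGTTATAA,CTAGCGCT
"""


def read_data_section(text):
    """Return the rows of the [Data] section as a list of dicts (hypothetical helper)."""
    lines = text.splitlines()
    # Locate the line that opens the [Data] section.
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]"))
    # The line right after [Data] holds the column names.
    reader = csv.DictReader(io.StringIO("\n".join(lines[start + 1:])))
    return [row for row in reader if row.get("Sample_ID")]


rows = read_data_section(SAMPLE_SHEET)

# Set of (i7 length, i5 length) tuples; a single entry means all samples agree.
index_lengths = {(len(r["index"]), len(r["index2"])) for r in rows}

# Each sample needs a distinct (index, index2) combination to be separable.
index_pairs = [(r["index"], r["index2"]) for r in rows]
pairs_unique = len(index_pairs) == len(set(index_pairs))

print(index_lengths)
print(pairs_unique)
```

Here both samples share the same i5 sequence, which is fine as long as the i7/i5 combination differs between samples.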
The bcl2fastq script is as below.
bcl2fastq --runfolder-dir --output-dir --sample-sheet --barcode-mismatches 0
Is the overall approach correct? Should I include the --use-bases-mask option as well?
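For anyone scripting this step, here is a minimal sketch of assembling the same command as an argument list in Python. The flags are the ones from the command above; the three paths are placeholders invented for the example, and the function name `build_bcl2fastq_command` is hypothetical. The sketch only builds and prints the command, it does not run bcl2fastq.

```python
import shlex


def build_bcl2fastq_command(run_folder, output_dir, sample_sheet, barcode_mismatches=0):
    """Assemble the bcl2fastq call as a list of arguments (hypothetical helper)."""
    return [
        "bcl2fastq",
        "--runfolder-dir", run_folder,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
        # 0 means only perfect index matches are assigned to a sample.
        "--barcode-mismatches", str(barcode_mismatches),
    ]


# Placeholder paths, not taken from the original post.
cmd = build_bcl2fastq_command(
    "/path/to/run_folder",
    "/path/to/fastq_output",
    "/path/to/SampleSheet.csv",
)

# Print a shell-quoted version for inspection before running it.
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the paths point at real locations.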
Thank you for your input. How can I check whether the length of the index cycles matches the length of my indexes? I am not sure about it and do not want to make a mistake.
Is it not acceptable to use bcl2fastq with the default options and let the tool identify the indices based on the sample sheet?
You must know that this flowcell was run as, say, 50x8x8x50. The length of the index cycles here (8) matches your indexes in the sample sheet. You can see the number of cycles in the RunInfo.xml file (in the data folder you have).
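A rough way to do that check programmatically is sketched below. It assumes the usual RunInfo.xml layout, where each `Read` element carries `NumCycles` and `IsIndexedRead` attributes; the XML snippet is a made-up stand-in for a single-end, dual-index run (the run ID and the 50-cycle read length are invented), so substitute the contents of your own file.

```python
import xml.etree.ElementTree as ET

# Invented example of the <Reads> block; values are illustrative only.
EXAMPLE_RUNINFO = """\
<RunInfo>
  <Run Id="EXAMPLE_RUN" Number="1">
    <Reads>
      <Read Number="1" NumCycles="50" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>
  </Run>
</RunInfo>
"""


def index_cycles(runinfo_xml):
    """Return the cycle counts of the index reads, in run order."""
    root = ET.fromstring(runinfo_xml)
    return [
        int(read.get("NumCycles"))
        for read in root.iter("Read")
        if read.get("IsIndexedRead") == "Y"
    ]


# Index lengths from the sample sheet in the question (i7, then i5).
sample_sheet_lengths = [len("AACCGCGG"), len("CTAGCGCT")]

cycles = index_cycles(EXAMPLE_RUNINFO)
lengths_match = cycles == sample_sheet_lengths

print(cycles)
print(lengths_match)
```

If the two lists agree, the sample sheet indexes cover the index cycles exactly.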
That is how bcl2fastq works. You seem to have changed only that one option, to require perfect matches on the index reads; everything else is default.
Thank you for your input; it was helpful.
This is the header of my
The length of the indexes is 8 in the experiment, and based on what you mentioned, I think I do not need to use the
Correct. This is a single-end dual indexed (8 bp each) run.