I'm trying to generate demultiplexed fastq files from my HiSeq 4000 run.
I ran three paired-end samples on two lanes, each indexed by hexamer sequences on both reads. I spiked PhiX into each lane to increase sequence diversity.
Specifically, my fragments look like this:
The transcript parts of the fragments are near the 3' end so my reads are expected to look like this:
read1 - ran for 110 cycles: [6bp-index]-[104bp transcript]
read2 - ran for 55 cycles: [6bp-index]-[46bp barcodes]-[3bp polyA]
The RunInfo.xml file in the run folder says each read is 150 bp, the left index is 14 bp, and the right one is 8 bp:
Read Number="1" NumCycles="150" IsIndexedReads="N"
Read Number="2" NumCycles="14" IsIndexedReads="Y"
Read Number="3" NumCycles="8" IsIndexedReads="Y"
Read Number="4" NumCycles="150" IsIndexedReads="N"
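To make the cycle layout explicit, here is a minimal Python sketch (my own, not part of bcl2fastq) that turns the read entries above into a mask string in the y/i notation that --use-bases-mask takes. The `<Reads>` and `<Read ... />` wrapping is my reconstruction of the XML, and the function name `default_bases_mask` is hypothetical. It simply uses every cycle as reported and is not a mask I know to be correct for my libraries.

```python
import xml.etree.ElementTree as ET

# Read entries as listed above; the surrounding tags are an assumed reconstruction.
RUNINFO_READS = """
<Reads>
  <Read Number="1" NumCycles="150" IsIndexedReads="N" />
  <Read Number="2" NumCycles="14" IsIndexedReads="Y" />
  <Read Number="3" NumCycles="8" IsIndexedReads="Y" />
  <Read Number="4" NumCycles="150" IsIndexedReads="N" />
</Reads>
"""


def default_bases_mask(xml_text):
    """Return a mask using every cycle: y<N> for sequence reads, i<N> for index reads."""
    root = ET.fromstring(xml_text)
    # Order the reads by their Number attribute so the mask follows run order.
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    parts = []
    for read in reads:
        letter = "i" if read.get("IsIndexedReads") == "Y" else "y"
        parts.append(f"{letter}{read.get('NumCycles')}")
    return ",".join(parts)


print(default_bases_mask(RUNINFO_READS))  # y150,i14,i8,y150
```

Since the indexes in my SampleSheet are 6 bp while the index reads are 14 and 8 cycles, I assume the index parts of such a mask would need trimming, but I have not confirmed this.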
I tried several combinations of SampleSheet settings and the --use-bases-mask argument to bcl2fastq, such as:
Under [Reads] in the SampleSheet only defining read lengths of 150 seems to work:
And under Data header in the SampleSheet file I define:
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,index2,Sample_Project,Description
1,Lib1,,,,AR006,GCCAAT,GCCAAT,,
2,Lib1,,,,AR006,GCCAAT,GCCAAT,,
1,Lib2,,,,AR008,ACTTGA,ACTTGA,,
2,Lib2,,,,AR008,ACTTGA,ACTTGA,,
1,Lib3,,,,AR012,CTTGTA,CTTGTA,,
2,Lib3,,,,AR012,CTTGTA,CTTGTA,,
But the only fastq files I'm getting are those of the undetermined reads. So my question is: is this real, meaning I didn't get any of my expected reads and basically only sequenced PhiX, or am I specifying the SampleSheet and --use-bases-mask parameters incorrectly?
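One check I could run on the undetermined output is to tally the first six bases of each read and see whether my hexamers (GCCAAT, ACTTGA, CTTGTA) show up at the start of the reads. Below is a minimal sketch of that idea in Python. The function name `count_prefixes` and the file name in the usage comment are assumptions for illustration, and it expects plain four-line FASTQ records.

```python
import gzip
from collections import Counter


def count_prefixes(fastq_path, k=6, top=10):
    """Return the `top` most common k-base read prefixes as (prefix, count) pairs."""
    # Open gzip-compressed files transparently, plain text otherwise.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            # In a four-line FASTQ record, the sequence is the second line.
            if line_no % 4 == 1:
                counts[line.strip()[:k]] += 1
    return counts.most_common(top)


# Example usage (assumed file name, adjust to the actual output):
# for prefix, n in count_prefixes("Undetermined_S0_L001_R1_001.fastq.gz"):
#     print(prefix, n)
```

If the expected hexamers dominate the prefixes, the libraries were presumably sequenced and the problem lies in my demultiplexing settings. If not, the reads may indeed be mostly PhiX.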
Thanks a lot