I have a slightly unusual sequencing setup from a NovaSeq paired-end experiment. The i7 barcodes are the sample indices, and the i5 index is a 9 bp UMI of random N's. My goal is to demultiplex on the i7 sample indices and then extract the i5 UMI into a separate FASTQ file. After that, I want to use umi-dedup or something similar to annotate the R1/R2 headers with the UMI.
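For concreteness, the header annotation I have in mind could look roughly like the sketch below. It is only an illustration, not what any particular tool does internally: the function names are hypothetical, it assumes the read FASTQ and the UMI FASTQ list the same reads in the same order, and it assumes the UMI should be appended to the read name with an underscore.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator line, qualities)
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def annotate_with_umi(
    reads: Iterable[Record], umis: Iterable[Record], sep: str = "_"
) -> Iterator[Record]:
    """Append each UMI sequence to the name of the matching read.

    Assumes both inputs are in the same order (hypothetical helper).
    """
    for (header, seq, plus, qual), (umi_header, umi_seq, _, _) in zip(reads, umis):
        read_id, _, comment = header.partition(" ")
        umi_id = umi_header.split(" ", 1)[0]
        if read_id != umi_id:
            raise ValueError(f"read/UMI name mismatch: {read_id} vs {umi_id}")
        new_header = f"{read_id}{sep}{umi_seq}"
        if comment:
            new_header += " " + comment
        yield new_header, seq, plus, qual


if __name__ == "__main__":
    # Toy records; the UMI sequence here is made up for illustration.
    r1 = ["@read1 1:N:0:NTTACTCG\n", "ACGT\n", "+\n", "FFFF\n"]
    umi = ["@read1 2:N:0:NTTACTCG\n", "ACGTACGT\n", "+\n", "FFFFFFFF\n"]
    for record in annotate_with_umi(parse_fastq(r1), parse_fastq(umi)):
        print("\n".join(record))
```

On real data the line iterables would come from the opened (possibly gzipped) FASTQ files rather than from in-memory lists.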
I can extract the R1, R2, and i5 index reads to FASTQ files using the following command and RunInfo.xml file:
bcl2fastq --output-dir BCL --sample-sheet SampleSheet.csv --create-fastq-for-index-reads

<Read Number="1" NumCycles="150" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="N"/>
<Read Number="4" NumCycles="150" IsIndexedRead="N"/>
but this gives me an index file with only N's
@A00454:609:H7VCGDRXY:1:2101:3821:1000 2:N:0:NTTACTCG
NNNNNNNN
+
########
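To tell whether every record in the index file is masked or only some of them, a quick count along these lines could be used. This is a minimal sketch: the function names are hypothetical, and the file path in the usage comment is a placeholder, not the actual bcl2fastq output name.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator line, qualities)
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, plus, qual


def fraction_all_n(records: Iterable[Record]) -> float:
    """Return the fraction of records whose sequence consists only of N."""
    total = 0
    masked = 0
    for _, seq, _, _ in records:
        total += 1
        if seq and set(seq.upper()) == {"N"}:
            masked += 1
    return masked / total if total else 0.0


# Usage sketch (placeholder path, assumed gzipped):
#   import gzip
#   with gzip.open("umi_reads.fastq.gz", "rt") as handle:
#       print(fraction_all_n(parse_fastq(handle)))
```

A value of 1.0 would mean every UMI read is fully masked, while a lower value would suggest that only some reads are affected.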
Is there a way to somehow convert the N's to the actual UMI so that I can use it in downstream analysis? Any help would be appreciated since I'm very new to using UMIs in this way.