bcl2fastq when the i5 index is a UMI
3 months ago

I have a slightly unusual sequencing set-up (from a NovaSeq paired-end sequencing experiment). My i7 barcodes are the sample indices. My i5 index is a 9 bp UMI of random N's. My goal is to demultiplex based on the i7 sample indices and then write the i5 UMIs to a separate FASTQ file. I then want to use umi-dedup or something similar to annotate the R1/R2 headers with the UMI.

I can extract the R1, R2, and i5 index reads to FASTQ files using the following command together with the RunInfo.xml file:

bcl2fastq --output-dir BCL --sample-sheet SampleSheet.csv --create-fastq-for-index-reads



However, this gives me an index file containing only N's:

@A00454:609:H7VCGDRXY:1:2101:3821:1000 2:N:0:NTTACTCG
NNNNNNNN
+
########


Is there a way to somehow convert the N's to the actual UMI so that I can use it in downstream analysis? Any help would be appreciated since I'm very new to using UMIs in this way.

UMI bcl2fastq demultiplex • 353 views
3 months ago
GenoMax 107k

You need to add the option --mask-short-adapter-reads 0 to your bcl2fastq command when you demultiplex the data. This will restore the sequence in your UMI file. By default, bcl2fastq masks any read shorter than 22 bases with N's, which is what you are observing.
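
Once the index file contains real bases, the header annotation step mentioned in the question could look roughly like the sketch below. It is only an illustration, not a replacement for a dedicated UMI tool: the function names and file names are hypothetical, and it assumes that the read file and the UMI file list the same reads in the same order and that your deduplication tool accepts the UMI appended to the read ID after an underscore (the convention UMI-tools uses, as far as I know). Adjust the separator if your tool expects something else.

```python
import gzip
from itertools import zip_longest


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def annotate_with_umi(read_path, umi_path, out_path):
    """Write reads to out_path with each read ID rewritten as '<id>_<UMI>'.

    Hypothetical helper: assumes both files hold the same reads in the
    same order. Returns the number of records written.
    """
    written = 0
    with open_text(read_path) as reads, \
            open_text(umi_path) as umis, \
            open_text(out_path, "wt") as out:
        pairs = zip_longest(fastq_records(reads), fastq_records(umis))
        for read, umi in pairs:
            if read is None or umi is None:
                raise ValueError("read and UMI files differ in record count")
            # Split '@id comment' so the UMI lands on the ID, not the comment.
            read_id, _, comment = read[0].partition(" ")
            umi_id = umi[0].split(" ", 1)[0]
            if read_id != umi_id:
                raise ValueError(f"ID mismatch: {read_id} vs {umi_id}")
            header = f"{read_id}_{umi[1]}"
            if comment:
                header += f" {comment}"
            out.write(f"{header}\n{read[1]}\n{read[2]}\n{read[3]}\n")
            written += 1
    return written
```

You would call it once per read file, for example annotate_with_umi("sample_R1.fastq.gz", "sample_I2.fastq.gz", "sample_R1.umi.fastq.gz") and again for R2, where the file names are placeholders for whatever bcl2fastq produced in your output directory.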


Works perfectly, thank you.