Question

BCL2Fastq when I5 index is UMI

2.8 years ago

ccagg ▴ 60

I have a slightly unusual sequencing setup (a NovaSeq paired-end run). The i7 barcodes are the sample indices, and the i5 index is a 9bp UMI of random N's. My goal is to demultiplex on the i7 sample indices and then extract the i5 UMI reads as a separate FASTQ file. I then want to use umi-dedup or something similar to annotate the R1/R2 headers with the UMI.

I can extract the R1, R2, and i5 index reads to FASTQ files using the following command and RunInfo.xml file:

bcl2fastq --output-dir BCL --sample-sheet SampleSheet.csv --create-fastq-for-index-reads

<Read Number="1" NumCycles="150" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="N"/>
<Read Number="4" NumCycles="150" IsIndexedRead="N"/>

However, this gives me an index file containing only N's:

@A00454:609:H7VCGDRXY:1:2101:3821:1000 2:N:0:NTTACTCG
NNNNNNNN
+
########

Is there a way to somehow convert the N's to the actual UMI so that I can use it in downstream analysis? Any help would be appreciated since I'm very new to using UMIs in this way.

UMI bcl2fastq demultiplex • 2.2k views

score 3 · Accepted Answer · 2021-07-09

2.8 years ago

GenoMax 141k

Add the option --mask-short-adapter-reads 0 when you demultiplex the data with bcl2fastq. This will restore the sequence in your UMI file. By default, a read that falls below 22 bp is masked with N's, which is what you are observing.
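As a rough sketch, the command from the question with this flag added would look something like the following. The output directory and sample sheet names are copied from the question. The `command -v` guard is only my addition, so the snippet prints the command instead of failing where bcl2fastq is not installed. It also assumes you run it from the run folder, as the original command appears to.

```shell
# Same invocation as in the question, plus the flag from this answer.
# Setting the threshold to 0 disables masking of short reads, so the
# 8-cycle UMI read should keep its base calls instead of becoming N's.
set -- bcl2fastq \
  --output-dir BCL \
  --sample-sheet SampleSheet.csv \
  --create-fastq-for-index-reads \
  --mask-short-adapter-reads 0

echo "$*"    # print the command line for inspection

# Run it only where bcl2fastq is actually installed (guard is an addition).
if command -v bcl2fastq >/dev/null 2>&1; then
  "$@"
else
  echo "bcl2fastq not found on PATH; nothing was run" >&2
fi
```

After rerunning, the first few records of the UMI FASTQ should show real bases rather than NNNNNNNN.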