BCL2Fastq when I5 index is UMI
2.8 years ago
ccagg ▴ 60

I have a slightly unusual sequencing setup (a NovaSeq paired-end experiment). My i7 barcodes are the sample indices, and my i5 index is a 9bp UMI of random N's. My goal is to demultiplex on the i7 sample indices and then extract the i5 UMI as a separate FASTQ file. I then want to use umi-dedup or something similar to annotate the R1/R2 headers with the UMI.

I can extract the R1, R2, and i5 index reads to FASTQ files using the following command and RunInfo.xml file:

bcl2fastq --output-dir BCL --sample-sheet SampleSheet.csv --create-fastq-for-index-reads

<Read Number="1" NumCycles="150" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="N"/>
<Read Number="4" NumCycles="150" IsIndexedRead="N"/>

but this gives me an index file containing only N's:

@A00454:609:H7VCGDRXY:1:2101:3821:1000 2:N:0:NTTACTCG
NNNNNNNN
+
########

Is there a way to somehow convert the N's to the actual UMI so that I can use it in downstream analysis? Any help would be appreciated since I'm very new to using UMIs in this way.

UMI bcl2fastq demultiplex • 2.2k views
2.8 years ago
GenoMax 141k

You need to add the option --mask-short-adapter-reads 0 when you demultiplex the data with bcl2fastq. This will restore the sequence in your UMI file. By default, bcl2fastq masks a read entirely with N's when it is shorter than 22 bp, which is what you are observing with your 8-cycle read.
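Once the UMI file contains real bases, the header annotation described in the question could be sketched roughly as below. This is only a minimal illustration, not a replacement for a dedicated UMI tool: the function names and file paths are hypothetical, it assumes the read file and the UMI file list records in the same order (as bcl2fastq output from one run normally does), and it assumes the `_UMI` read-ID suffix that tools such as UMI-tools expect by default.

```python
import gzip
from itertools import zip_longest


def open_fastq(path, mode="rt"):
    """Open a FASTQ file, transparently handling gzip compression."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def annotate_with_umi(read_path, umi_path, out_path):
    """Append each UMI to the ID of the matching read, e.g. @ID_ACGTACGT."""
    with open_fastq(read_path) as reads, open_fastq(umi_path) as umis, \
            open_fastq(out_path, "wt") as out:
        for read, umi in zip_longest(read_fastq(reads), read_fastq(umis)):
            if read is None or umi is None:
                raise ValueError("read and UMI files have different record counts")
            header, seq, qual = read
            read_id, _, comment = header.partition(" ")
            # Both files must describe the same cluster at this position.
            if read_id != umi[0].split(" ")[0]:
                raise ValueError(f"record order mismatch at {read_id}")
            new_header = f"{read_id}_{umi[1]}" + (f" {comment}" if comment else "")
            out.write(f"{new_header}\n{seq}\n+\n{qual}\n")
```

Calling `annotate_with_umi("sample_R1.fastq.gz", "sample_UMI.fastq.gz", "sample_R1.umi.fastq.gz")` (placeholder file names) would write a copy of R1 with the UMI appended to every read ID; the same call can be repeated for the other genomic read.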

works perfectly, thank you
