Forum:Troubleshooting Tips - bcl2fastq creates duplicate reads
6 days ago

Hi,

I have occasionally seen bcl2fastq (v2.20) write duplicate FASTQ entries: records with identical read IDs, sequences, and quality scores. This causes problems with downstream tools such as Picard MarkDuplicates (e.g. Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once).

I'm hoping to get some community input on how to troubleshoot this. In my case, the issue went away when I removed the --no-lane-splitting option from my command. I checked the FASTQs, across all L00X files, to verify that one of the two identical reads was gone.
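For anyone who wants to run the same kind of check, here is a minimal Python sketch that counts read IDs across a set of FASTQ files and reports any that occur more than once. It assumes standard four-line FASTQ records (which is what bcl2fastq writes) and holds every read ID in memory, so it may need a different approach for very large runs. The function names and the file names in the usage note are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice


def read_ids(fastq_path):
    """Yield the read ID (header text up to the first whitespace) of each record."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # Assumes four lines per record, so every fourth line is a header.
        for header in islice(handle, 0, None, 4):
            if header.strip():  # ignore a trailing blank line
                yield header.split()[0]


def duplicated_ids(fastq_paths):
    """Return {read_id: count} for read IDs seen more than once across all files."""
    counts = Counter()
    for path in fastq_paths:
        counts.update(read_ids(path))
    return {read_id: n for read_id, n in counts.items() if n > 1}
```

Calling `duplicated_ids(["Sample_L001_R1_001.fastq.gz", "Sample_L002_R1_001.fastq.gz"])` on the R1 files of one sample should return an empty dict if no read was written twice. R1 and R2 files should be checked separately, since mates share the same read ID.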

So far, I've also heard that confirming you are on the latest bcl2fastq version and re-running bcl2fastq are options (seqanswers link). Does anyone have other methods to suggest or experiences to share? Thanks in advance.

bcl2fastq duplicates readgroup demultiplex • 270 views
6 days ago
GenoMax 106k

Try balancing the number of threads you are using for reading and writing the data. My anecdotal hunch is that this happens when the storage system is not performant enough.

There is also the option of using bcl-convert which is essentially replacing bcl2fastq.


Interesting, thank you! You are referring to the --loading-threads, --processing-threads, & --writing-threads options?

And do you mind going into more detail about what you mean by balancing?


Correct. What numbers are you currently using? Do you have access to a high-performance storage system, or are you writing to ordinary SSDs/spinning disks? Try using more --writing-threads if you don't have a high-performance storage system. Keep the overall number of threads (the sum of all three) low.


--loading-threads 12

--processing-threads 24

--writing-threads: not set, but I think this means it defaults to 4 according to the bcl2fastq Processing Options

We have a high-performance storage and computing cluster. I will play around with these options and try to keep their numbers low.


You would definitely want writing threads to be larger than the other two by 2 to 4.
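As a rough illustration of that advice, here is a small Python sketch that assembles a bcl2fastq command line with all three thread options set explicitly. The thread counts (4, 4, 8) are only illustrative and read "larger by 2 to 4" as "2 to 4 threads more than the larger of the other two", which is an assumption. The paths and the helper name are hypothetical; the right numbers will depend on your cluster and storage.

```python
import shlex


def bcl2fastq_command(runfolder_dir, output_dir, loading=4, processing=4, writing=8):
    """Build a bcl2fastq argument list with explicit thread counts.

    The default counts are illustrative assumptions, not recommended values.
    """
    return [
        "bcl2fastq",
        "--runfolder-dir", str(runfolder_dir),
        "--output-dir", str(output_dir),
        "--loading-threads", str(loading),
        "--processing-threads", str(processing),
        "--writing-threads", str(writing),
    ]


# Hypothetical paths; print the command for inspection before running it.
cmd = bcl2fastq_command("/path/to/runfolder", "/path/to/fastq_out")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right. Building it as a list rather than one string avoids shell quoting problems with paths that contain spaces.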


Thank you - this worked! Why did you recommend increasing the number of write threads?


To provide adequate writing/processing buffer to make sure changes get written to disk.


Thank you - I will, but would like to keep it open for another few days, just in case others have other suggestions.