Forum: Troubleshooting Tips - bcl2fastq creates duplicate reads
6 days ago

Hi,

A few times now I have seen bcl2fastq (v2.20) produce duplicate FASTQ entries, identical in read ID, raw sequence, and quality scores. This causes problems for downstream tools such as Picard MarkDuplicates (e.g. Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once).

I'm hoping to get some community input on how to troubleshoot this. In my case, the issue went away when I removed the --no-lane-splitting option from my command. I then checked the resulting FASTQs (across all L00X files) and verified that only one copy of each previously duplicated read remained.
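For anyone who wants to run the same kind of check, here is a minimal Python sketch that counts repeated read IDs across a set of FASTQ files. It is only an illustration, not the exact check used above: the function names are made up for this example, it assumes standard 4-line FASTQ records, and it treats the first whitespace-delimited token of the header as the read ID. Since R1 and R2 share IDs, run it on R1 files and R2 files separately.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_ids(lines):
    """Yield the read ID of every record, assuming 4 lines per record."""
    for line_number, line in enumerate(lines):
        # Every 4th line (0, 4, 8, ...) is a header such as "@ID 1:N:0:1".
        if line_number % 4 == 0 and line.strip():
            yield line[1:].split()[0]


def duplicated_ids(fastq_paths):
    """Return {read_id: count} for IDs seen more than once across the files."""
    counts = Counter()
    for path in fastq_paths:
        with open_fastq(path) as handle:
            counts.update(read_ids(handle))
    return {read_id: n for read_id, n in counts.items() if n > 1}
```

Calling duplicated_ids() with the list of all L00X R1 files for one sample returns an empty dict when there are no repeats. Note that it keeps every read ID in memory, so for very large runs it may be more practical to check one sample at a time.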

So far, I have also heard two other suggestions: making sure you are on the latest bcl2fastq version, and simply re-running bcl2fastq (seqanswers link). Does anyone have other methods, or want to share their experience? Thanks in advance.

bcl2fastq duplicates readgroup demultiplex
6 days ago
GenoMax 106k

Try balancing the number of threads you are using for reading and writing the data. My anecdotal hunch is that this happens when your storage system is not performant enough.

There is also the option of using bcl-convert, which is essentially replacing bcl2fastq.


Interesting, thank you! You are referring to the --loading-threads, --processing-threads, & --writing-threads options?

And do you mind going into more detail about what you mean by balancing?


Correct. What numbers are you currently using? Do you have access to a high-performance storage system, or are you writing to ordinary SSDs/spinning disks? Try using more --writing-threads if you don't have a high-performance storage system. Keep the total number of threads (all three added together) low.


--loading-threads 12

--processing-threads 24

--writing-threads: not set, which I think means it defaults to 4 according to the bcl2fastq Processing Options

We have a high-performance storage system and computing cluster. I will play around with these options and try to keep their numbers low.


You would definitely want writing threads to be larger than the other two by 2 to 4.
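As a rough illustration of that advice, here is a small Python sketch that assembles a bcl2fastq command line with the writing thread count set above the other two. It reads "by 2 to 4" as an additive margin of 2 to 4 threads, which is one interpretation and not something confirmed in this thread. The helper name, the paths, and the thread counts are placeholders for illustration, not recommended values.

```python
import shlex


def bcl2fastq_command(runfolder_dir, output_dir,
                      loading_threads, processing_threads, writing_margin=2):
    """Build a bcl2fastq argument list with more writing threads than the rest.

    Assumption: writing threads = max(loading, processing) + writing_margin.
    """
    writing_threads = max(loading_threads, processing_threads) + writing_margin
    return [
        "bcl2fastq",
        "--runfolder-dir", str(runfolder_dir),
        "--output-dir", str(output_dir),
        "--loading-threads", str(loading_threads),
        "--processing-threads", str(processing_threads),
        "--writing-threads", str(writing_threads),
    ]


# Hypothetical paths and deliberately low thread counts.
cmd = bcl2fastq_command("/path/to/runfolder", "/path/to/fastq_out",
                        loading_threads=4, processing_threads=4)
print(shlex.join(cmd))  # printable form; pass cmd to subprocess.run to execute
```

Building the command as a list makes it easy to try a few margins and compare the resulting FASTQs for duplicates before settling on one.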


Thank you - this worked! Why did you recommend increasing the number of write threads?


To provide an adequate writing/processing buffer, so that the output reliably gets written to disk.

If this answered your question, consider marking the original answer as accepted (green check mark for the answer) to provide closure to this thread.


Thank you - I will, but would like to keep it open for another few days, just in case others have other suggestions.
