Question

Forum: Troubleshooting Tips - bcl2fastq creates duplicate reads

2.6 years ago

DavidStreid ▴ 90

Hi,

I have seen a few cases where bcl2fastq (v2.20) produces duplicate FASTQ entries, i.e. records with identical read IDs, raw sequences, and quality scores. This causes issues with downstream tools like Picard MarkDuplicates (e.g. Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once).

I'm hoping to get some community input on how to troubleshoot this. In my case, the issue went away when I removed the --no-lane-splitting option from my command. I checked the FASTQs to verify that one of the two identical reads was no longer present (making sure to check across all L00X files).
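A minimal Python sketch of that check, assuming plain or gzipped FASTQ files with standard 4-line records. It counts read IDs (the header text up to the first whitespace) across all lane files for one read, e.g. every L00X R1 file of a sample. R1 and R2 records share the same ID, so don't mix them in one call. The function names are hypothetical, and the counter keeps every ID in memory, so this is meant as a per-sample spot check rather than a whole-run scan.

```python
from __future__ import annotations

import gzip
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def read_ids(path: Path) -> Iterator[str]:
    """Yield the read ID (header text up to the first whitespace) of each FASTQ record."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each 4-line record is a header
            fields = line[1:].split()
            if not line.startswith("@") or not fields:
                raise ValueError(f"{path}: expected '@' header at line {line_number + 1}")
            yield fields[0]


def find_duplicate_ids(paths: Iterable[Path]) -> dict[str, int]:
    """Return {read_id: count} for every ID seen more than once across all files."""
    counts: Counter[str] = Counter()
    for path in paths:
        counts.update(read_ids(path))
    return {read_id: n for read_id, n in counts.items() if n > 1}
```

For lane-split output this could be called as find_duplicate_ids(sorted(Path("out").glob("Sample1_S1_L00*_R1_001.fastq.gz"))), where "out" and "Sample1_S1" are placeholders for your output directory and sample prefix. With --no-lane-splitting there is a single R1 file per sample, so the list has one entry. An empty dict means no duplicated IDs were found.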

So far, I've also heard of making sure I'm on the latest bcl2fastq version and simply re-running bcl2fastq as options (seqanswers link). Does anyone have other methods or want to share their experience? Thanks in advance!

bcl2fastq duplicates readgroup demultiplex • 1.8k views

score 2 · Accepted Answer · 2021-09-15

2.6 years ago

GenoMax 141k

Try balancing the number of threads you use for reading and writing the data. My anecdotal hunch is that this happens when the storage system is not performant enough.

There is also the option of using bcl-convert, which is essentially replacing bcl2fastq.

Interesting, thank you! You are referring to the --loading-threads, --processing-threads, & --writing-threads options?

And do you mind going into more detail about what you mean by balancing?

2.6 years ago by DavidStreid ▴ 90

Correct. What numbers are you currently using? Do you have access to a high-performance storage system, or are you writing to normal SSDs/spinning disks? Try using more --writing-threads if you don't have a high-performance storage system. Keep the overall number of threads (the sum of all three) low.

2.6 years ago by GenoMax 141k

I'm currently using:

--loading-threads 12
--processing-threads 24
--writing-threads: not set, which I believe means it defaults to 4 according to the bcl2fastq Processing Options

We have a high-performance storage system and computing cluster. I will play around with these options and try to keep the thread counts low.

2.6 years ago by DavidStreid ▴ 90

You would definitely want the number of writing threads to be 2 to 4 larger than the other two.

2.6 years ago by GenoMax 141k
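For anyone scripting bcl2fastq runs, here is a small Python sketch of the advice above. It only assembles a bcl2fastq v2.20 command line from the options discussed in this thread (plus the standard --runfolder-dir and --output-dir) and flags settings that break the thread's anecdotal rule of thumb. The function names, the min_margin of 2 (one reading of "larger than the other two by 2 to 4") and the optional thread_budget are hypothetical choices, not something bcl2fastq itself checks.

```python
from __future__ import annotations

from pathlib import Path


def bcl2fastq_command(
    run_dir: Path,
    output_dir: Path,
    loading_threads: int,
    processing_threads: int,
    writing_threads: int,
    no_lane_splitting: bool = False,
) -> list[str]:
    """Build a bcl2fastq argument list with explicit thread counts."""
    for name, value in (
        ("loading_threads", loading_threads),
        ("processing_threads", processing_threads),
        ("writing_threads", writing_threads),
    ):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", str(run_dir),
        "--output-dir", str(output_dir),
        "--loading-threads", str(loading_threads),
        "--processing-threads", str(processing_threads),
        "--writing-threads", str(writing_threads),
    ]
    if no_lane_splitting:
        cmd.append("--no-lane-splitting")
    return cmd


def thread_balance_warnings(
    loading_threads: int,
    processing_threads: int,
    writing_threads: int,
    thread_budget: int | None = None,
    min_margin: int = 2,
) -> list[str]:
    """Flag settings that break this thread's rule of thumb (hypothetical heuristic)."""
    warnings = []
    busiest_other = max(loading_threads, processing_threads)
    if writing_threads < busiest_other + min_margin:
        warnings.append(
            f"writing threads ({writing_threads}) should exceed "
            f"max(loading, processing) = {busiest_other} by at least {min_margin}"
        )
    total = loading_threads + processing_threads + writing_threads
    if thread_budget is not None and total > thread_budget:
        warnings.append(f"total threads ({total}) exceed the budget of {thread_budget}")
    return warnings
```

With the settings quoted earlier (12 loading, 24 processing, default 4 writing), thread_balance_warnings returns one warning. The returned list can be passed to subprocess.run(cmd, check=True) or printed with shlex.join(cmd) for a job script.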

Thank you - this worked! Why did you recommend increasing the number of write threads?

2.6 years ago by DavidStreid ▴ 90

To provide an adequate writing/processing buffer so that the output actually gets written to disk.

If this answered your question, consider marking the original answer as accepted (green check mark) to provide closure to this thread.

2.6 years ago by GenoMax 141k

Thank you - I will, but would like to keep it open for another few days, just in case others have other suggestions.

2.6 years ago by DavidStreid ▴ 90