Duplicate read identifiers across multiple samples

4.0 years ago

Chris Dean ▴ 410

I have paired-end reads that were sequenced on an Illumina HiSeq 2500 system. In a subset of samples, I noticed that a small proportion of FASTQ entries are duplicated across samples: they have identical read identifiers, sequences, quality scores and barcodes.

sample1:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample2:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample3:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample4:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample5:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA

Libraries were prepared and sequenced by a third-party company. Does anyone have a possible explanation for this and suggestions for moving forward?

Thanks, Chris

sequencing next-gen • 581 views

They messed up the demultiplexing somehow. I'm guessing each sample should have a different barcode. Do you know which sample the barcode in the header actually belongs to? I would ask them to rerun bcl2fastq and regenerate the FASTQ files.
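Before contacting the provider, it may help to quantify the problem. The following is only a rough Python sketch, not a tested pipeline step: it tallies the barcode found in each header per file and lists read identifiers that appear in more than one sample. The function names are made up for illustration, and it assumes standard four-line FASTQ records with Illumina-style headers like the ones shown in the question (barcode as the last colon-separated item after the space).

```python
import gzip
from collections import Counter, defaultdict
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def iter_headers(path):
    """Yield (read_id, barcode) for each record.

    Assumes four lines per record, so every 4th line is a header.
    """
    with open_fastq(path) as handle:
        for header in islice(handle, 0, None, 4):
            fields = header.rstrip("\n").split(" ", 1)
            read_id = fields[0]
            # Comment field looks like "1:N:0:CTGAGCCA"; take the last item.
            barcode = fields[1].rsplit(":", 1)[-1] if len(fields) > 1 else ""
            yield read_id, barcode


def barcode_counts(path):
    """Count how often each barcode occurs in one FASTQ file."""
    return Counter(barcode for _, barcode in iter_headers(path))


def shared_read_ids(paths_by_sample):
    """Return {read_id: [samples]} for read IDs seen in more than one sample.

    Keeps every read ID in memory, so this is only practical for
    modest file sizes or a subsample of each file.
    """
    seen = defaultdict(set)
    for sample, path in paths_by_sample.items():
        for read_id, _ in iter_headers(path):
            seen[read_id].add(sample)
    return {rid: sorted(s) for rid, s in seen.items() if len(s) > 1}
```

Calling `barcode_counts` on each sample's R1 file should show whether a file contains reads carrying a barcode other than the one assigned to that sample, and `shared_read_ids` with a dict such as `{"sample1": "sample1_R1.fastq.gz", ...}` (hypothetical file names) gives the identifiers to report back to the sequencing provider.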

4.0 years ago by Asaf 10k
