4.0 years ago
Chris Dean
I have paired-end sequence reads that were sequenced on the Illumina HiSeq 2500 system. In a subset of samples, I noticed that a small proportion of FASTQ entries have duplicate read identifiers, sequences, quality scores and barcodes.
sample1:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample2:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample3:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample4:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
sample5:@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA
Libraries were prepared and sequenced by a third-party company. Does anyone have a possible explanation for this and suggestions for moving forward?
Thanks, Chris
They messed up the demultiplexing somehow. I'm guessing each sample should have a different barcode. Do you know which sample the barcode in the header (CTGAGCCA) corresponds to? I would ask them to rerun bcl2fastq and regenerate the FASTQ files.
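Before going back to the provider, it may help to quantify the problem: which barcodes show up in each sample's FASTQ, and which read identifiers occur in more than one sample. Below is a minimal Python sketch of that check, not a definitive tool. It assumes Casava 1.8+ style headers like the ones in the question (barcode as the last colon-separated field of the comment) and strictly 4-line FASTQ records. The function names and the demo record are made up for illustration; the sequence and quality strings in the demo are placeholders, not real data.

```python
import gzip
import io
from collections import Counter, defaultdict


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def read_headers(handle):
    """Yield (read_id, barcode) for each 4-line FASTQ record in an open handle."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        fields = line.rstrip("\n").split(" ", 1)
        read_id = fields[0][1:] if fields[0].startswith("@") else fields[0]
        # Comment field looks like "1:N:0:CTGAGCCA"; the barcode is the last part.
        barcode = fields[1].rsplit(":", 1)[-1] if len(fields) > 1 else ""
        yield read_id, barcode


def summarize(samples):
    """Take a dict of sample name -> open FASTQ handle.

    Returns (barcode counts per sample, read IDs seen in more than one sample).
    All read IDs are held in memory, so subsample very large files first.
    """
    barcodes = {}
    seen_in = defaultdict(set)
    for name, handle in samples.items():
        counts = Counter()
        for read_id, barcode in read_headers(handle):
            counts[barcode] += 1
            seen_in[read_id].add(name)
        barcodes[name] = counts
    shared = {rid: sorted(names) for rid, names in seen_in.items() if len(names) > 1}
    return barcodes, shared


# In-memory demo using the header from the question (placeholder sequence/quality).
record = "@HISEQ:664:HYGKJBCXY:2:2104:21100:19520 1:N:0:CTGAGCCA\nACGT\n+\nIIII\n"
demo = {"sample1": io.StringIO(record), "sample2": io.StringIO(record)}
barcodes, shared = summarize(demo)
print(barcodes)
print(shared)
```

For real data you would pass something like `{"sample1": open_fastq("sample1_R1.fastq.gz"), ...}` (hypothetical file names). If every sample reports the same barcode, or the same read IDs turn up across samples, that points to the demultiplexing or sample sheet rather than the libraries, and it gives the provider something concrete to look at.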