I have received paired-end reads for 40 samples. The reads are supposed to be 100bp per end. Instead, 20 of my samples are 101bp per end, while the other 20 are 100bp as expected.

Because of this, we assume that these 20 samples were all in the same lane, and that by accident there was an extra iteration in the Illumina sequencing. However, we also see a strong negative correlation between read length and quality: the samples with 101bp per end lose about 30% of their reads in the Trimmomatic quality step.

My question is: Has this happened to anyone else? Does it occur often, and if so, does it often affect quality? We are really quite puzzled by this.

Any ideas / clues appreciated.

Thank you! -Thies Gehrmann

Did you check if the first or last letter is the same for every read?
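A minimal sketch of how one might run that check, assuming plain four-line FASTQ records (optionally gzipped). The function names `read_sequences` and `end_base_summary` are made up for illustration and are not part of any existing tool:

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, limit=None):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        seqs = islice(handle, 1, None, 4)
        if limit is not None:
            seqs = islice(seqs, limit)
        for line in seqs:
            yield line.rstrip()


def end_base_summary(sequences):
    """Count read lengths, first bases, and last bases over an iterable of reads."""
    lengths, first, last = Counter(), Counter(), Counter()
    for seq in sequences:
        if not seq:
            continue  # skip empty reads
        lengths[len(seq)] += 1
        first[seq[0]] += 1
        last[seq[-1]] += 1
    return lengths, first, last


if __name__ == "__main__":
    import sys

    # Usage: python check_ends.py sample_R1.fastq.gz
    lengths, first, last = end_base_summary(read_sequences(sys.argv[1], limit=100000))
    print("read lengths:", dict(lengths))
    print("first base:  ", dict(first))
    print("last base:   ", dict(last))
```

If the last-base counts in a 101bp sample are dominated by a single letter (or by `N`), that would suggest the extra base is an artifact rather than real sequence.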

written 7.1 years ago by Raony Guimarães
I have had Illumina 1.9 FASTQ data with read lengths of 101bp. If that length is correct and is what @theisgehrmann expects, then I guess the poor read quality over the last 30bp might be caused by the universal adapters, which end up in the reads when the fragment size is small.
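One rough way to test the adapter idea is to look for the start of the adapter inside the reads and see where it begins. The sketch below assumes the commonly used TruSeq-style adapter prefix `AGATCGGAAGAGC`; that is an assumption, so substitute the adapter of the kit that was actually used. The function name `adapter_start_positions` is hypothetical.

```python
from collections import Counter

# Assumption: TruSeq-style adapter prefix. Replace with the adapter of your kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_start_positions(sequences, adapter=ADAPTER_PREFIX):
    """Return (total reads, Counter of 0-based positions where the adapter first occurs)."""
    positions = Counter()
    total = 0
    for seq in sequences:
        total += 1
        idx = seq.find(adapter)  # exact match only, -1 if absent
        if idx != -1:
            positions[idx] += 1
    return total, positions


if __name__ == "__main__":
    # Toy reads for illustration only, not real data.
    reads = ["ACGT" + ADAPTER_PREFIX + "TT", "ACGTACGTACGT"]
    total, positions = adapter_start_positions(reads)
    hits = sum(positions.values())
    print(f"{hits} of {total} reads contain the adapter prefix")
    print("start positions:", dict(positions))
```

This only finds exact, full-length matches of the prefix, so an adapter that starts within the last few bases of a read, or one that contains sequencing errors, is missed. It gives a lower bound on read-through, not a replacement for a proper adapter trimmer.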

Good luck

I believe that the sequencing instrument can be operated only for sequencing lengths that are increments of 50bp. So no 101bp sequencing seems possible.

One can however run the bcl->fastq conversion with arbitrary parameters.

Is it possible that someone overrode the information in the setup file and ran these with incorrect barcode lengths? I could see how someone using an automated script that generates other conversion scripts entered a barcode one base too short, which in turn affects the other lengths.

I guess I'll have to call them up! Thank you!

The 50bp increments hold for HiSeq, but not for e.g. GAII, which does 36, 75 and 100. However, I agree with the basic point that 101 seems unlikely :)

written 7.1 years ago by Jelena Aleksic
