Question

MiSeq losing reads mysteriously between clusters and FASTQ

6.9 years ago

biostars • 0

I'm troubleshooting an odd 2x150 MiSeq run that nominally produced about 6.8 Gb of data with a % > Q30 of 94.2. There were ~23M clusters with a very high PF rate, leaving 21M after cluster filtering. The samples are highly multiplexed, but if I add up the reads across all the samples there are only 1.5M, less than a tenth of what I expected. It's not a barcoding issue: there were some undetermined sequences, but no more than usual. I've crawled over every Illumina log file and the cluster stats all look good. For example, the most heavily clustered sample had ~80k clusters but still emitted only 5k reads. I've plotted clusters against reads and the relationship is very linear: every sample lost reads in proportion to its clusters, and it lost most of them.
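To put a number on the loss, here is a minimal Python sketch using only the approximate figures quoted above. The helper name `yield_fraction` is made up for illustration; it simply divides reads emitted by clusters passing filter.

```python
def yield_fraction(reads_emitted, pf_clusters):
    """Fraction of passing-filter clusters that ended up as FASTQ reads."""
    return reads_emitted / pf_clusters

# Approximate figures from the run described above.
overall = yield_fraction(1_500_000, 21_000_000)   # all samples summed
top_sample = yield_fraction(5_000, 80_000)        # most heavily clustered sample

print(f"overall yield:    {overall:.1%}")
print(f"top sample yield: {top_sample:.1%}")
```

Both fractions come out well under a tenth, which matches the linear clusters-versus-reads plot: the loss looks like a roughly constant proportion per sample rather than a failure of a few barcodes.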

So, what Illumina workflow steps downstream of basecalling can cause reads not to be emitted during FASTQ generation? I've stared at every number the box generated. The most striking was the adaptercount.txt file, which had totals of 580M across all samples for read 1. That seemed extremely high, but otherwise nothing looked abnormal. I can't find any documentation suggesting the standard pipeline filters anything after clusters pass filter, but maybe there are steps where clusters can still drop out? I suspect the insert size might be on the low side (maybe 150bp) but don't see anything else abnormal about this run.

Any pointers appreciated,

Darren

sequencing next-gen • 1.4k views

If your SampleSheet.csv file has adapter sequences in it, then bcl2fastq automatically does adapter scanning/trimming. You can take those lines out of SampleSheet.csv and re-run the analysis (on- or off-sequencer). It is possible that you have a lot of adapter dimers/short inserts, and that those reads are failing QC once the adapter has been trimmed off. Removing those lines should give you full-length data that you can scan offline for the presence of adapters.
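Once untrimmed FASTQ files are available, the offline adapter scan could look something like the Python sketch below. It is only an illustration under stated assumptions: `ADAPTER_PREFIX` is the start of the common TruSeq adapter and may not be what your library prep used, the function name `adapter_positions` is made up here, and the input is assumed to be standard 4-line FASTQ (optionally gzipped).

```python
import gzip
from itertools import islice

# Assumption: first 13 bases of the common Illumina TruSeq adapter.
# Replace with the adapter your library prep actually used.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def adapter_positions(fastq_path, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return (reads_scanned, positions) where positions holds the
    0-based adapter start in every scanned read that contains it."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    positions = []
    scanned = 0
    with opener(fastq_path, "rt") as handle:
        # The sequence is the 2nd line of each 4-line FASTQ record.
        for seq in islice(handle, 1, None, 4):
            if scanned >= max_reads:
                break
            scanned += 1
            pos = seq.find(adapter)
            if pos != -1:
                positions.append(pos)
    return scanned, positions
```

The position where the adapter starts approximates the insert length for that read, so a pile-up of positions at or near 0 would point to adapter dimers, and a pile-up at a few tens of bases would point to very short inserts.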

If the above does not work, are you able to do the processing off-sequencer using bcl2fastq? If so, try adding the option --with-failed-reads.
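As a rough sketch of what that off-sequencer re-run might look like, the Python snippet below assembles a bcl2fastq command line. It assumes bcl2fastq2 v2.x option names (`--runfolder-dir`, `--output-dir`, `--sample-sheet`, `--with-failed-reads`); check `bcl2fastq --help` for your installed version. The paths and the helper name `bcl2fastq_command` are placeholders, not anything from this run.

```python
import shlex

def bcl2fastq_command(run_dir, out_dir, sample_sheet=None, with_failed_reads=True):
    """Build an argument list for an off-sequencer bcl2fastq run."""
    cmd = ["bcl2fastq", "--runfolder-dir", str(run_dir), "--output-dir", str(out_dir)]
    if sample_sheet:
        # e.g. a copy of SampleSheet.csv with the adapter lines removed
        cmd += ["--sample-sheet", str(sample_sheet)]
    if with_failed_reads:
        # also write out clusters that did not pass filter
        cmd.append("--with-failed-reads")
    return cmd

# Placeholder paths: substitute your own run folder and output directory.
cmd = bcl2fastq_command(
    "/path/to/run_folder",
    "/path/to/fastq_out",
    sample_sheet="/path/to/SampleSheet_no_adapters.csv",
)
print(shlex.join(cmd))  # inspect before running, e.g. via subprocess.run(cmd, check=True)
```

Printing the command first rather than executing it makes it easy to compare against the settings the on-instrument analysis used.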

ADD REPLY • link 6.9 years ago by GenoMax 141k

Thanks - that sounds like a sensible diagnostic step. If we can see what was in there, I'm sure we can work out the molecular misadventure. It would be nice if the MiSeq logged a little more about losses at this step. Maybe I missed something, but it wasn't obvious where these stats live.

ADD REPLY • link 6.9 years ago by biostars • 0