Opinion on FastQC output for HiSeq 4000 PE sequencing run
3.3 years ago
quokka ▴ 10


I recently had four plant DNA libraries (~400 bp inserts) sequenced 2 × 150 bp paired-end on a single HiSeq 4000 lane.

I've attached FastQC outputs for R1 and R2 of one of these libraries.

It seems like there are some issues with low-flow regions of the flow cell (?). R2 reads are of noticeably lower quality than R1 reads for all libraries.
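One way to put a number on the R1 vs R2 difference is to compute the mean Phred score at each read position, which is what FastQC's per-base quality plot summarises. This is a minimal sketch that assumes Phred+33 encoding (standard for Illumina 1.8+ output, including HiSeq 4000). The function name is hypothetical and the quality strings are made up for illustration.

```python
from typing import List

def mean_quality_per_position(qual_strings: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, assuming Phred+33 encoding."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in qual_strings:
        for pos, ch in enumerate(qual):
            score = ord(ch) - offset  # decode the ASCII character to a Phred score
            if pos == len(sums):      # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += score
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]

# Made-up quality strings: 'I' = Q40, '5' = Q20, '#' = Q2
r1_quals = ["IIIIIIII", "IIIIIII5"]
r2_quals = ["IIIII555", "IIII5###"]

r1_means = mean_quality_per_position(r1_quals)
r2_means = mean_quality_per_position(r2_quals)
for pos, (q1, q2) in enumerate(zip(r1_means, r2_means), start=1):
    print(f"pos {pos}: R1 {q1:.1f}  R2 {q2:.1f}")
```

On real data the quality strings would be every fourth line of each FASTQ file.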

~94% of reads from the flow cell remain after deduplication with BBMap (Clumpify).
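The retention figure is simply kept/total after duplicate removal. Below is a minimal sketch of exact-match pair deduplication on toy sequences. It assumes a duplicate means identical R1 and R2 sequences. Clumpify's dedupe mode is more flexible, with options for allowing substitutions and for restricting to optical duplicates, so this only illustrates the idea. The function name is hypothetical.

```python
from typing import List, Tuple

ReadPair = Tuple[str, str]  # (R1 sequence, R2 sequence)

def dedupe_exact(pairs: List[ReadPair]) -> List[ReadPair]:
    """Keep the first occurrence of each identical (R1, R2) sequence pair.

    Simplified: exact matches only, with no mismatch tolerance.
    """
    seen = set()
    kept: List[ReadPair] = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept

# Toy pairs: the first two are exact duplicates; the last differs by one base in R2
pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("GGCC", "AATT"), ("ACGT", "TTGC")]
kept = dedupe_exact(pairs)
print(f"{len(kept)}/{len(pairs)} pairs retained ({len(kept) / len(pairs):.0%})")
```

Here 3 of 4 pairs survive (75%). A mismatch-tolerant method such as Clumpify with substitutions allowed could also collapse the last pair.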

~62% of the deduplicated reads remain after quality trimming with Trimmomatic (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100).
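As a rough illustration of what those Trimmomatic steps do to a single read, here is a simplified re-implementation in Python. It is not Trimmomatic's code. The sliding-window step here cuts at the start of the first failing window, whereas Trimmomatic itself may keep some good bases inside that window. The helper names `trim_read` and `pair_survives` are hypothetical, and the quality values are invented.

```python
from typing import List, Optional

def trim_read(quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, window_min: float = 20.0,
              min_len: int = 100) -> Optional[slice]:
    """Simplified LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100.

    Steps run in the order given on the command line. Returns the slice of
    the read to keep, or None if the read is dropped.
    """
    start, end = 0, len(quals)
    # LEADING: remove 5' bases while their quality is below `leading`
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove 3' bases while their quality is below `trailing`
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan 5'->3' and cut at the start of the
    # first window whose mean quality is below `window_min`
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_min:
            end = i
            break
    # MINLEN: drop the read if the retained part is too short
    if end - start < min_len:
        return None
    return slice(start, end)

def pair_survives(r1_quals: List[int], r2_quals: List[int]) -> bool:
    """A pair stays paired only if both mates survive trimming."""
    return trim_read(r1_quals) is not None and trim_read(r2_quals) is not None

r1 = [35] * 150                # invented: uniformly good R1
r2 = [35] * 90 + [10] * 60     # invented: R2 quality collapses after base 90
print(trim_read(r1))           # slice(0, 150, None)
print(trim_read(r2))           # None (only 89 bases survive the window cut)
print(pair_survives(r1, r2))   # False
```

With 150 bp reads, MINLEN:100 discards any mate whose retained length falls below two-thirds of the read. In paired-end mode, a pair stays paired only if both mates pass, so a quality drop in R2 alone is enough to lose the pair.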

Some adapter sequence seems to remain in reads from some of the libraries.
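Adapter read-through only happens when a fragment's insert is shorter than the read length (150 bp here). With ~400 bp gel-selected inserts, it should therefore be limited to the short tail of the size distribution. Below is a simplified version of FastQC's cumulative "Adapter Content" calculation. It assumes the 12-mer AGATCGGAAGAG, the Illumina Universal Adapter entry in FastQC's default adapter list, and uses made-up 20 bp reads. The function name is hypothetical.

```python
from typing import List

# Assumed: first 12 bases of the Illumina Universal Adapter, as in FastQC's default list
ADAPTER_KMER = "AGATCGGAAGAG"

def adapter_content(reads: List[str], kmer: str = ADAPTER_KMER) -> List[float]:
    """Cumulative % of reads in which the adapter k-mer has started by each position."""
    if not reads:
        return []
    length = max(len(r) for r in reads)
    first_hit_counts = [0] * length
    for read in reads:
        idx = read.find(kmer)  # position of the first adapter match, or -1
        if idx != -1:
            first_hit_counts[idx] += 1
    percentages: List[float] = []
    running = 0
    for pos in range(length):
        running += first_hit_counts[pos]
        percentages.append(100.0 * running / len(reads))
    return percentages

reads = [
    "ACGTACGTACGTACGTACGT",      # no adapter
    "ACGTACGT" + ADAPTER_KMER,   # short insert: adapter starts at position 9
    "TTTT" + ADAPTER_KMER + "CACA",  # very short insert: adapter starts at position 5
    "G" * 20,                    # no adapter
]
pct = adapter_content(reads)
print([round(p, 1) for p in pct])
```

The curve rises where short inserts end, which is why adapter content typically climbs toward the 3' end of the reads.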

Overall, I've ended up with 235,011,160 read pairs (~235 million) from the whole lane after deduplication and trimming.
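Here is a quick back-of-the-envelope check that ties these numbers together. It assumes that the ~94% and ~62% figures both refer to read pairs and that the four libraries were pooled evenly. Both are assumptions, and the rounded percentages make the result approximate.

```python
final_pairs = 235_011_160   # read pairs left after deduplication + trimming
dedup_retained = 0.94       # ~94% kept by Clumpify deduplication (approximate)
trim_retained = 0.62        # ~62% of deduplicated pairs kept by Trimmomatic (approximate)

# Overall fraction of the lane's pairs that survive both steps
overall = dedup_retained * trim_retained
# Working backwards gives a rough estimate of the pairs the lane produced
raw_pairs_estimate = final_pairs / overall
# Assumes the four libraries received equal shares of the lane
per_library = final_pairs / 4

print(f"overall retention: {overall:.1%}")
print(f"implied raw read pairs: {raw_pairs_estimate / 1e6:.0f} million")
print(f"final pairs per library (even pooling): {per_library:,.0f}")
```

This gives an overall retention of about 58%, implying roughly 403 million read pairs before processing. Under even pooling, that works out to about 58.75 million final pairs per library.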

Other info: libraries were prepared PCR-free from physically sheared DNA; inserts were size-selected by gel purification; libraries were dual-indexed.

My questions are:

  1. Is this quality/quantity typical for a commercial provider using this platform?
  2. Is the sequence bias at the 5' end of the reads in these libraries typical for a PCR-free library generated from physically sheared DNA?
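To examine the 5' bias more closely than the FastQC plot allows, the per-position base composition can be tabulated directly. Below is a minimal sketch on invented reads. The hypothetical `imbalance` helper mirrors the A-vs-T / G-vs-C difference that FastQC's "Per base sequence content" module checks. As I understand them, its documented warn/fail thresholds are 10 and 20 percentage points.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"

def per_base_content(reads: List[str], n_positions: int) -> List[Dict[str, float]]:
    """Percentage of A/C/G/T at each of the first n_positions read positions."""
    table: List[Dict[str, float]] = []
    for pos in range(n_positions):
        # Count bases at this position across all reads long enough to have one
        counts = Counter(read[pos] for read in reads if len(read) > pos)
        total = sum(counts[b] for b in BASES)  # N calls are ignored
        table.append({b: (100.0 * counts[b] / total if total else 0.0) for b in BASES})
    return table

def imbalance(row: Dict[str, float]) -> float:
    """Largest of |A - T| and |G - C| at one position, in percentage points."""
    return max(abs(row["A"] - row["T"]), abs(row["G"] - row["C"]))

# Invented reads with a skewed first base
reads = ["ACGT", "AGCA", "ATGT", "TCGA"]
table = per_base_content(reads, n_positions=4)
for pos, row in enumerate(table, start=1):
    flag = "FAIL" if imbalance(row) > 20 else "WARN" if imbalance(row) > 10 else "ok"
    print(pos, {b: round(v, 1) for b, v in row.items()}, flag)
```

On real data, the interesting question is how many 5' positions are flagged before the composition flattens out.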

Any additional comments are appreciated.

Thanks in advance



3.3 years ago
igor 12k

Based on FastQC, these libraries look fine. These are long reads, so a quality drop-off toward the end of R2 is normal.

I think you will find these posts very helpful:

Ok. Thanks Igor.

This is my first time working with 150 bp reads (and the HiSeq 4000), so it's good to get an idea of what's normal. I appreciate your insights.

I was a bit curious about the sequence bias at the 5' end because our library preparation didn't involve transposases or random priming. Nevertheless, I guess it could arise because shearing and adapter ligation are somewhat sequence-dependent.


