I recently had four plant DNA libraries (400 bp inserts) sequenced (2x150 bp) on one HiSeq 4000 lane.
I've attached FastQC outputs for R1 and R2 of one of these libraries.
It seems like there are some issues with low-flow sections of the flow cell(?). R2 reads are noticeably lower quality than R1 reads for all libraries.
~94% of reads from the flow cell remain after deduplication with bbmap (clumpify).
~62% of deduplicated reads remain after quality trimming with trimmomatic (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100).
There also seems to be some residual adapter sequence in some of the library reads.
Overall, I've ended up with 235,011,160 read pairs from the whole lane after deduplication and trimming.
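As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes that both percentages refer to read pairs, which may not be exactly how the tools report them, and since the ~94% and ~62% figures are rounded, the result is only approximate. The variable names are just for illustration.

```python
# Back-of-envelope estimate of lane yield from the figures quoted above.
# Assumption: both percentages are fractions of read pairs (not single reads).
final_pairs = 235_011_160   # read pairs left after deduplication and trimming
dedup_fraction = 0.94       # ~94% kept after clumpify deduplication
trim_fraction = 0.62        # ~62% of deduplicated reads kept after trimmomatic

# Fraction of the raw lane output that survives both steps
overall_fraction = dedup_fraction * trim_fraction

# Work backwards to the approximate number of pairs before each step
estimated_dedup_pairs = final_pairs / trim_fraction
estimated_raw_pairs = final_pairs / overall_fraction

print(f"Overall fraction retained: {overall_fraction:.1%}")
print(f"Estimated pairs after deduplication: {estimated_dedup_pairs:,.0f}")
print(f"Estimated raw pairs from the lane: {estimated_raw_pairs:,.0f}")
```

Under that assumption, about 58% of the raw pairs survive both steps, which would put the raw lane output at roughly 400 million read pairs.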
Other info: library preparation was PCR-free from physically sheared DNA; inserts were size-selected by gel purification; libraries were dual indexed.
My questions are:
- Is this quality/quantity typical from a commercial provider using this platform?
- Is the sequence bias at the 5' end of the reads in these libraries typical for a PCR-free library generated from physically sheared DNA?
Any additional comments appreciated.
Thanks in advance