Overrepresented stretches of Gs in NovaSeq library
2.9 years ago
serpalma.v ▴ 70

FastQC's Overrepresented sequences module reports a warning for all R2 FASTQs, corresponding to a stretch of >50 Gs that makes up <0.5% of the whole library before adapter removal (TruSeq Nano) and <0.3% after.
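To check that figure independently of FastQC, a minimal sketch like the following counts the fraction of reads in one FASTQ that contain a run of at least 50 Gs. It assumes standard 4-line FASTQ records (plain or gzipped); the 50-base cutoff and the function names are illustrative choices, not part of FastQC.

```python
import gzip
import re
from typing import Iterator

# A run of at least 50 G's, mirroring the ">50 G" stretch FastQC flagged
# (the exact cutoff is an assumption; adjust as needed).
POLY_G = re.compile(r"G{50,}")


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the 2nd line of every record holds the bases
                yield line.strip()


def poly_g_fraction(path: str, pattern: "re.Pattern[str]" = POLY_G) -> float:
    """Return the fraction of reads containing at least one poly-G stretch."""
    total = hits = 0
    for seq in read_sequences(path):
        total += 1
        if pattern.search(seq):
            hits += 1
    return hits / total if total else 0.0
```

Usage would be something like `print(f"{poly_g_fraction('sample_R2.fastq.gz'):.3%}")`, where the file name is a placeholder for your own R2 file.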

Our partner who performed the lab work advised us not to be concerned about this, indicating that it is a known artifact of NovaSeq runs and that their quality checks did not flag it.

I would like some advice on whether I should deal with this warning (i.e. add this GGG...GGG sequence to the adapter file passed to Trimmomatic) or just leave it as it is and proceed with the alignments.

fastqc sequencing DNA
2.9 years ago
michael.ante ★ 3.7k

Hi serpalma.v,

NovaSeq uses the same two-color chemistry as the NextSeq, in which a 'G' is called when no signal is detected. Thus, if the read runs past the end of your template, the machine reports a G stretch. This stretch should have very low sequence quality and be removed by low-quality tail trimming.
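The "no signal means G" point can be illustrated with a toy decoder. This is a simplified sketch, not Illumina's actual base caller: it assumes normalised red/green intensities and a hypothetical 0.5 threshold, with A = both channels, C = red only, T = green only and G = neither.

```python
from typing import Dict, List, Tuple

# Simplified two-channel decoding (NextSeq/NovaSeq style):
# signal in both channels -> A, red only -> C, green only -> T, neither -> G.
CALLS: Dict[Tuple[bool, bool], str] = {
    (True, True): "A",
    (True, False): "C",
    (False, True): "T",
    (False, False): "G",
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one cycle from normalised red/green intensities (threshold is hypothetical)."""
    return CALLS[(red >= threshold, green >= threshold)]


def call_read(cycles: List[Tuple[float, float]], threshold: float = 0.5) -> str:
    """Call every cycle of a read; dark cycles come out as G."""
    return "".join(call_base(red, green, threshold) for red, green in cycles)
```

For a fragment that ends after three bases, `call_read([(0.9, 0.9), (0.9, 0.1), (0.1, 0.9)] + [(0.0, 0.0)] * 5)` gives `"ACTGGGGG"`: every cycle after the template ends is dark and is therefore reported as G.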

Thus, I'd get rid of the poly-G.
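Getting rid of the poly-G tail amounts to cutting a trailing run of Gs from each read (and its quality string). A minimal sketch, assuming exact matching and a minimum run length of 10 (the same default fastp uses for `--poly_g_min_len`); dedicated tools may apply more tolerant detection rules than this.

```python
from typing import Tuple


def trim_poly_g_tail(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of G's when it is at least `min_len` bases long.

    Exact matching only: a single non-G base inside the tail stops the run.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    run = len(seq) - len(seq.rstrip("G"))  # length of the trailing G run
    if run >= min_len:
        cut = len(seq) - run
        return seq[:cut], qual[:cut]
    return seq, qual  # tail too short to count as poly-G; leave untouched
```

Requiring a minimum run length avoids clipping the one or two genuine Gs that many reads end with by chance.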

Cheers,

Michael


Thanks Michael, is there a tool you would recommend for the job? I am not sure if Trimmomatic is the right tool for this.

Thanks


After checking the literature, I found two tools that address, among other things, the poly-G problem: AfterQC and fastp. Both are by the same author, and the latter is faster than the former.
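For fastp, a paired-end run with poly-G trimming can be scripted as below. This is a sketch assuming fastp is on the `PATH`; the file names are placeholders. fastp normally enables poly-G trimming automatically for NextSeq/NovaSeq data, and `--trim_poly_g` forces it on regardless.

```python
import subprocess
from typing import List


def fastp_poly_g_cmd(r1: str, r2: str, out1: str, out2: str,
                     poly_g_min_len: int = 10) -> List[str]:
    """Build a paired-end fastp command that forces poly-G tail trimming."""
    return [
        "fastp",
        "-i", r1, "-I", r2,          # input R1 / R2
        "-o", out1, "-O", out2,      # trimmed output R1 / R2
        "--trim_poly_g",             # force poly-G tail trimming
        "--poly_g_min_len", str(poly_g_min_len),
    ]


def run(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if fastp fails."""
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log or inspect the exact command before calling `run(...)`.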


I've worked with bbduk.sh from BBMap and with cutadapt. The former is quite fast and flexible; the latter is better suited for PE reads (IMHO).
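cutadapt has a `--nextseq-trim` option aimed at exactly this artifact. As I understand it, the idea is BWA-style 3' quality trimming in which every G is scored as if its quality were just below the cutoff, so high-quality dark-cycle Gs get trimmed too. The sketch below reimplements that idea for illustration (Phred+33 qualities assumed); it is not cutadapt's own code.

```python
def nextseq_trim_index(seq: str, qual: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the index at which to cut the 3' end (keep seq[:index]).

    BWA-style quality trimming, except every G is treated as if its quality
    were just below the cutoff, so high-quality dark-cycle G's are trimmed too.
    """
    score = 0
    best = 0
    cut = len(qual)
    for i in range(len(qual) - 1, -1, -1):  # walk from the 3' end
        q = ord(qual[i]) - base
        if seq[i] == "G":
            q = cutoff - 1  # ignore the reported quality of G calls
        score += cutoff - q
        if score < 0:  # stretch of good bases reached; stop
            break
        if score > best:
            best = score
            cut = i
    return cut
```

For example, `"ACGT" + "G" * 6` with all-`I` (Q40) qualities gives a cut index of 4, leaving `"ACGT"`, whereas a read with no G tail and high quality is left untouched.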