Overrepresented stretches of Gs in NovaSeq library
2.9 years ago
serpalma.v ▴ 70

In its overrepresented sequences module, FastQC raises a warning for all R2 FASTQ files: a stretch of >50 G's, amounting to <0.5% and <0.3% of the whole library before and after adapter removal (TruSeq Nano), respectively.

Our partner who performed the lab work advised us not to worry about this, indicating that it is a known artifact of NovaSeq runs and that their quality checks did not pick up on it.

I would like some advice on whether I should deal with this warning (i.e. add this GGG...GGG sequence to the adapter file passed to Trimmomatic) or just leave it as it is and proceed with the alignments.
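To double-check the percentage FastQC reports, the fraction of reads carrying a long G run can be counted directly. This is only a rough sketch: the function names, the 50-base threshold, and the assumption of plain 4-line FASTQ records are mine, not anything FastQC does internally.

```python
import gzip
import re
from typing import Iterable, Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield sequence lines from a FASTQ file with 4-line records (optionally gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the bases
                yield line.strip()


def has_g_run(seq: str, min_run: int = 50) -> bool:
    """Return True if seq contains at least min_run consecutive G bases."""
    return re.search("G{%d,}" % min_run, seq.upper()) is not None


def poly_g_fraction(seqs: Iterable[str], min_run: int = 50) -> float:
    """Fraction of reads that contain a G run of at least min_run bases."""
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        hits += has_g_run(seq, min_run)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy reads; with real data, pass fastq_sequences("sample_R2.fastq.gz") instead
    # (the file name is a placeholder).
    reads = ["ACGT" * 25, "ACGT" + "G" * 60, "G" * 10]
    print(poly_g_fraction(reads))
```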

fastqc sequencing DNA • 2.5k views
2.9 years ago
michael.ante ★ 3.7k

Hi serpalma.v ,

The NovaSeq uses the same two-color chemistry as the NextSeq: a 'G' is called when no signal is detected. Thus, if the machine runs out of template, it reports a stretch of G's. This stretch will have very low sequence quality and can be trimmed using low-quality tail trimming.
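To illustrate what removing such a tail amounts to, here is a minimal sketch that strips a trailing run of G's from a read and its quality string. The function name and the 10-base minimum are assumptions for illustration; dedicated trimmers are more careful (for example, they may tolerate mismatches inside the run), and a genuine G at the very end of the template would be removed along with the artifact.

```python
from typing import Tuple


def trim_poly_g_tail(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Remove a 3' run of G's if it is at least min_len bases long.

    Shorter G runs are left alone, since they are plausible real sequence.
    """
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    if run_length >= min_len:
        # Keep the quality string the same length as the trimmed sequence.
        return stripped, qual[: len(stripped)]
    return seq, qual


if __name__ == "__main__":
    read = "ACGTACGT" + "G" * 12
    quals = "I" * len(read)
    print(trim_poly_g_tail(read, quals))
```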

Thus, I'd get rid of the poly-G.

Cheers,

Michael


Thanks Michael, is there a tool that you would recommend to do the job? I am not sure if Trimmomatic is the right tool for this.

Thanks


After checking the literature, I found two tools that address, among other things, the poly-G problem: AfterQC and fastp. Both are from the same author, and the latter is faster than the former.
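For reference, a paired-end fastp call with poly-G trimming switched on could be assembled roughly like this. The wrapper function and the file names are placeholders of mine; the options used (`-i`/`-I`, `-o`/`-O`, `--trim_poly_g`, `--poly_g_min_len`) are taken from fastp's command-line help, so check them against the version you have installed.

```python
import shlex
from typing import List


def fastp_poly_g_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str,
                         min_len: int = 10) -> List[str]:
    """Build a paired-end fastp command with poly-G tail trimming enabled."""
    return [
        "fastp",
        "-i", r1_in, "-I", r2_in,          # input R1 / R2
        "-o", r1_out, "-O", r2_out,        # output R1 / R2
        "--trim_poly_g",                   # force poly-G tail trimming
        "--poly_g_min_len", str(min_len),  # minimum G run treated as a poly-G tail
    ]


if __name__ == "__main__":
    cmd = fastp_poly_g_command("in_R1.fastq.gz", "in_R2.fastq.gz",
                               "out_R1.fastq.gz", "out_R2.fastq.gz")
    # Print the command for inspection; to execute it, pass cmd to
    # subprocess.run(cmd, check=True) on a machine where fastp is installed.
    print(shlex.join(cmd))
```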


I've worked with bbduk.sh from BBMap and with cutadapt. The first is quite fast and flexible; the latter is better suited for PE reads (IMHO).
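As a rough sketch of the cutadapt route for paired-end data: cutadapt has a `--nextseq-trim` option meant for two-color instruments, which does quality trimming at the 3' end while also removing trailing G's regardless of their reported quality. The wrapper function, the cutoff of 20, and the file names below are my own placeholders, not a recommendation from this thread.

```python
import shlex
from typing import List


def cutadapt_nextseq_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str,
                             cutoff: int = 20) -> List[str]:
    """Build a paired-end cutadapt command using NextSeq-style 3' trimming."""
    return [
        "cutadapt",
        f"--nextseq-trim={cutoff}",  # quality cutoff; trailing G's are trimmed as well
        "-o", r1_out,                # trimmed R1 output
        "-p", r2_out,                # trimmed R2 output
        r1_in, r2_in,                # paired inputs
    ]


if __name__ == "__main__":
    cmd = cutadapt_nextseq_command("in_R1.fastq.gz", "in_R2.fastq.gz",
                                   "out_R1.fastq.gz", "out_R2.fastq.gz")
    # Printed only; run with subprocess.run(cmd, check=True) where cutadapt is installed.
    print(shlex.join(cmd))
```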
