What causes poly-G reads from NextSeq?
4.7 years ago
CY ▴ 670

I have some samples sequenced on NextSeq, and the FASTQ data for some of them show an enrichment of poly-G reads. I did some research and found that these poly-G reads are caused by NextSeq's 2-colour chemistry being unable to distinguish "G" from "no signal".

However, how does this problem occur? What could cause so many "no signal" cycles during sequencing?

nextseq • 8.9k views
4.7 years ago
GenoMax 120k
4.7 years ago
chen ★ 2.4k

This is because NextSeq uses a two-colour chemistry system, unlike the HiSeq series, which uses a four-colour system.

In the two-colour system:

• Green denotes T
• Red denotes C
• Yellow (Green + Red) denotes A
• Black (no green, no red) denotes G

As the sequencing by synthesis (SBS) goes on cycle by cycle, the signal strength will decrease. When the signal is too weak to be detected, it will be recognised as a G in the base calling stage, which causes a sequencing error. This is why you may see a lot of polyG in the read tails.
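To make the mechanism concrete, here is a minimal sketch of two-colour base calling with a decaying signal. It is a toy model, not Illumina's actual base caller: the function names, the fixed detection `threshold`, and the per-cycle `decay` factor are all assumptions chosen for illustration.

```python
# Toy model of two-colour base calling (illustrative only, not Illumina's algorithm).
# Ideal (green, red) intensities per base, following the list above.
SIGNALS = {"A": (1.0, 1.0), "T": (1.0, 0.0), "C": (0.0, 1.0), "G": (0.0, 0.0)}


def call_base(green: float, red: float, threshold: float = 0.5) -> str:
    """Call a base from two channel intensities using a simple threshold."""
    green_on = green >= threshold
    red_on = red >= threshold
    if green_on and red_on:
        return "A"
    if green_on:
        return "T"
    if red_on:
        return "C"
    return "G"  # no detectable signal is indistinguishable from a true G


def simulate_read(template: str, decay: float = 0.9, threshold: float = 0.5) -> str:
    """Call every cycle of a template while the signal strength decays."""
    calls = []
    strength = 1.0
    for base in template:
        green, red = SIGNALS[base]
        calls.append(call_base(green * strength, red * strength, threshold))
        strength *= decay  # hypothetical per-cycle signal loss
    return "".join(calls)


print(simulate_read("ACTACTACTACT", decay=0.8))  # ACTAGGGGGGGG
```

Once the scaled intensity drops below the threshold, every remaining cycle is called as G regardless of the true base, which gives the polyG tail.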

That is why NextSeq data contain so many polyG reads. Two further points on this issue:

• 1. NovaSeq has the same issue, since it also uses a two-colour system.
• 2. I have developed a tool, fastp, that addresses this issue automatically. fastp is designed to provide fast, all-in-one preprocessing for FASTQ files. It is written in C++ with multithreading support for high performance.

You can see https://github.com/OpenGene/fastp#polyg-tail-trimming for more details.
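For readers who want to see the basic idea of tail trimming, here is a rough Python sketch. It is not fastp's implementation: the function name `trim_poly_g_tail` and the 10-base minimum are assumptions for illustration, and it only removes an exact run of trailing G bases without tolerating mismatches.

```python
# Simplified polyG tail trimming (an illustrative sketch, not fastp's algorithm).
def trim_poly_g_tail(seq: str, qual: str, min_len: int = 10) -> tuple[str, str]:
    """Remove a trailing run of G bases if it is at least min_len long.

    seq and qual are the sequence and quality strings of one FASTQ record.
    min_len is a hypothetical cut-off below which the tail is left alone.
    """
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    if run_length >= min_len:
        # Trim the quality string to the same length as the trimmed sequence.
        return stripped, qual[: len(stripped)]
    return seq, qual


seq = "ACGTACGT" + "G" * 12
qual = "I" * len(seq)
print(trim_poly_g_tail(seq, qual))  # ('ACGTACGT', 'IIIIIIII')
```

A short run of G at the end of a read is left untouched, since a few trailing G bases are often genuine sequence.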


Which is better for adapter removal, quality filtering, and poly-G filtering: AfterQC or fastp?


Any scan/trim tool should work; it comes down to your preference. You can add bbduk.sh from the BBMap suite to that list as well.