Question

Poly-G in head of read NovaSeq

3.8 years ago

godth13teen ▴ 70

Hi, I recently ran into a problem with the output of a NovaSeq 6000. I ran the samples in paired-end mode and found that a portion of read 2 has a poly-G stretch (around 50 bp) at the beginning. I understand that NovaSeq is a 2-color system, so the poly-G most likely reflects signal loss, but I don't understand why it appears only at the beginning of the second read. I have considered:

DNA strand break in the original sample: if this happened, read 1 would show no sign of the break at all.
DNA strand break after fragmentation: if this happened, the poly-G would also occur at the tail of read 1.
Reverse strand break at the head: then how can the tail still have bases? I think the fragment would fall off the adapter, am I right?
Reverse strand break at the tail: then the poly-G would occur at the tail, as in many other reported cases.

This phenomenon is really confusing and I haven't found any answer or explanation for it yet. I have tried again with a new library prep kit, but it still happens. I am using TruSeq DNA PCR-Free from Illumina.

Any help/advice is warmly welcome!

Thank you

sequencing • 3.4k views

ADD COMMENT • link updated 2.8 years ago by Jianheng Liu (Fox) • 0 • written 3.8 years ago by godth13teen ▴ 70

I agree, that's weird. I don't have experience with NovaSeq, but with NextSeq such reads are usually all G's, which we assumed was due to problems in the DNA synthesis for the 2nd read. I would ask an Illumina representative.

ADD REPLY • link 3.8 years ago by Asaf 10k

I asked an Illumina representative, but unfortunately they haven't given me a clear answer yet. They suggested trimming the 25 G's from the read so that it passes FastQC, but I disagree with that approach because it doesn't fix the underlying problem.

ADD REPLY • link 3.8 years ago by godth13teen ▴ 70

You have not told us what kind of libraries these are. If you are doing something non-standard, then you need to consider non-standard solutions for downstream data processing. If you got this result in 2 different runs (even with different libraries?), then it is reproducible. You may also want to consult the kit vendor to see what may be going on. It may just be a bad lib prep kit.

ADD REPLY • link 3.8 years ago by GenoMax 141k

I am using TruSeq DNA PCR-Free from Illumina. When I reported the problem, they gave me 2 new kits for testing, but the problem occurred again.

ADD REPLY • link 3.8 years ago by godth13teen ▴ 70

I would throw out any read that starts with a long G stretch.

ADD REPLY • link 3.8 years ago by Asaf 10k

Yes, I also considered throwing away both the bad reads and their mates. But as I said, this only addresses data processing; it does not explain the real cause of the problem.

ADD REPLY • link 3.8 years ago by godth13teen ▴ 70
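
The filtering idea discussed above (drop a pair when read 2 starts with a long G stretch) could be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the function names, the `min_run` threshold, and the tiny in-memory reads are all assumptions made for the example, and it assumes plain 4-line FASTQ records with both files in the same order.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    # Zipping the same iterator four times consumes one record per step.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip()[1:], seq.strip(), qual.strip()


def leading_g_run(seq: str) -> int:
    """Length of the uninterrupted run of G at the start of a read."""
    return len(seq) - len(seq.lstrip("G"))


def drop_poly_g_pairs(
    pairs: Iterable[Tuple[Record, Record]], min_run: int = 20
) -> Iterator[Tuple[Record, Record]]:
    """Keep a pair only if read 2 starts with fewer than min_run G's.

    min_run is a hypothetical threshold; choose it from your own data.
    """
    for r1, r2 in pairs:
        if leading_g_run(r2[1]) < min_run:
            yield r1, r2


# Made-up reads for illustration; real input would be two FASTQ files
# iterated in step (for example opened with gzip.open(path, "rt")).
r1_lines = ["@a/1", "ACGTACGTAC", "+", "FFFFFFFFFF",
            "@b/1", "TTGACCATGA", "+", "FFFFFFFFFF"]
r2_lines = ["@a/2", "GGGGGGGGGG", "+", "FFFFFFFFFF",
            "@b/2", "GATTACAGGC", "+", "FFFFFFFFFF"]

pairs = zip(parse_fastq(r1_lines), parse_fastq(r2_lines))
kept: List[Tuple[Record, Record]] = list(drop_poly_g_pairs(pairs, min_run=8))
print([r1[0] for r1, _ in kept])
```

Counting how many pairs are dropped this way also gives a rough estimate of how much of the run is affected, which may be useful when reporting the issue to the vendor.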

I ran into the same problem. I have some BS-seq reads, and this made processing them a disaster. Cutadapt does not handle the Gs at the beginning of read 2 well, because they are not the same thing as the Gs at the end of a read. Most of these leading stretches are not pure Gs but something like GGGAGACGAGAGAGG, and they have very high quality scores. It is very weird, because we also have a NextSeq and we have never seen this phenomenon in NextSeq runs.

read1:

read2:

ADD REPLY • link 2.8 years ago by Jianheng Liu (Fox) • 0
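
Because the leading stretch described above is G-rich rather than pure G (for example GGGAGACGAGAGAGG), a test for an uninterrupted G run would miss it. One possible alternative is to score the G fraction in the first few bases of read 2. The sketch below is only an illustration under stated assumptions: the function names, the 15-base window, and the 0.5 cutoff are hypothetical, and the example reads are made up apart from the prefix quoted in the post. For bisulfite data the expected base composition is already skewed, so any cutoff would need to be checked against unaffected reads.

```python
from typing import Iterable


def g_fraction(seq: str, window: int = 15) -> float:
    """Fraction of G among the first `window` bases (0.0 for an empty read)."""
    prefix = seq[:window].upper()
    return prefix.count("G") / len(prefix) if prefix else 0.0


def has_g_rich_head(seq: str, window: int = 15, min_fraction: float = 0.5) -> bool:
    """True if the start of the read is at least min_fraction G.

    Both window and min_fraction are hypothetical defaults, not validated values.
    """
    return g_fraction(seq, window) >= min_fraction


def flagged_share(seqs: Iterable[str], window: int = 15,
                  min_fraction: float = 0.5) -> float:
    """Share of reads whose head is flagged as G-rich."""
    reads = list(seqs)
    if not reads:
        return 0.0
    flagged = sum(has_g_rich_head(s, window, min_fraction) for s in reads)
    return flagged / len(reads)


read2_examples = [
    "GGGAGACGAGAGAGGTTTATTAGTTTGA",  # prefix quoted in the post, made-up tail
    "TTAGGTTTAGATTTGAGTATTAGGAATT",  # made-up read without a G-rich head
]

for seq in read2_examples:
    print(seq[:15], round(g_fraction(seq), 2), has_g_rich_head(seq))
print("flagged share:", flagged_share(read2_examples))
```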

Same issue: High G and low A,C,T content in the 1-10bp of Read2 file in a paired-end whole genome bisulfite sequencing, why?

ADD REPLY • link 2.8 years ago by Jianheng Liu (Fox) • 0