Poly-G in head of read NovaSeq
3.8 years ago
godth13teen ▴ 70

Hi, I recently ran into a problem with the output of a NovaSeq 6000. I ran the samples in paired-end mode and found that a portion of the read 2 sequences have a poly-G stretch (around 50 bp) at the beginning. I understand that the NovaSeq uses 2-color chemistry, so the poly-G most likely reflects signal loss, but I don't understand why it appears only at the beginning of the second read. I have considered:

  • DNA strand break in the sample: if this happened, read 1 would show no sign of the break at all.
  • DNA strand break after fragmentation: if this happened, poly-G would also occur at the tail of read 1.
  • Reverse strand break at the head: then how can the tail still have bases? I think the fragment would fall off the adapter, am I right?
  • Reverse strand break at the tail: then poly-G would occur at the tail, as in many other reported cases.
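To put a number on "a portion of read 2", one option is to measure the length of the uninterrupted G run at the start of every read and compare read 1 against read 2. The following is a minimal sketch, not a validated QC tool: the function names and the `min_run=20` threshold are assumptions chosen for illustration, and the input is assumed to be a plain 4-line-per-record FASTQ.

```python
import gzip
from collections import Counter


def leading_g_run(seq):
    """Length of the uninterrupted run of G at the start of a read."""
    return len(seq) - len(seq.lstrip("G"))


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def summarize_leading_g(lines, min_run=20):
    """Return (total reads, reads with a leading G run >= min_run, histogram).

    min_run is an illustrative threshold, not a recommended value.
    """
    hist = Counter(leading_g_run(s) for s in fastq_sequences(lines))
    total = sum(hist.values())
    flagged = sum(n for run, n in hist.items() if run >= min_run)
    return total, flagged, hist


def summarize_fastq_gz(path, min_run=20):
    """Same summary for a gzipped FASTQ file (path supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return summarize_leading_g(handle, min_run)
```

Running `summarize_fastq_gz` on the R1 and R2 files separately and comparing the two histograms would show whether the leading poly-G is really confined to read 2 and how long the runs typically are.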

This phenomenon is really confusing and I haven't found an answer or explanation for it yet. I have tried a new library prep kit, but it still happens. I am using the TruSeq DNA PCR-Free kit from Illumina.

Any help/advice is warmly welcome!

Thank you

sequencing • 3.4k views

I agree, that's weird. I don't have experience with NovaSeq, but on NextSeq such reads are usually all G's and (we assumed) due to problems in the DNA synthesis for the 2nd read. I would ask an Illumina representative.


I asked an Illumina representative, but they haven't given me a clear answer yet. Unfortunately, they suggested trimming 25 G's from the reads so that they pass FastQC, but I disagree with that approach; it doesn't fix the underlying problem.


You have not told us what kind of libraries these are. If you are doing something non-standard then you need to consider non-standard solutions for downstream data processing. If you got this result with 2 different runs (even different libraries?) then it is a reproducible one. You may also want to consult the kit vendor to see what may be going on. It may just be a bad library prep kit.


I am using TruSeq DNA PCR-Free from Illumina. When I reported the problem, they gave me 2 new kits for testing, but the problem occurred again.


I would throw out any read that starts with a long G stretch.


Yes, I also considered throwing away both the bad reads and their mates. But as I said, that only handles the data processing; it doesn't address the real cause of the problem.
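For the data-processing side only, dropping a pair whenever its read 2 starts with a long G stretch could look roughly like the sketch below. It assumes the two FASTQ files list mates in the same order with 4 lines per record; `filter_pairs`, `starts_with_poly_g`, and the `min_run=20` default are hypothetical names and values used for illustration.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a FASTQ text handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def starts_with_poly_g(seq, min_run=20):
    """True if the first min_run bases of the read are all G."""
    return seq[:min_run].upper() == "G" * min_run


def filter_pairs(r1_handle, r2_handle, out1, out2, min_run=20):
    """Write only pairs whose read 2 does not start with a poly-G run.

    Both mates are dropped together so the output files stay in sync.
    Returns (pairs kept, pairs dropped).
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if starts_with_poly_g(rec2[1], min_run):
            dropped += 1
            continue
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped
```

The handles can come from `gzip.open(path, "rt")` for input and `gzip.open(path, "wt")` for output. The dropped count is worth reporting alongside the results, since it shows how much data the artifact costs per run.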


I ran into the same problem. I have some BS-seq reads, and it was a disaster. Cutadapt does not handle the G's at the beginning of read 2 well, because they are not the same thing as the G's at the end. Most of the read 2 heads are not pure G's but something like GGGAGACGAGAGAGG, and they have very high quality scores. It is very weird, because we also have a NextSeq and have never seen this in NextSeq runs.

[Images: read1, read2]
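Because heads like GGGAGACGAGAGAGG are G-rich rather than pure G, an exact poly-G match would miss them. One possible workaround is to flag reads by the fraction of G in a leading window instead. This is only a sketch: the 15-base window and the 0.5 cutoff are assumptions that would need tuning against real data, since genuinely G-rich sequence would also be flagged.

```python
def leading_g_fraction(seq, window=15):
    """Fraction of G among the first `window` bases of a read."""
    head = seq[:window].upper()
    if not head:
        return 0.0
    return head.count("G") / len(head)


def is_g_rich_head(seq, window=15, min_fraction=0.5):
    """True if the start of the read is G-rich.

    window and min_fraction are illustrative defaults, not tuned values.
    """
    return leading_g_fraction(seq, window) >= min_fraction


# The example head from the post has 9 G's in 15 bases, so it is flagged
# even though its longest leading run of pure G is only 3 bases.
example_head = "GGGAGACGAGAGAGG"
example_flagged = is_g_rich_head(example_head)
```

Plotting the distribution of `leading_g_fraction` over read 2 before picking a cutoff would show whether the affected reads separate cleanly from normal ones.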
