2.7 years ago
godth13teen ▴ 70
Hi, I recently ran into a problem with the output of a NovaSeq 6000. I ran my samples in paired-end mode and found that a portion of read 2 has a poly-G stretch (around 50 bp) at the beginning. I understand that NovaSeq uses 2-color chemistry, where no signal is called as G, so the poly-G most likely reflects lost signal. What I don't understand is why it only appears at the beginning of the second read. I have considered:
- DNA strand break in the sample: if this happened, read 1 would show no sign of the break at all.
- DNA strand break after fragmentation: if this happened, the poly-G would also occur at the tail of read 1.
- Reverse strand break at the head: then how can the tail still have bases? I think it would fall off the adapter, am I right?
- Reverse strand break at the tail: then the poly-G would occur at the tail, as in many other reported cases.
This phenomenon is really confusing and I haven't found any answer or explanation for it yet. I have tried again with a new library prep kit, but it still happens. I am using TruSeq DNA PCR-Free from Illumina.
Any help/advice is warmly welcome!
I agree, that's weird. I don't have experience with NovaSeq, but with NextSeq it's usually all G's and (we assumed) due to problems in the DNA synthesis for the 2nd read. I would ask an Illumina representative.
I asked an Illumina representative, but unfortunately they haven't given me a clear answer yet. They suggested trimming the 25 G's from the reads so that they pass FastQC, but I disagree with that approach: it doesn't fix the underlying problem.
You have not told us what kind of libraries these are. If you are doing something non-standard, then you need to consider non-standard solutions for downstream data processing. If you got this result in 2 different runs (even with different libraries?), then it is reproducible. You may also want to consult the kit vendor to see what may be going on. It may just be a bad library prep kit.
I am using TruSeq DNA PCR-Free from Illumina. When I reported the problem, they gave me 2 new kits for testing, but the problem occurred again.
I would throw out any read that starts with a long G stretch.
Yes, I also considered throwing away both the bad reads and their mates. But as I said, that only addresses the data processing, not the real cause of the problem.
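For anyone who wants to try the workaround discussed above (dropping a bad read 2 together with its mate), here is a minimal Python sketch. It assumes plain, uncompressed 4-line FASTQ with R1 and R2 records in the same order. The function names (`read_fastq`, `leading_g_run`, `filter_pairs`) and the `min_run=20` threshold are made up for illustration and would need tuning on real data. As noted, this only cleans the data and says nothing about the cause.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        yield tuple(record)


def leading_g_run(seq):
    """Length of the uninterrupted run of G at the start of a sequence."""
    return len(seq) - len(seq.lstrip("G"))


def filter_pairs(r1_records, r2_records, min_run=20):
    """Yield only pairs whose read 2 does NOT start with >= min_run G's.

    min_run is an assumed threshold, not a recommended value.
    """
    for rec1, rec2 in zip(r1_records, r2_records):
        if leading_g_run(rec2[1]) < min_run:
            yield rec1, rec2
```

Usage would be something like opening both files and writing the kept pairs back out:

```python
with open("R1.fastq") as f1, open("R2.fastq") as f2:
    for rec1, rec2 in filter_pairs(read_fastq(f1), read_fastq(f2)):
        pass  # write "\n".join(rec1) and "\n".join(rec2) to the output files
```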
I ran into the same problem. I have some BS-seq reads, and it turned them into a disaster. Cutadapt does not work well on the G's at the beginning of read 2, because they are not the same thing as the G's at the end. Most of the G stretches at the beginning of read 2 are not pure G's but something like GGGAGACGAGAGAGG, and they have very high quality scores. It is very weird, because we also have a NextSeq and we never saw such a phenomenon in NextSeq runs.
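Since these stretches are G-rich rather than pure G, a check based on a strict G run can miss them. A rough Python sketch of two alternatives follows: per-position base composition over the first few cycles of read 2 (to quantify the bias), and a flag for reads whose first bases exceed a G fraction. The names `base_composition` and `g_rich_start`, the 15-base window, and the 0.6 cutoff are assumptions for illustration only, not validated settings.

```python
from collections import Counter


def base_composition(sequences, n_positions=10):
    """Per-position fraction of A, C, G, T, N over the first n_positions bases."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_positions]):
            counts[i][base] += 1
    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGTN"}
        )
    return fractions


def g_rich_start(seq, window=15, min_fraction=0.6):
    """True if the first `window` bases are at least `min_fraction` G.

    Both parameters are assumed values chosen for illustration.
    """
    head = seq[:window]
    return bool(head) and head.count("G") / len(head) >= min_fraction
```

Comparing the `base_composition` output of read 2 against read 1 from the same run should show whether the G excess is confined to the first cycles of read 2.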
Same issue: High G and low A,C,T content in the 1-10bp of Read2 file in a paired-end whole genome bisulfite sequencing, why?