Question

How to remove poly-G tails from NextSeq data

9.2 years ago

HG ★ 1.2k

I am analyzing a NextSeq run from a bacterial data set. As in a few earlier posts here, I also found a straight run of G's at the end of the reads. Can anyone suggest how to overcome this problem? Any script or tool would help.

assembly nextseq trimming • 6.5k views

Updated 24 months ago by Ram 43k • written 9.2 years ago by HG ★ 1.2k

Ram · Answer 1 · 2015-02-16

9.2 years ago

Asaf 10k

I use cutadapt with a poly-G sequence as the adapter. You should allow some errors, because the poly-G run sometimes contains an occasional A, C, or T base.
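
To illustrate why some error tolerance is needed, here is a minimal pure-Python sketch of trimming a 3' poly-G tail. It is not cutadapt's alignment algorithm, only a simplified stand-in; the function names and the defaults (`max_error_rate=0.1`, `min_tail=10`) are assumptions chosen for the example.

```python
def poly_g_tail_start(seq, max_error_rate=0.1, min_tail=10):
    """Return the index where a 3' poly-G tail starts (len(seq) if none).

    Scans from the 3' end and keeps the longest suffix that begins
    with G, is at least `min_tail` long, and whose fraction of non-G
    bases stays within `max_error_rate` (both values are assumptions).
    """
    seq = seq.upper()
    cut = len(seq)
    mismatches = 0
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] != "G":
            mismatches += 1
            continue
        tail_len = len(seq) - i
        if tail_len >= min_tail and mismatches <= max_error_rate * tail_len:
            cut = i
    return cut


def trim_poly_g(seq, qual, **kwargs):
    """Trim sequence and quality string to the same length."""
    cut = poly_g_tail_start(seq, **kwargs)
    return seq[:cut], qual[:cut]


# A read whose poly-G tail contains one stray A.
read = "ACTTCA" + "G" * 5 + "A" + "G" * 14
qual = "I" * len(read)
trimmed_seq, trimmed_qual = trim_poly_g(read, qual)
```

With an error rate of zero the stray A would stop the trimming early and leave five G's plus the A on the read, which is the reason to allow a few mismatches.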

When I analyze paired-end data, the second mate is sometimes entirely poly-G, and then I remove it by testing whether the read has more than 80% G's. If that's the case I discard the entire read (or use it as single-end, depending on what I do with it later).
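
The 80% test could be sketched roughly as follows. This is an illustration only: it assumes plain 4-line FASTQ records with both mate files in the same order, and the function names and the "pair"/"single" labels are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record[0], record[1], record[3]


def g_fraction(seq):
    """Fraction of bases in the read that are G."""
    return seq.upper().count("G") / len(seq) if seq else 0.0


def filter_pairs(r1_handle, r2_handle, threshold=0.8):
    """Yield (status, rec1, rec2) for each pair worth keeping.

    Assumes both files list mates in the same order. A mate with more
    than `threshold` G's is dropped; its partner is kept as single-end.
    Pairs where both mates are poly-G are skipped entirely.
    """
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        bad1 = g_fraction(rec1[1]) > threshold
        bad2 = g_fraction(rec2[1]) > threshold
        if bad1 and bad2:
            continue
        if bad2:
            yield "single", rec1, None
        elif bad1:
            yield "single", None, rec2
        else:
            yield "pair", rec1, rec2


# Tiny in-memory demo: the second pair has a poly-G R2 mate.
demo_r1 = "@r1/1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2/1\nACGTTGCAAC\n+\nIIIIIIIIII\n"
demo_r2 = "@r1/2\nTTGCAACGTA\n+\nIIIIIIIIII\n@r2/2\nGGGGGGGGGA\n+\nIIIIIIIIII\n"
results = list(filter_pairs(io.StringIO(demo_r1), io.StringIO(demo_r2)))
```

For real data the two `StringIO` objects would be replaced by open file handles (for example from `gzip.open(path, "rt")`), with one call per R1/R2 file pair.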

Thanks for your reply. Regarding the second part of your comment: could you please suggest how I can do this, perhaps with a script? I have 4 sets of paired-end reads for each sample.

Updated 24 months ago by Ram 43k • written 9.2 years ago by HG ★ 1.2k

What I do is run FastQC and then check whether poly-G is one of the over-represented sequences (and to what extent).

Then, actually, cutadapt with poly-G as the adapter will remove the read, but you should give it both mates as input (I think it will remove both of them, but I'm not sure).
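
A paired-end cutadapt call along these lines could be assembled as in the sketch below. The options used (`-a`, `-A`, `-e`, `-m`, `-o`, `-p`) are standard cutadapt options, but the file names, the poly-G length of 20, the error rate of 0.2, and the minimum length of 30 are assumptions for illustration. With a minimum length set, cutadapt's documented default is to discard a pair if either mate fails the filter, but this is worth confirming for your installed version. Newer cutadapt releases also document a `--nextseq-trim` option aimed at this artifact.

```python
import shlex


def build_cutadapt_command(r1_in, r2_in, r1_out, r2_out,
                           poly_g_len=20, error_rate=0.2, min_length=30):
    """Build a paired-end cutadapt command that treats poly-G as a 3' adapter.

    All default values here are illustrative assumptions, not recommendations.
    """
    poly_g = "G" * poly_g_len
    return [
        "cutadapt",
        "-a", poly_g,            # 3' adapter for R1
        "-A", poly_g,            # 3' adapter for R2
        "-e", str(error_rate),   # tolerate stray non-G bases in the tail
        "-m", str(min_length),   # discard reads left too short after trimming
        "-o", r1_out,
        "-p", r2_out,
        r1_in, r2_in,
    ]


# Hypothetical file names; repeat for each R1/R2 file pair of a sample.
cmd = build_cutadapt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```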

Updated 24 months ago by Ram 43k • written 9.2 years ago by Asaf 10k

I checked with FastQC as you suggested. In my data set there are no over-represented sequences, and the mean quality score is 35. So I hope I can run the assembly directly, without any preprocessing of the data set. What do you think? I used SPAdes for assembly, which also has some error-correction steps in IonHammer.

Updated 24 months ago by Ram 43k • written 9.2 years ago by HG ★ 1.2k

Sounds good, I can only dream of getting such numbers. Did you run both files (R1 and R2)?

Updated 24 months ago by Ram 43k • written 9.2 years ago by Asaf 10k

Yes, I did. I also assembled my data set, with good results in terms of N50 value and number of contigs.

Updated 24 months ago by Ram 43k • written 9.2 years ago by HG ★ 1.2k