Question

How to combine a run with quality paired sequences and a run with only quality forward reads

9.2 years ago

Yongjie Zhang ▴ 110

Dear All,

I have a question about analyzing amplicon MiSeq sequencing data and would appreciate your suggestions.

Specifically, I have 100 samples that were sequenced on a MiSeq in two separate runs. The first run covers the first 50 samples, and for it I have enough good-quality paired sequences from both forward and reverse reads. The second run covers the other 50 samples (five of which were also included in the first run and so were sequenced twice); for it I have enough good-quality forward reads, but the quality of the reverse reads is too poor to use. My question is which is the better practice: 1) using paired sequences from the first run and forward reads from the second run, or 2) using forward reads only from both runs? I have tried the first approach and found that the five samples sequenced in both runs grouped by run rather than by sample in the PCoA. I also detected a significant difference in community composition (beta diversity) between the two sequencing runs of these five samples.

Please let me know if you have any comments or suggestions. Thanks in advance.

Best wishes,

Yongjie ZHANG

Amplicon Miseq • 2.2k views

updated 24 months ago by Ram 43k • written 9.2 years ago by Yongjie Zhang ▴ 110

How many bases did you read from each side?

What organism?

What are you going to do with the reads?

9.2 years ago by Asaf 10k

score 0 · Answer 1 · 2015-02-24

I guess the outstanding question is why the quality of the second (reverse) reads is so poor. If the quality and/or accuracy of the first (forward) reads also suffered, though perhaps to a lesser extent, then you would expect to see some differences between runs when the samples are clustered. Have you performed appropriate cleaning/trimming before doing anything else?
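To make the quality check concrete, here is a minimal sketch in plain Python, not a replacement for a dedicated QC or trimming tool. It assumes Phred+33 quality strings (the usual encoding for MiSeq FASTQ files); the function names and the `min_q=20` threshold are illustrative choices of mine. The first function gives the mean quality at each read position, which you could compute for the forward reads of each run and compare; the second clips the low-quality 3' tail of a single read.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    totals = []
    counts = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

If the per-position profile of the forward reads drops off earlier in the second run than in the first, that alone could contribute to the run effect seen in the clustering.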

If the forward reads from the second run are indeed good and accurate, I would have no problem treating them as unpaired for mapping, the same way that I would include reads where one mate failed cleaning as unpaired. I'd use all the data available to me. There is no reason to throw out the additional information from the reverse reads of the first run unless you were really worried about differential coverage among your datasets. In that case, you could subsample to approximately the same coverage to mitigate bias.
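The subsampling step could look something like the sketch below. It assumes each sample's reads are already loaded into a Python list and that "same coverage" simply means the same number of reads per sample; the function names and the sample labels in the usage note are hypothetical. Reads are drawn without replacement, and a fixed seed keeps the draw reproducible.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads without replacement from one sample."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, depth)


def subsample_to_common_depth(samples, seed=0):
    """Subsample every sample to the depth of the shallowest one.

    `samples` maps a sample name to its list of reads.
    """
    depth = min(len(reads) for reads in samples.values())
    return {
        name: subsample_reads(reads, depth, seed)
        for name, reads in samples.items()
    }
```

For example, `subsample_to_common_depth({"run1_A": reads_a1, "run2_A": reads_a2})` returns both replicates cut down to the size of the smaller one, so that depth differences between the runs do not drive the comparison.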