Question: How to combine a run with quality paired sequences and a run with only quality forward reads
Yongjie Zhang (UC Berkeley, USA / Shanxi Univ, China) wrote, 4.0 years ago:

Dear All,

I have a question about analyzing amplicon MiSeq sequencing data and would appreciate your suggestions.

Specifically, I have 100 samples that were sequenced on a MiSeq in two separate runs. In the first run, which included the first 50 samples, I have sufficient paired sequences from both forward and reverse reads. In the second run, which included the other 50 samples (five samples were also included in the first run and were sequenced again), I only have sufficient forward reads; the quality of the reverse reads is too poor to use. My question is which is the better practice: 1) using paired sequences from the first run and forward reads from the second run, or 2) using forward reads from both the first run and the second run? I have tried the first approach and found that the five samples sequenced in both runs did not group by sample but rather by run in the PCoA analysis. I also detected a significant difference in beta diversity between the two sequencing runs of these five samples.

Please let me know if you have any comments or suggestions. Thanks in advance.

Best wishes,

Yongjie ZHANG

amplicon miseq data • 1.4k views

How many bases did you read from each side?

What organism?

What are you going to do with the reads?

written 4.0 years ago by Asaf
Brice Sarver (United States) wrote, 4.0 years ago:

I guess an outstanding question is why the quality is poor on the second (reverse) reads. If the quality and/or accuracy also suffered on the first (forward) reads, though perhaps to a lesser extent, then you would expect to see some differences between runs when the samples are clustered. Have you performed appropriate cleaning/trimming before doing anything else?
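To make the cleaning/trimming step concrete, here is a minimal sketch of 3'-end quality trimming in Python. It assumes Phred+33 encoded FASTQ quality strings and a cutoff of Q20; both are illustrative assumptions, not values from this thread, and the function names are hypothetical. A dedicated trimming tool would normally be used on real data.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    The Phred+33 offset is an assumption; adjust it for other encodings.
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Trim low-quality bases from the 3' end of a read.

    Bases are dropped from the end until one with quality >= min_quality
    is reached. Returns the trimmed (sequence, quality_string) pair.
    """
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def mean_quality(quality_string):
    """Mean Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0
```

Applying the same trimming rule to the forward reads of both runs, and then comparing mean quality between the runs, would be one way to check whether the forward reads of the second run also suffered.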

If the reads are indeed good and accurate, I would have no problem treating them as unpaired for mapping, the same way that I would include reads whose mate failed cleaning as unpaired. I'd use all the data available to me. There is no reason to throw out the additional information from the reverse reads unless you were really worried about differential coverage among your datasets. In that case, you could subsample to the same approximate coverage to mitigate bias.
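The subsampling idea can be sketched as follows. This is a minimal illustration in Python, assuming each sample's reads are already loaded into a list; the function names and the choice of the smallest sample as the default depth are assumptions made for the example, not something specified in this thread.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Randomly draw `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, so the
    caller can decide whether to drop that sample.
    """
    if len(reads) < depth:
        return None
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(reads, depth)


def subsample_all(samples, depth=None, seed=0):
    """Subsample every sample to a common depth.

    `samples` maps sample name -> list of reads. If `depth` is None, the
    size of the smallest sample is used so that no sample is dropped.
    Samples with fewer than `depth` reads are left out of the result.
    """
    if depth is None:
        depth = min(len(reads) for reads in samples.values())
    result = {}
    for name, reads in samples.items():
        picked = subsample_reads(reads, depth, seed=seed)
        if picked is not None:
            result[name] = picked
    return result
```

For example, calling `subsample_all` on a dictionary holding the same sample from both runs would bring the two read sets to equal depth before computing distances between them.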
