Question: Joining vs merging paired-end reads
drikaul wrote (3 months ago):

Hi community!

I have paired-end amplicon sequences for a batch of samples, with very little overlap (<10%).

Conceptually, does it make sense to join the forward and reverse reads into a single read for downstream processing, instead of interleaving/merging them on their overlapping sequence, since overlap merging isn't the best solution in this particular case?

Or perhaps concatenate R1 and R2 so they are read in as a single read?

Thanks!

modified 3 months ago by h.mon • written 3 months ago by drikaul

Or perhaps just keeping them as two separate paired-end reads? Why would you want to merge or join them?

written 3 months ago by WouterDeCoster

The idea is to call OTUs on them, so I'm trying to figure out the best way to make use of the forward and reverse reads, since the overlap is minimal. For now, I'm leaning towards using only the forward reads, since their quality is reasonably good in comparison, but I was wondering whether, conceptually, it makes sense to join the two.

written 3 months ago by drikaul

No, in my opinion it would not make sense to just concatenate the forward and reverse reads, because of the downstream analyses. If you BLAST the concatenated reads, there is a chance that you won't get the right biological hit, which is a must in this kind of study. Have you already tried to merge them to see how good or bad the result is?

written 3 months ago by gb

Thanks, that makes sense and is along the lines of what I was thinking! If by merging the reads you mean checking the overlap, then yes, I already did that and it's minimal. I haven't tried joining them yet.

written 3 months ago by drikaul

Check out 'PANDAseq'.

"PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence."

You can also find many other similar tools on the web.

modified 3 months ago • written 3 months ago by mbk0asis
Carambakaracho wrote (3 months ago):

In addition to PANDAseq, you may want to look into vsearch, FLASH2 and PEAR, all of which can do overlap merging. A classic approach is to do overlap merging where applicable and to join/concatenate the pairs without sufficient overlap. The overlap is a function of the fragment size distribution and varies considerably between pairs. vsearch can join the reads, too; see for example Torbjørn Rognes' and Frédéric Mahé's pipeline.

Joining non-overlapping reads may or may not make sense, depending on what you plan for downstream processing. Some k-mer-based classification pipelines need the reads joined; in other cases it may not make sense.
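
To make the "merge where applicable, join otherwise" idea concrete, here is a minimal toy sketch in Python. It is not how vsearch, PEAR, FLASH2 or PANDAseq work internally: real mergers use base qualities and tolerate mismatches, whereas this sketch only accepts an exact suffix/prefix match. The function names, the `min_overlap` threshold and the `pad` string of Ns are all assumptions chosen for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_overlap(r1: str, r2_rc: str, min_overlap: int = 10) -> int:
    """Length of the longest exact match between the end of r1 and the
    start of r2_rc, or 0 if it is shorter than min_overlap.
    (Toy criterion: real tools score mismatches using base qualities.)"""
    longest = min(len(r1), len(r2_rc))
    for n in range(longest, min_overlap - 1, -1):
        if r1[-n:] == r2_rc[:n]:
            return n
    return 0


def merge_or_join(r1: str, r2: str, min_overlap: int = 10,
                  pad: str = "NNNNNNNN") -> tuple[str, str]:
    """Merge a read pair on its overlap if one is found; otherwise join
    R1 and the reverse-complemented R2 with a pad of Ns (hypothetical pad)."""
    r2_rc = reverse_complement(r2)
    n = find_overlap(r1, r2_rc, min_overlap)
    if n:
        # Overlapping bases are kept once, taken from R1.
        return "merged", r1 + r2_rc[n:]
    # No usable overlap: the pad marks that the middle was never sequenced.
    return "joined", r1 + pad + r2_rc
```

Calling `merge_or_join(r1, r2)` returns a label plus the resulting sequence. Note that merged and joined reads from the same template differ in length and content, which is exactly why mixing the two in one clustering run can split a single organism into two OTUs.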

modified 3 months ago • written 3 months ago by Carambakaracho

Yes, I used PEAR for this particular analysis, but have used vsearch in the past too. I'm trying to call OTUs on the merged reads; would joining the non-overlapping reads make sense for that purpose? As I understand it, this might generate erroneous OTU sequences that would skew the downstream clustering analysis.

written 3 months ago by drikaul

It depends a bit, but yes, clustering merged and joined sequences from the same organism would yield two OTUs. On the other hand, when the differences within a pair don't justify merging, how likely is it that the pairs would end up in two separate OTUs?

written 3 months ago by Carambakaracho
h.mon wrote (3 months ago):

My personal recommendation would be to use primers appropriate to the sequencing platform, so that the amplicon is shorter than the combined length of R1 and R2 and you get a good overlap, allowing unambiguous merging of pairs. This way, you reduce the error rate at the ends of the reads, where quality is lower and most errors occur.
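
As a back-of-the-envelope check of the point above, the expected overlap is just the two read lengths minus the amplicon length. A minimal Python sketch follows; the function name and the example numbers (2x250 reads, 460 bp and 550 bp amplicons) are hypothetical and not taken from this thread.

```python
def expected_overlap(len_r1: int, len_r2: int, amplicon_len: int) -> int:
    """Expected overlap in bases between R1 and R2 for a given amplicon.
    A negative value is the size of the unsequenced gap between the reads."""
    return len_r1 + len_r2 - amplicon_len


# Hypothetical examples: 2x250 reads on a 460 bp and a 550 bp amplicon.
short_amplicon = expected_overlap(250, 250, 460)  # positive: pairs can be merged
long_amplicon = expected_overlap(250, 250, 550)   # negative: gap, no overlap
```

In practice the usable overlap is smaller than this figure, because low-quality read ends are often trimmed before merging.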

However, there is a tool developed to use both reads even when there is no overlap:

IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries

written 3 months ago by h.mon