Question: To merge or not to merge during de novo assembly?
9 weeks ago, robert.murphy wrote:

When doing a de novo assembly, either hybrid or with short reads only, should you merge paired-end short reads?

What are the pros and cons of both?

There are some threads touching on this issue but none with concise answers and information.

Any advice would be appreciated.


If your reads overlap, then the inserts are too short. That may hurt the quality of the assembly.

written 9 weeks ago by GenoMax

Ah, so paired-end reads don't overlap?

modified 9 weeks ago • written 9 weeks ago by robert.murphy

No. Libraries with overlapping reads are designed for specific applications, e.g. 16S sequencing.

written 9 weeks ago by GenoMax

Ah yes, of course, paired-end reads don't always overlap.

Why do some people merge when doing assembly, then?

And when would you ever merge reads, and what purpose does it serve?

written 9 weeks ago by robert.murphy

Under normal theoretical conditions, the reads of a paired-end library should indeed not overlap. However, even when it's not intended, they sometimes still do. Library size selection is not so precise that it yields a single length; it gives more of a range, so some read pairs may still overlap.

In the above-mentioned case it might be worth merging overlapping reads first. In any case, it will remove some confusion for the assembler, as it will then not encounter read pairs with a negative distance.

You can also deliberately make such overlapping libraries (as @genomax already indicated). The rationale is to feed bigger pieces to the assembler, as you already know they are derived from a single molecule.
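
To make the idea of merging concrete, here is a minimal sketch of what a read merger does with one pair. It is not how any particular tool works: the function names, `min_overlap` and `max_mismatch_frac` are hypothetical, base qualities are ignored, and it assumes the fragment is at least as long as one read (no read-through into adapters).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Try to merge a forward read with its reverse mate.

    Returns the merged sequence, or None when no acceptable overlap exists.
    Both thresholds are illustrative defaults, not values from a real tool.
    """
    # Bring the mate onto the same strand as the forward read.
    rev_rc = reverse_complement(rev)
    # Test candidate overlaps from longest to shortest and accept the first
    # one whose mismatch count stays within the allowed fraction.
    for k in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        tail, head = fwd[-k:], rev_rc[:k]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * k:
            # Keep the forward read and append the non-overlapping mate part.
            return fwd + rev_rc[k:]
    return None


# A 30 bp fragment sequenced with 20 bp reads gives a 10 bp overlap.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTAATCG"
fwd_read = fragment[:20]
rev_read = reverse_complement(fragment[-20:])
merged = merge_pair(fwd_read, rev_read)
```

Pairs for which this returns None would simply stay as ordinary paired-end reads.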

written 9 weeks ago by lieven.sterck

Thanks for the reply and help.

"is not so precise that it yields a single length"

Does this mean the insert size here between the forward and reverse read?

"In the above-mentioned case it might be worth merging overlapping reads first. In any case, it will remove some confusion for the assembler, as it will then not encounter read pairs with a negative distance."

Would this not always apply when using paired-end reads?

"as you already know they are derived from a single molecule"

I was under the impression all paired end reads are derived from a single molecule?

modified 9 weeks ago by lieven.sterck • written 9 weeks ago by robert.murphy

"Does this mean the insert size here between the forward and reverse read?"

It depends on the interpretation (the term is kind of confusing), but that's roughly how you can interpret it, yes. More often, though, it means the whole length of the fragment from which the forward and reverse reads are sequenced. For details, I would suggest googling it.
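
The two readings of "insert size" differ only by the read lengths, which a small sketch can make explicit. The function name is hypothetical, and it assumes both reads have the same length and that the fragment length excludes adapters.

```python
def inner_distance(fragment_len, read_len):
    """Gap between the end of the forward read and the start of the reverse read.

    A negative value means the two reads overlap by that many bases.
    Assumes both mates have length read_len (an illustrative simplification).
    """
    return fragment_len - 2 * read_len


# With 150 bp reads, a 500 bp fragment leaves an unsequenced gap,
# while a 250 bp fragment makes the reads overlap.
gap_long = inner_distance(500, 150)   # 200
gap_short = inner_distance(250, 150)  # -50
```

A negative result here is the "negative distance" mentioned earlier in the thread.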

"Would this not always apply when using paired-end reads?"

As others, such as @genomax, have indicated, it should not happen; it's rather an artefact of poor library prep (exceptions aside). So in most cases merging will not achieve much, as there should be few to no overlapping reads.
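
If you have an estimate of the fragment lengths in your library, a rough check like the following shows how many pairs could overlap at all. This is only a sketch: the function name, the `min_overlap` default and the example lengths are made up for illustration, and it again assumes both reads have the same length.

```python
def fraction_overlapping(fragment_lengths, read_len, min_overlap=10):
    """Fraction of fragments short enough for the mates to overlap.

    A pair counts as overlapping when the reads share at least min_overlap
    bases, i.e. the fragment is no longer than 2 * read_len - min_overlap.
    """
    if not fragment_lengths:
        return 0.0
    threshold = 2 * read_len - min_overlap
    hits = sum(1 for length in fragment_lengths if length <= threshold)
    return hits / len(fragment_lengths)


# Hypothetical fragment lengths for a library sequenced with 150 bp reads.
example_lengths = [280, 320, 350, 400, 450, 500, 550, 600]
overlap_fraction = fraction_overlapping(example_lengths, read_len=150)
```

A small fraction would suggest that merging changes little for that library.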

"I was under the impression all paired end reads are derived from a single molecule?"

Absolutely correct. But that does not mean the assembler will successfully merge them (lots of other factors are in play here). So if you merge them beforehand, you can feed them to the assembler as single-end reads (== no assembly required).

written 8 weeks ago by lieven.sterck