Question

Fragment size and insert size

6.4 years ago

CY ▴ 750

I am aware that the fragment size depends on the strength of sonication. The insert size is the length of the actual genomic sequence, excluding adapters.

My confusion is: what should I consider when choosing the fragment size? Say we have a target region to be sequenced. What is the difference between a fragment size of 200bp and one of 400bp?

Besides, assume we use paired-end sequencing with 150bp reads. An insert size of 400bp leaves an inner distance of 100bp between the mates, while an insert size of 200bp makes the paired-end reads overlap each other by 100bp (I guess we don't want the insert size to be smaller than the read length, right?). How do these two options differ in practice? Is there a preference for or against having the paired-end reads overlap? Really appreciate any comments :)
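
A minimal sketch of the arithmetic in the question, assuming both mates have the same length and that "insert size" means the genomic insert without adapters. The helper name `pair_geometry` is hypothetical. A negative inner distance means the mates overlap; an insert shorter than one read means the read runs on into the adapter.

```python
def pair_geometry(insert_size: int, read_length: int) -> dict:
    """Describe where two equal-length mates sit on one insert (adapters excluded)."""
    inner = insert_size - 2 * read_length  # unsequenced gap between the mates
    return {
        "inner_distance": inner,  # negative means the mates overlap
        # Bases read by both mates; capped at the insert length when the
        # insert is shorter than a single read.
        "overlap": min(insert_size, max(0, -inner)),
        "reads_into_adapter": insert_size < read_length,
        "unique_insert_bases": min(insert_size, 2 * read_length),
    }

for insert in (200, 400):
    print(insert, pair_geometry(insert, read_length=150))
```

For 2x150bp reads this gives an inner distance of 100bp at a 400bp insert, and a 100bp overlap (inner distance of -100bp) at a 200bp insert.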

Tags: Fragment size • sequencing • insert size

Updated 6.4 years ago by d-cameron ★ 2.9k • written 6.4 years ago by CY ▴ 750

Can anyone share some comments? Thanks

6.4 years ago by CY ▴ 750

score 4 · Answer 1 · 2017-12-10

6.4 years ago

d-cameron ★ 2.9k

If your fragments are so short that you are sequencing into the adapters, then you are wasting sequencing capacity. A non-trivial portion of your fragments will be < 150bp if you size select for 200bp fragments.
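
As a rough illustration of how large that portion can be, the sketch below assumes insert sizes are normally distributed with a hypothetical standard deviation of 50bp. Real size-selected libraries are often skewed, so the library's own size trace is the better guide. `fraction_shorter_than_read` is an illustrative helper, not part of any tool.

```python
from statistics import NormalDist

def fraction_shorter_than_read(median_insert: float, sd: float, read_length: float) -> float:
    """Fraction of inserts shorter than one read, assuming normally distributed sizes.

    For a normal distribution the median equals the mean, so it is used as mu.
    """
    return NormalDist(mu=median_insert, sigma=sd).cdf(read_length)

# sd=50 is a hypothetical spread chosen only for illustration.
for median in (200, 400):
    frac = fraction_shorter_than_read(median, sd=50, read_length=150)
    print(f"median {median}bp: {frac:.2%} of inserts shorter than 150bp")
```

Under these assumptions roughly one insert in six is shorter than a 150bp read at a 200bp median, while at a 400bp median such inserts are vanishingly rare.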

For SNV/indel callers, overlapping reads reduce your effective sequencing depth. Overlapping reads can be used for error correction when the two reads from the same fragment disagree, but unless your SNV caller counts fragments instead of reads, it will double-count overlapping reads (two reads from the same fragment represent a single sampling, not two independent samplings).
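
A toy illustration of the double-counting point: the sketch below counts coverage at one position both per read and per fragment. The `Alignment` tuple and `depth_at` helper are hypothetical and use 0-based, half-open coordinates. Inside the overlap of a 200bp insert sequenced 2x150bp, one fragment contributes two reads.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    fragment: str  # read name, shared by both mates of a pair
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive

def depth_at(alignments: list[Alignment], pos: int) -> tuple[int, int]:
    """Return (read depth, fragment depth) at one reference position."""
    covering = [a for a in alignments if a.start <= pos < a.end]
    return len(covering), len({a.fragment for a in covering})

# Fragment "A": 200bp insert, 2x150bp reads, so the mates overlap over [50, 150).
# Fragment "B": 400bp insert, 2x150bp reads, no overlap.
reads = [
    Alignment("A", 0, 150), Alignment("A", 50, 200),
    Alignment("B", 0, 150), Alignment("B", 250, 400),
]
print(depth_at(reads, 100))  # (3, 2): three reads but only two fragments
```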

For structural variant (SV) calling, fragment size is extremely important. Increasing the fragment size increases the likelihood that a fragment will span an SV breakpoint. 2x150bp sequencing with a 200bp median fragment length will leave no read-pair signal at all: the two reads of a typical fragment overlap, so there is no unsequenced gap between them for a breakpoint to fall into.
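
A simplified model of the read-pair argument, assuming a single fragment, equal-length mates, and that a mate crossing the junction gives no discordant-pair evidence (whether it yields a usable split read depends on the aligner). The helper `breakpoint_evidence` is hypothetical; it classifies every junction position inside the fragment.

```python
def breakpoint_evidence(insert_size: int, read_length: int) -> dict:
    """Classify every junction position inside one fragment.

    A junction after `b` bases of the fragment (1 <= b < insert_size) can give
    discordant read-pair evidence only if neither mate crosses it.
    """
    counts = {"discordant_pair": 0, "mate_crosses_junction": 0}
    for b in range(1, insert_size):
        read1_clear = b >= read_length                # read 1 covers [0, read_length)
        read2_clear = b <= insert_size - read_length  # read 2 covers the last read_length bases
        if read1_clear and read2_clear:
            counts["discordant_pair"] += 1
        else:
            counts["mate_crosses_junction"] += 1
    return counts

for insert in (200, 400):
    print(insert, breakpoint_evidence(insert, read_length=150))
```

With 2x150bp reads, a 200bp fragment has no junction positions that give discordant-pair evidence, while a 400bp fragment has 101 (junctions after 150 to 250 bases).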

If I had a choice between 200bp and 400bp for 2x150bp sequencing, I would choose the 400bp option unless there was a specific experimental design reason for shorter fragments.

NB: different tools/papers use different terminology for 'insert size' and 'fragment size'. Insert size can exclude the read bases (and is thus negative if the fragment size is less than twice the read length), and fragment size may or may not include the adapters.
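
A small sketch of converting between these conventions, assuming equal-length mates. The convention labels, the `to_insert_size` helper, and the `adapter_total` value in the example are hypothetical, not option names or defaults from any particular tool.

```python
def to_insert_size(value: int, convention: str, read_length: int,
                   adapter_total: int = 0) -> int:
    """Convert a reported 'size' to the genomic insert length (reads included, adapters excluded).

    Hypothetical convention labels:
      "insert"        - already the genomic insert length
      "inner"         - gap between the mates, read bases excluded (may be negative)
      "with_adapters" - whole library fragment, including both adapters
    """
    if convention == "insert":
        return value
    if convention == "inner":
        return value + 2 * read_length
    if convention == "with_adapters":
        return value - adapter_total
    raise ValueError(f"unknown convention: {convention}")

# An inner distance of -100 with 2x150bp reads is a 200bp insert.
print(to_insert_size(-100, "inner", read_length=150))
```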

I guess most SNV callers only count a base once, even when the paired reads overlap at that position, right?

6.4 years ago by CY ▴ 750

Also, do you think it is always good to merge paired-end reads if they overlap? Tools like PEAR have this function. By merging them, we can call larger indels. Besides, I can't think of any advantage of not merging them.
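
A naive sketch of what overlap merging does, for illustration only: it reverse-complements read 2, takes the longest overlap with read 1 that has at most `max_mismatches` differences, and keeps the read-1 base at mismatches. This is not how PEAR works internally, it ignores inserts shorter than the read length, and the function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatches: int = 1) -> Optional[str]:
    """Merge two overlapping mates into one sequence, or return None.

    Tries the longest overlap first. At mismatching positions the read-1
    base is kept; a real merger would use base qualities instead.
    """
    r2_rc = revcomp(r2)  # bring read 2 into read-1 orientation
    for n in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-n:], r2_rc[:n]))
        if mismatches <= max_mismatches:
            return r1 + r2_rc[n:]
    return None

# A 30bp insert read as 2x20bp, so the mates overlap by 10bp.
fragment = "GATTACACGTTAGCCATGACTGGAACCTGA"
read1 = fragment[:20]
read2 = revcomp(fragment[10:])
print(merge_pair(read1, read2) == fragment)  # True
```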

6.4 years ago by CY ▴ 750

I can't think of any advantage of not merging them

Me too; this is why I started a thread about it some time ago.

fin swimmer

6.4 years ago by finswimmer 16k

Merging will decrease the performance of STR calling, since there are many possible ways to merge the reads when the overlapping sequence is repetitive (including the case where the reads do not overlap at all but the fragment spans a microsatellite).
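
A sketch of that ambiguity: inside a short tandem repeat, several overlap lengths match equally well, so a merger has to pick one (possibly wrongly) or refuse. The helpers are hypothetical and use exact matching only; read 2 is assumed to be already reverse-complemented into read-1 orientation. Refusing to merge when more than one overlap fits is one simple guard.

```python
from typing import Optional

def exact_overlaps(r1: str, r2_rc: str, min_overlap: int = 6) -> list[int]:
    """All overlap lengths at which the end of read 1 exactly matches the start of read 2."""
    longest = min(len(r1), len(r2_rc))
    return [n for n in range(min_overlap, longest + 1) if r1[-n:] == r2_rc[:n]]

def merge_if_unambiguous(r1: str, r2_rc: str, min_overlap: int = 6) -> Optional[str]:
    """Merge only when exactly one overlap length fits."""
    hits = exact_overlaps(r1, r2_rc, min_overlap)
    if len(hits) != 1:
        return None  # no overlap, or several equally good ones (e.g. inside an STR)
    return r1 + r2_rc[hits[0]:]

# Both reads end/start inside a (CA)n repeat, so the true repeat length is unknown.
r1 = "GGATCCTT" + "CA" * 6
r2_rc = "CA" * 6 + "TTGGCCAA"
print(exact_overlaps(r1, r2_rc))        # [6, 8, 10, 12]
print(merge_if_unambiguous(r1, r2_rc))  # None
```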

6.4 years ago by d-cameron ★ 2.9k

That's correct. But if the merger is aware of repetitive sequences and does not merge where the overlap is ambiguous, you can still get the advantages of merging for the rest.

bbmerge from BBMap can do this.

fin swimmer

6.4 years ago by finswimmer 16k

If you're interested in STRs, you should not merge paired end reads as the merge will (almost always) be incorrect when the inner portion of the fragment contains a short tandem repeat motif.

6.4 years ago by d-cameron ★ 2.9k