Question: Fragment size and insert size
6 months ago by
CY130
United States
CY130 wrote:

I am aware that the fragment size depends on the strength of sonication. The insert size is the length of actual genomic sequence without adapter.

My confusion is: what are the considerations when deciding the fragment size? Say we have a target region to be sequenced. What is the difference between making the fragment size 200 and making it 400?

Besides, assuming we apply paired-end sequencing and our read length is 150bp, an insert size of 400 gives an inner distance of 100bp, while an insert size of 200 makes the paired-end reads overlap each other by 100bp (I guess we don't want the insert size < read length, right?). How would these two options differ? Is there a preference or consideration regarding whether the paired-end reads should overlap each other? Really appreciate any comments :)

modified 6 months ago by d-cameron1.8k • written 6 months ago by CY130

Can anyone share some comments? Thanks

written 6 months ago by CY130
6 months ago by
d-cameron1.8k
Australia
d-cameron1.8k wrote:

If your fragments are so short that you are sequencing into the adapters, then you are wasting sequencing capacity. A non-trivial portion of your fragments will be < 150bp if you size select for 200bp fragments.

For SNV/indel calling, overlapping reads reduce your effective sequencing depth. Overlapping reads can be used for error correction when the two reads from the fragment disagree, but unless your SNV caller counts fragments instead of reads, it will double-count overlapping reads (two reads from the same fragment represent a single sampling, not two independent samplings).
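
To illustrate the double-counting point, here is a minimal sketch of read-level versus fragment-level allele counting at a single position. The function name `count_alleles`, the `(fragment_id, base)` input format, and the rule of dropping fragments whose mates disagree are all assumptions made for illustration; this is not how any particular caller is implemented.

```python
from collections import Counter, defaultdict

def count_alleles(observations):
    """Count bases at one genomic position, per read and per fragment.

    observations: iterable of (fragment_id, base) tuples, one per read
    covering the position (hypothetical input format for this sketch).
    """
    observations = list(observations)

    # Naive counting: every read is treated as an independent observation.
    per_read = Counter(base for _, base in observations)

    # Fragment-aware counting: collect the bases seen by each fragment.
    bases_by_fragment = defaultdict(set)
    for fragment_id, base in observations:
        bases_by_fragment[fragment_id].add(base)

    per_fragment = Counter()
    for bases in bases_by_fragment.values():
        if len(bases) == 1:
            # Mates agree, or only one mate covers the site: count once.
            per_fragment[next(iter(bases))] += 1
        # Mates disagree: at least one is a sequencing error, so this
        # sketch drops the fragment (an assumed policy, not a standard).

    return per_read, per_fragment


obs = [("f1", "A"), ("f1", "A"), ("f2", "G"), ("f3", "A"), ("f3", "G")]
per_read, per_fragment = count_alleles(obs)
```

In this toy input, fragment `f1` contributes two reads but only one fragment-level observation, and `f3` is discarded because its mates disagree.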

For structural variant (SV) calling, fragment size is extremely important. Increasing the fragment size increases the likelihood that a fragment will span an SV breakpoint. 2x150bp sequencing with a 200bp median fragment length will have no read pair signal left at all, since most read pairs overlap and leave no unsequenced gap between the mates.
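
A rough back-of-the-envelope sketch of why fragment size matters here, under strong simplifying assumptions: every fragment has exactly the same length, fragments start uniformly along the genome, and a fragment gives read-pair (as opposed to split-read) signal only when the breakpoint falls in the unsequenced gap between the two reads. The function name `spanning_pair_coverage` is hypothetical.

```python
def spanning_pair_coverage(read_coverage, fragment_len, read_len):
    """Expected number of fragments whose unsequenced inner part
    contains a given breakpoint (idealised, fixed fragment length)."""
    # Each fragment contributes 2 * read_len sequenced bases, so the
    # number of fragments starting per base is read_coverage / (2 * read_len).
    # A fragment spans the breakpoint with its inner gap if it starts at one
    # of `inner` positions, where inner is the gap between the two reads.
    inner = max(0, fragment_len - 2 * read_len)
    return read_coverage * inner / (2 * read_len)


# 30x read coverage, 2x150bp reads
long_fragments = spanning_pair_coverage(30, 400, 150)   # 10.0
short_fragments = spanning_pair_coverage(30, 200, 150)  # 0.0
```

Real libraries have a distribution of fragment lengths, so a 200bp median library still contains some fragments longer than 300bp; the sketch only shows the direction of the effect.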

If I had a choice between 200bp and 400bp for 2x150bp sequencing, I would choose the 400bp option unless there was a specific experimental design reason for shorter fragments.

NB: different tools/papers use different terminology for 'insert size' and 'fragment size'. Insert size can exclude the read bases (thus being negative if the fragment size is less than twice the read length), and fragment size may or may not include the adapters in the definition.
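
The arithmetic behind these definitions can be sketched as follows. Here `fragment_len` means the genomic fragment without adapters and `inner_distance` is the definition that excludes the read bases; both names, and the function `pair_layout`, are assumptions for this example, since terminology differs between tools.

```python
def pair_layout(fragment_len, read_len):
    """Describe how a read pair sits on one adapter-free fragment."""
    # Gap between the inner ends of the two reads; negative means overlap.
    inner_distance = fragment_len - 2 * read_len
    # Overlapping genomic bases, capped at the fragment length itself.
    overlap = min(fragment_len, max(0, -inner_distance))
    # Bases of each read that run off the fragment into the adapter.
    adapter_bases = max(0, read_len - fragment_len)
    return {
        "inner_distance": inner_distance,
        "overlap": overlap,
        "adapter_bases": adapter_bases,
    }


for fragment_len in (400, 200, 120):
    print(fragment_len, pair_layout(fragment_len, read_len=150))
```

With 2x150bp reads, a 400bp fragment leaves a 100bp gap, a 200bp fragment gives a 100bp overlap, and a 120bp fragment spends 30 bases of each read on adapter sequence.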

written 6 months ago by d-cameron1.8k

I guess most SNV callers count the base only once even when paired reads overlap at that position, right?

written 6 months ago by CY130

Also, do you think it is always good to merge paired-end reads if they overlap? Tools like PEAR have this function. By merging them, we can call larger indels. Besides, I can't think of any advantage of not merging them

written 6 months ago by CY130

I can't think of any advantage of not merging them

Me too, this is why I started a thread about it some time ago.

fin swimmer

written 6 months ago by finswimmer2.8k

Merging will decrease performance of STR calling since there are many possible merging options when the overlapping sequence is repetitive (including the reads being non-overlapping but the fragment spanning a micro-satellite).
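
A toy sketch of why repeats make merging ambiguous: it lists every overlap length at which the end of read 1 exactly matches the start of the reverse-complemented read 2. The function `candidate_overlaps` and the exact-match rule are assumptions for illustration only; real mergers score mismatches and base qualities, and this is not how PEAR or any other tool is implemented.

```python
def candidate_overlaps(read1, read2_rc, min_overlap=4):
    """Return all overlap lengths n where the last n bases of read1
    equal the first n bases of read2_rc (read 2, reverse-complemented)."""
    max_len = min(len(read1), len(read2_rc))
    return [
        n
        for n in range(min_overlap, max_len + 1)
        if read1[-n:] == read2_rc[:n]
    ]


# Unique sequence: a single consistent way to merge.
unique = candidate_overlaps("ACGTTGCAGGCT", "GCAGGCTTAACG")

# CA-repeat in the overlap: several merges are equally consistent,
# each implying a different repeat length in the merged fragment.
repeat = candidate_overlaps("TTGA" + "CA" * 6, "CA" * 6 + "GGTC")
```

In the repeat case every even overlap length from 4 to 12 matches perfectly, so the merged read could contain anywhere from 6 to 10 copies of the CA unit.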

written 6 months ago by d-cameron1.8k

That's correct. But if the merger is aware of repetitive sequences and does not merge where the overlap is ambiguous, you can still have the advantages of merging for the rest.

bbmerge from BBMap can do this.

fin swimmer

written 6 months ago by finswimmer2.8k

If you're interested in STRs, you should not merge paired-end reads, as the merge will (almost always) be incorrect when the inner portion of the fragment contains a short tandem repeat motif.

written 6 months ago by d-cameron1.8k