Detecting large-scale copy-neutral events purely from short read sequencing data is difficult. Your entire signal is in the breakpoints, so the strength of that signal depends on the mappable coverage you have across the breakpoints. Events in extremely repetitive sequence (e.g. centromeres) are unlikely to be found by either short or long read sequencing. In such cases you'd have better luck with karyotyping (e.g. FISH).
As for your question, it depends on your library fragment size. If you do 2x300bp sequencing with a mean library fragment size of 500bp, your signal will be much weaker than the same sequencing on a library with 1,000bp fragments. Mate-pair libraries have much longer fragment lengths, and thus require lower coverage for the same signal strength. If you don't need exact breakpoint reconstruction, you get more value per sequenced base by sequencing shorter reads (since more fragments span the breakpoint). If you do need exact breakpoint sequence reconstruction, sequencing longer reads is preferable due to the better mappability of the longer reads.
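To put rough numbers on the fragment size argument, here is a back-of-the-envelope sketch in Python. It assumes uniform coverage, a fixed fragment size, and paired-end reads, and it ignores mappability and the minimum anchor length a caller needs, so treat the output as a relative comparison only. The function name and the 30x coverage figure are purely illustrative.

```python
def expected_spanning_fragments(base_coverage, read_length, fragment_size):
    """Rough expected number of fragments whose span covers a given breakpoint.

    Simplified model (an assumption, not a caller's actual statistic):
    a paired-end fragment contributes 2 * read_length sequenced bases,
    so fragments per genomic position = base_coverage / (2 * read_length),
    and a fragment spans the breakpoint if it starts within fragment_size
    bases upstream of it.
    """
    fragments_per_position = base_coverage / (2 * read_length)
    return fragments_per_position * fragment_size


if __name__ == "__main__":
    coverage = 30  # hypothetical sequenced-base coverage
    for read_length, fragment_size in [(300, 500), (300, 1000), (150, 500)]:
        n = expected_spanning_fragments(coverage, read_length, fragment_size)
        print(f"2x{read_length}bp, {fragment_size}bp fragments: ~{n:.0f} spanning fragments")
```

Under this model, doubling the fragment size doubles the number of spanning fragments at the same sequencing depth, and so does halving the read length, which is the "more value per sequenced base" point above.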
Any recommended/standard best-practice guidelines or pipelines for BCA breakpoint identification and/or any other SV identification analysis? How to analyze the data?
SV calling is still well behind SNV and small indel calling in terms of both specificity and sensitivity, and hasn't yet matured to the point where you can swap out callers and get results that are mostly the same. For the 2x300bp data, I would recommend GRIDSS as the SV caller and StructuralVariantAnnotation for downstream analysis, but LUMPY and Manta are other good options. For mate-pair, you will have a smaller selection of tools, as most SV callers don't support mate-pair libraries. As you're looking for large-scale events, you can't filter on event size, so you're going to get a lot of false positive calls due to sequence homology.
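If you end up inspecting caller output by hand, a minimal sketch like the following can pull the large-scale candidates out of breakend (BND) records. It relies only on the breakend ALT notation from the VCF specification and is not part of GRIDSS or StructuralVariantAnnotation. The function names and the 1Mb threshold are hypothetical, and the candidates it keeps will still include the homology-driven false positives mentioned above.

```python
import re

# VCF breakend ALT forms: t[p[  t]p]  ]p]t  [p[t  where p is "chrom:pos".
BND_ALT = re.compile(
    r"^(?P<pre>[^\[\]]*)(?P<br>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P=br)(?P<post>[^\[\]]*)$"
)


def parse_breakend(chrom, pos, alt):
    """Return both ends of a breakend record, or None if ALT is not in BND notation."""
    m = BND_ALT.match(alt)
    if m is None:
        return None
    return {
        "chrom": chrom,
        "pos": pos,
        "mate_chrom": m.group("chrom"),
        "mate_pos": int(m.group("pos")),
    }


def is_large_scale(bnd, min_span=1_000_000):
    """Keep inter-chromosomal breakends and intra-chromosomal ones spanning >= min_span.

    min_span is an arbitrary illustrative threshold, not a recommended value.
    """
    if bnd["chrom"] != bnd["mate_chrom"]:
        return True
    return abs(bnd["mate_pos"] - bnd["pos"]) >= min_span


if __name__ == "__main__":
    # Hypothetical records: (CHROM, POS, ALT)
    records = [("chr2", 321681, "G]chr17:198982]"), ("chr1", 1000, "A[chr1:1500[")]
    for chrom, pos, alt in records:
        bnd = parse_breakend(chrom, pos, alt)
        print(chrom, pos, alt, is_large_scale(bnd))
```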
Are you open to alternative approaches? Molecular barcoding approaches such as the 10X Genomics Chromium are showing some really impressive early results (e.g. http://biorxiv.org/content/early/2016/09/10/074518), but the analysis techniques are immature.
modified 4.0 years ago
4.0 years ago by
d-cameron • 2.3k