I am looking for advice on which software would be most suitable for detecting structural variants (particularly deletions and duplications > 100 kb) that are fixed or at high frequency between two populations. I have low-coverage sequence data (4x per sample) from 40 individuals (20 in population A and 20 in population B). The sequencing is Illumina paired-end, with 120 bp reads and ~200 bp insert sizes. I have come across ReadDepth, MrCanavar and Variation Hunter as possible options, and I was wondering whether anyone could advise which of these, or which other software, would be most suitable for this task. Ideally I would like to determine the frequency of these variants in each population rather than pooling the samples from each population, although pooling could be done.
Some of the software, such as MrCanavar and Variation Hunter, requires remapping with MrFast, and time is a factor for these analyses. However, if they are the best tools for the job I will of course use them (better done right than quickly and wrong). Any feedback and advice is greatly appreciated. Please feel free to ask for additional information.
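To make the "frequency per population" goal concrete, here is a minimal sketch of one way it could be approached from depth alone. It assumes each sample already has a depth ratio for a candidate region (region depth divided by that sample's genome-wide depth). The function names, the 0.75/1.25 cutoffs and the sample values are all hypothetical, chosen only for illustration. It also counts carriers rather than alleles, so it is not an allele frequency.

```python
def classify_sample(ratio, del_cutoff=0.75, dup_cutoff=1.25):
    """Label one sample from its depth ratio (region depth / genome-wide depth).

    The cutoffs are illustrative assumptions: a diploid sample with a
    heterozygous deletion is expected near 0.5, a heterozygous duplication
    near 1.5.
    """
    if ratio < del_cutoff:
        return "deletion"
    if ratio > dup_cutoff:
        return "duplication"
    return "normal"


def carrier_frequency(ratios_by_sample, state):
    """Fraction of samples in one population that carry the given state."""
    calls = [classify_sample(r) for r in ratios_by_sample.values()]
    return calls.count(state) / len(calls)


# Made-up depth ratios for one candidate region, for illustration only.
pop_a = {"A01": 1.48, "A02": 1.55, "A03": 1.02, "A04": 1.61}
pop_b = {"B01": 0.97, "B02": 1.05, "B03": 1.52, "B04": 0.99}

freq_a = carrier_frequency(pop_a, "duplication")
freq_b = carrier_frequency(pop_b, "duplication")
print(f"duplication carriers: A={freq_a:.2f}, B={freq_b:.2f}")
```

With 4x coverage the per-sample ratio is only likely to be stable for large regions such as the > 100 kb events of interest here, so any fixed cutoff like this would need checking against the actual data.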
Thanks for the reply. By low I mean 4x coverage per sample. Essentially this is a scan for variants of all sizes, as we are looking at cases and controls. From a rough-and-ready look at read depth variation between the populations we see signs of some large duplications (~100-500 kb), and it would be good to see these identified with a more computationally rigorous approach than custom scripts. MrCanavar and Variation Hunter seem appropriate too, and I was wondering what else is out there. I will take a look at Genome STRiP. De novo assembly is also something we plan to do (using Cortex), but before investing in that I am keen to do something preliminary with software for detecting structural variants. Thanks again for your helpful reply!
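For anyone wondering what a rough read depth comparison between populations might look like, here is a minimal sketch. It is not the actual custom script, and it is no substitute for the dedicated callers discussed in this thread. It assumes per-window mean depth has already been computed and normalised for each population. The function name, the pseudocount, the log2 cutoff of 0.4 and the 10 kb windows are all hypothetical choices.

```python
import math
from statistics import mean


def candidate_regions(depth_a, depth_b, window_size,
                      min_log2=0.4, min_len=100_000, pseudo=0.01):
    """Merge runs of consecutive windows whose |log2(A/B)| passes min_log2.

    depth_a / depth_b: per-window mean normalised depth for each population.
    Returns (start, end, mean_log2) tuples for runs at least min_len long.
    The pseudocount avoids division by zero in windows with no coverage.
    """
    # A trailing 0.0 sentinel closes any run that reaches the last window.
    ratios = [math.log2((a + pseudo) / (b + pseudo))
              for a, b in zip(depth_a, depth_b)] + [0.0]
    regions = []
    run_start, run_vals = None, []
    for i, lr in enumerate(ratios):
        same_sign = not run_vals or (lr > 0) == (run_vals[0] > 0)
        if abs(lr) >= min_log2 and same_sign:
            if run_start is None:
                run_start = i
            run_vals.append(lr)
            continue
        # The run has ended: keep it only if it is long enough.
        if run_start is not None and (i - run_start) * window_size >= min_len:
            regions.append((run_start * window_size, i * window_size,
                            mean(run_vals)))
        run_start, run_vals = None, []
        # A window that passes the cutoff with the opposite sign starts a new run.
        if abs(lr) >= min_log2:
            run_start, run_vals = i, [lr]
    return regions


# Made-up example: 20 windows of 10 kb, with a 120 kb stretch of higher depth in A.
depth_a = [1.0] * 5 + [1.5] * 12 + [1.0] * 3
depth_b = [1.0] * 20
regions = candidate_regions(depth_a, depth_b, window_size=10_000)
for start, end, lr in regions:
    print(f"{start}-{end}\tmean log2(A/B) = {lr:.2f}")
```

Averaging depth across the 20 samples in each population before taking the ratio is what makes this workable at 4x, but it only flags regions where the populations differ on average. It says nothing about breakpoints or per-sample genotypes.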
No problem. Good luck - sounds like you have a very ambitious study!