Most Appropriate Software To Detect Population-Specific Structural Variation In Multiple Low-Coverage Genomes?
8.0 years ago
Rubal7 ▴ 800

Hello Everyone,

I am looking for some advice on which software would be most suitable for detecting structural variation (particularly deletions and duplications > 100kb) that is fixed, or at high frequency, in one of two populations. I have low-coverage sequence data (4x per sample) from 40 individuals (20 in population A and 20 in population B). The sequencing is Illumina paired-end, 120bp reads with ~200bp insert sizes. I have come across ReadDepth, MrCanavar and Variation Hunter as possible tools, and I was wondering if anyone had any advice on which of these, or other software, would be most suitable for this task. Ideally I would like to determine the frequency of these variants in each population rather than pooling the samples from each population, although pooling could be done.
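As a back-of-envelope check on whether read depth alone can see events this large at 4x, here is a minimal sketch that assumes read start positions are uniform (a Poisson model) and 120bp reads. It ignores GC bias, mappability and other overdispersion, so real data will be noisier than this; treat the numbers as an optimistic upper bound.

```python
import math


def expected_reads(coverage: float, window_bp: int, read_len: int) -> float:
    """Expected number of reads starting in a window under uniform coverage."""
    return coverage * window_bp / read_len


def separation_in_sd(coverage: float, window_bp: int, read_len: int,
                     cn_ref: int = 2, cn_alt: int = 3) -> float:
    """Number of Poisson standard deviations (variance == mean) separating the
    expected read count at cn_alt from the expected count at cn_ref."""
    lam_ref = expected_reads(coverage, window_bp, read_len)
    lam_alt = lam_ref * cn_alt / cn_ref
    return (lam_alt - lam_ref) / math.sqrt(lam_ref)


# 4x per sample, 120bp reads, a range of window sizes.
for window in (1_000, 10_000, 100_000):
    lam = expected_reads(4, window, 120)
    sep = separation_in_sd(4, window, 120)
    print(f"{window:>7} bp: ~{lam:.0f} reads, CN2 vs CN3 separated by {sep:.1f} SD")
```

For a single-copy gain the separation works out to half the square root of the expected count: roughly 3 SD for 1kb windows and about 29 SD for 100kb windows at 4x. This is why depth-based methods are usually considered reasonable for events in the 100kb range even at low coverage.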

Some of the software, such as MrCanavar and Variation Hunter, requires remapping with MrFast, and time is a factor for these analyses; however, if they are the best software I will of course use them (better done right than quickly and wrong). Any feedback and advice is greatly appreciated. Please feel free to ask for additional information.

Best regards

copynumber • 2.8k views
8.0 years ago

How low is low? This is a difficult problem. The 1000 Genomes Structural Variation subgroup have worked very hard on this problem, using many different tools.

See this paper: Mapping copy number variation by population-scale genome sequencing. Mills, R.E. et al., Nature 470, 59-65 (2011)

What types of variation do you want to find, and at what length scale? Roughly speaking, at the low end (100bp-5kb) it is possible to call variants reliably, to nucleotide resolution, especially if you pool samples for discovery and then genotype individually, using an assembler (Cortex, SGA) or something like Pindel. The assemblers should have a very low FDR, Pindel maybe slightly higher. For larger events you can detect the existence of an event and, if it is a deletion, localise it to some extent. Genome STRiP from Bob Handsaker is definitely worth looking up. If you want to find massive segmental duplications etc., then I'd go with whatever the Eichler lab are doing these days.
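The "discover, then genotype individually" idea can be illustrated for a depth-based call by picking, for each individual, the copy number whose expected read count best explains the observed count in a candidate region. This is a minimal sketch under a plain Poisson model, not how Genome STRiP or any other named tool actually genotypes. `expected_diploid` (the expected count at copy number 2, e.g. derived from that sample's genome-wide mean depth) and the small `floor` used to model mismapped reads at copy number 0 are assumptions for illustration.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads when lam are expected (Poisson)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_copy_number(count: int, expected_diploid: float,
                         max_cn: int = 4, floor: float = 0.05) -> int:
    """Maximum-likelihood copy number (0..max_cn) for one individual in one
    candidate region. The expected count scales linearly with copy number;
    copy number 0 uses a small floor (hypothetical) so that a few mismapped
    reads do not give a zero likelihood."""
    best_cn, best_ll = 0, -math.inf
    for cn in range(max_cn + 1):
        lam = expected_diploid * max(cn, floor) / 2
        ll = poisson_log_pmf(count, lam)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn


# Hypothetical counts for one ~100kb region, ~3333 reads expected at CN2 (4x).
for observed in (20, 1700, 3300, 5000):
    print(observed, genotype_copy_number(observed, 3333.3))
```

In practice the per-sample expected count would need correcting for GC content and mappability, and overdispersed models are usually preferred over a plain Poisson.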


Thanks for the reply. Low is 4x coverage per sample. Essentially this is a scan for variants of all sizes, as we are looking at cases and controls. From a rough-and-ready look at read-depth variation between the populations, we see signs of some large duplications (~100-500kb), and it would be good to see these identified by a more computationally rigorous approach than custom scripts. MrCanavar and Variation Hunter seem appropriate too, and I was wondering what else is out there. Will take a look at Genome STRiP. De novo assembly is also something we plan to do (using Cortex), but before investing in that I am keen to do something preliminary with software for detecting structural variants. Thanks again for your helpful reply!
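A somewhat more systematic version of the rough read-depth comparison could estimate copy number per individual in a window, turn that into a duplication carrier frequency per population, and test whether the two frequencies differ. This is a minimal sketch. It assumes per-window read counts and each sample's genome-wide mean count per window have already been computed (for example from the BAM files). The counts below are made up purely for illustration, and GC/mappability correction is omitted.

```python
from math import comb


def copy_number(window_count: int, sample_mean_count: float, ploidy: int = 2) -> float:
    """Depth-based copy-number estimate: the observed count scaled by the
    sample's genome-wide mean count for windows of the same size."""
    return ploidy * window_count / sample_mean_count


def carriers(counts, means, min_cn: float = 2.5) -> int:
    """Number of individuals whose estimated copy number is >= min_cn,
    i.e. closer to 3+ copies than to 2 (duplication carriers)."""
    return sum(copy_number(c, m) >= min_cn for c, m in zip(counts, means))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum every table at least as improbable as the observed one; the small
    # tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts for one 100kb window, 4 individuals per population.
means = [3333.3] * 4
pop_a = [5010, 4980, 3350, 5100]
pop_b = [3300, 3290, 3400, 3320]

k_a, k_b = carriers(pop_a, means), carriers(pop_b, means)
p = fisher_exact_two_sided(k_a, len(pop_a) - k_a, k_b, len(pop_b) - k_b)
print(f"carrier frequency A={k_a / len(pop_a):.2f}, B={k_b / len(pop_b):.2f}, Fisher p={p:.3f}")
```

With 20 individuals per population the test has considerably more power than in this toy example. When scanning many windows genome-wide, the p-values would also need a multiple-testing correction.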


No problem. Good luck - sounds like you have a very ambitious study!
