Run Time Of Imputation Using 1000 Genomes
9.7 years ago
Psb ▴ 30

I am imputing a few candidate regions of 1.5 Mb using MaCH, with 1000 Genomes Phase 1 as the reference dataset. I am following a two-step approach to imputation. The run time for my first step was 3 hours; the command line was:

mach1 -p chr1.ped -d chr1.dat -h chr1ref.hap -s chr1ref.snp --greedy --rounds 50 --states 200 --compact --autoFlip --prefix step1_chr1

For the second step of imputation, the command line was:

mach1 -p chr1.ped -d chr1.dat -h chr1ref.hap -s chr1ref.snps --errormap step1chr1.erate --cross step1chr1.rec --greedy --mle --mldetails --compact --autoFlip --mask 0.02 --prefix step2_chr1

It has been more than 48 hours and the programme is still running. According to the MaCH documentation, the second step should be comparatively faster than the first. Should it take this long? Can anyone tell me what went wrong?
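One thing that can be checked before rerunning: the two commands above do not name the same files (step 1 writes with the prefix `step1_chr1` while step 2 reads `step1chr1.erate` and `step1chr1.rec`, and the SNP list is `chr1ref.snp` in one and `chr1ref.snps` in the other). This may simply be formatting lost when posting. The dry-run sketch below keeps the file names consistent by deriving the step 2 inputs from the step 1 prefix. It assumes MaCH names its step 1 parameter files `<prefix>.erate` and `<prefix>.rec`; every flag and file name is copied from the post, and nothing is executed.

```shell
#!/bin/sh
# Dry run: build and print the two MaCH commands without executing them.
# All flags and input file names are copied from the post.
# Assumption: step 1 writes <prefix>.erate and <prefix>.rec.

STEP1_PREFIX="step1_chr1"
STEP2_PREFIX="step2_chr1"
REF_HAP="chr1ref.hap"
# The post uses both chr1ref.snp and chr1ref.snps; set this to the file that exists.
REF_SNP="chr1ref.snp"

# Step 1: estimate crossover and error rates.
STEP1_CMD="mach1 -p chr1.ped -d chr1.dat -h ${REF_HAP} -s ${REF_SNP} --greedy --rounds 50 --states 200 --compact --autoFlip --prefix ${STEP1_PREFIX}"

# Step 2: impute using the parameter files produced by step 1.
STEP2_CMD="mach1 -p chr1.ped -d chr1.dat -h ${REF_HAP} -s ${REF_SNP} --errormap ${STEP1_PREFIX}.erate --cross ${STEP1_PREFIX}.rec --greedy --mle --mldetails --compact --autoFlip --mask 0.02 --prefix ${STEP2_PREFIX}"

echo "$STEP1_CMD"
echo "$STEP2_CMD"
```

Once the printed commands look right, they can be pasted into the terminal or run with `sh -c "$STEP1_CMD"`.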


Just a quick question: what's the file format for the "chr1ref.hap" file in "-h chr1ref.hap"?

Thanks!

9.7 years ago
lh3 32k

1000g uses four imputation algorithms: IMPUTE2, BEAGLE, MaCH and SNPTools. The official release was produced by the UMich group using BEAGLE followed by MaCH. They did not use MaCH alone, because BEAGLE is faster. As I remember, the whole process took about a month on a decent cluster. This is already slow. There are several papers on phasing/imputing genotyping data. The consensus is almost always: BEAGLE is much faster than MaCH, but less accurate. So if you do everything with MaCH, it will be even slower.

9.7 years ago

You should try BEAGLE; it is very fast!

@Khader Shameer: just for you :-)

Browning & Browning 2011


It would be nice if you could add a link to the software and to the paper that describes BEAGLE, with benchmarking details, to make the answer more informative.

9.7 years ago
Genotepes ▴ 950

True that BEAGLE is fast, although it has quite large memory requirements.

An alternative is to use SHAPEIT for prephasing. This is really just phasing your own data; the "pre" comes from the fact that you do it before imputation.

Splitting the imputation into a (pre)phasing step and a pure imputation step is very convenient.

Everything is explained here.

I'd advise running IMPUTE without prephasing on your best hits; it doesn't take much time.

Here is the URL with SHAPEIT and all explanations.

The accuracy of this two-stage process has been evaluated, but not thoroughly (it is quite recent). There is a loss of information, but it seems to be negligible, especially as the time gained is huge.
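The two-stage workflow described above could look roughly like the dry-run sketch below: phase the study genotypes once with SHAPEIT, then impute from the phased haplotypes with IMPUTE2, with an unphased IMPUTE2 run kept for the best hits. The flags are the ones I recall from the SHAPEIT2 and IMPUTE2 documentation and should be checked against the installed versions. All file names, the region coordinates and the `-Ne` value are hypothetical placeholders, and the script only prints the commands.

```shell
#!/bin/sh
# Dry run of a prephasing + imputation workflow; nothing is executed.
# Hypothetical placeholders: file names, region coordinates, -Ne value.

GWAS="gwas_chr1"                 # study data: gwas_chr1.bed/.bim/.fam
MAP="genetic_map_chr1.txt"       # recombination map
REF_HAP="chr1ref.hap.gz"         # reference haplotypes
REF_LEGEND="chr1ref.legend.gz"   # reference legend
REGION="1000000 2500000"         # a 1.5 Mb candidate region (start end, in bp)

# Stage 1: phase the study genotypes once.
PHASE_CMD="shapeit --input-bed ${GWAS}.bed ${GWAS}.bim ${GWAS}.fam --input-map ${MAP} --output-max ${GWAS}.phased.haps ${GWAS}.phased.sample"

# Stage 2: impute the candidate region from the prephased haplotypes.
IMPUTE_CMD="impute2 -use_prephased_g -m ${MAP} -h ${REF_HAP} -l ${REF_LEGEND} -known_haps_g ${GWAS}.phased.haps -int ${REGION} -Ne 20000 -o ${GWAS}.imputed"

# Optional: rerun the best hits without prephasing, from unphased genotypes.
BEST_HITS_CMD="impute2 -m ${MAP} -h ${REF_HAP} -l ${REF_LEGEND} -g ${GWAS}.gen -int ${REGION} -Ne 20000 -o ${GWAS}.best_hits.imputed"

echo "$PHASE_CMD"
echo "$IMPUTE_CMD"
echo "$BEST_HITS_CMD"
```

Because phasing is done only once, stage 2 can be repeated per region or per reference panel without paying the phasing cost again.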

Best

http://www.shapeit.fr/
