IMPUTE2 minimal number of SNPs per chunk
2.8 years ago
nhaus ▴ 300

I am using an Illumina Omni 2.5 genotyping array and plan to use IMPUTE2 to perform imputation.

The IMPUTE2 documentation recommends processing whole chromosomes in chunks of ~5 Mb. However, I could not find any information on the minimum number of SNPs that should be present per chunk.

It feels "wrong" to impute tens of thousands of genotypes from the reference panel if there are only ~100 genotyped SNPs in the chunk that I am analyzing.

Is my gut feeling just wrong here, or can you recommend a way to deal with this?

Cheers!

impute gwas • 1.2k views

Tip: there are many faster and more memory-efficient imputation algorithms than IMPUTE2. Check out Beagle 5 or IMPUTE5; they will be much easier to use.


Thanks! I will do that

2.8 years ago
LauferVA 4.2k
  1. IMPUTE2 is no longer considered state-of-the-art.
  2. Regarding how many variants you can impute: if I were in your shoes, I would start by reading about linkage disequilibrium (LD) and how imputation algorithms actually work. The fact is, thousands of SNPs can be in strong LD with one another; the HLA region is a prime example. Thus, if you can impute rs12345 accurately, you can impute any number of its 'buddies' (other SNPs in perfect LD with it) just as accurately. For related reasons, it is more or less standard for 90% of your SNPs to be imputed when starting with DNA microarray data.
  3. The number of SNPs per block is, in many contexts, an important predictor of accuracy. If your LD estimates are bad, you will find this out on the back end by looking at imputation accuracy. Search for "imputation accuracy" and start reading. You'll want to exclude variants with poor imputation accuracy, because the estimates you get for them aren't reliable. Pretty much every protocol under the sun gives a cutoff for imputation accuracy; for example, see Anderson CA 2010 Nature Protocols. The way I would approach it: start with their recommendation. If, after you're done, you scan your results and find that imputation accuracy for one 5 Mb block is really low, and that block also has few SNPs, then re-run that region with a larger window.
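
To make the check in point 3 concrete, here is a rough sketch in Python of how one might scan per-chunk results. It assumes you have already parsed your imputation output into (position, info, is_typed) tuples yourself; the function names, the tuple layout, and the threshold values (min_typed, min_mean_info) are all hypothetical and chosen only for illustration, not taken from any tool or protocol. Substitute the cutoff your protocol recommends.

```python
from collections import defaultdict

CHUNK_SIZE = 5_000_000  # ~5 Mb chunks, as discussed in the question


def summarize_chunks(variants, chunk_size=CHUNK_SIZE):
    """Summarize variants per chunk.

    variants: iterable of (position, info, is_typed) tuples (assumed layout).
    Returns {chunk_index: {"start", "end", "n_typed", "n_imputed", "mean_info"}}.
    """
    n_typed = defaultdict(int)
    n_imputed = defaultdict(int)
    info_sum = defaultdict(float)

    for position, info, is_typed in variants:
        chunk = (position - 1) // chunk_size  # chunk 0 covers 1..chunk_size
        if is_typed:
            n_typed[chunk] += 1
        else:
            n_imputed[chunk] += 1
            info_sum[chunk] += info

    summary = {}
    for chunk in sorted(set(n_typed) | set(n_imputed)):
        imputed = n_imputed[chunk]
        summary[chunk] = {
            "start": chunk * chunk_size + 1,
            "end": (chunk + 1) * chunk_size,
            "n_typed": n_typed[chunk],
            "n_imputed": imputed,
            # Mean accuracy over imputed variants only; None if there are none.
            "mean_info": info_sum[chunk] / imputed if imputed else None,
        }
    return summary


def chunks_to_rerun(summary, min_typed=200, min_mean_info=0.8):
    """Chunks with BOTH few typed SNPs and low mean accuracy (illustrative thresholds)."""
    return [
        chunk
        for chunk, stats in summary.items()
        if stats["n_typed"] < min_typed
        and stats["mean_info"] is not None
        and stats["mean_info"] < min_mean_info
    ]


# Toy data: two chunks, one typed SNP each, very different accuracy.
toy_variants = [
    (1_000_000, 1.00, True),
    (1_500_000, 0.95, False),
    (2_000_000, 0.90, False),
    (6_000_000, 1.00, True),
    (6_500_000, 0.40, False),
    (7_000_000, 0.50, False),
]
toy_summary = summarize_chunks(toy_variants)
toy_flagged = chunks_to_rerun(toy_summary, min_typed=2, min_mean_info=0.8)
```

On the toy data only the second chunk is flagged: both chunks have a single typed SNP, but only the second also has low mean accuracy. Flagged chunks are candidates for a re-run with a larger window; per-variant filtering on accuracy is a separate step that you would still apply everywhere.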

Thank you.

"Impute2 is no longer considered state-of-the-art."

Can you tell me what is considered state-of-the-art as of today?


IMPUTE5, QUILT, Beagle5 are all worth considering.


You're welcome. I see I've been beaten to the punch here - do you have remaining questions? If so, happy to take a look. If not, please consider accepting my answer :-)

best, vl
