Speed up ADMIXTURE cross validation

2.1 years ago

simone.castellana

Hi everyone,

I'm running ADMIXTURE to uncover population structure among the Arabidopsis thaliana accessions from the 1001 Genomes Project. To quickly summarize what I have done so far: first, I removed nearly identical accessions by calculating pairwise genome-wide identity-by-state (IBS) differences with PLINK; whenever a pair differed by fewer than 0.01 changes per polymorphic site, I randomly removed one member of the pair. Then I kept only biallelic SNPs with a genotype call rate above 95%, which resulted in a genotype matrix of ~4 million SNPs, like so:

bcftools view -i 'F_MISSING<0.05' -m2 -M2 -v snps myvcf.vcf.gz -Oz -o myfilteredoutput.vcf.gz
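The near-identical-accession pruning described above (drop one random member of every pair below 0.01 differences per polymorphic site) could be sketched as below. This is only an illustration, not the exact procedure used: it assumes a hypothetical whitespace-separated file with columns `id1 id2 diffs_per_site` (no header) that has already been derived from the PLINK IBS output, and `prune_near_identical` is a hypothetical helper name. Pairs are processed greedily in file order, and a pair is skipped if one member has already been dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the accessions to drop, one per line.
# Input (ASSUMED format): "id1 id2 diffs_per_site" per line, no header.
# Args: pairs file, threshold (default 0.01), random seed (default 42).
prune_near_identical() {
  local pairs_file=$1 threshold=${2:-0.01} seed=${3:-42}
  awk -v thr="$threshold" -v seed="$seed" '
    BEGIN { srand(seed) }   # fixed seed makes the random choice reproducible
    # Close pair whose members are both still present: drop one at random.
    $3 < thr && !($1 in removed) && !($2 in removed) {
      victim = (rand() < 0.5) ? $1 : $2
      removed[victim] = 1
      print victim
    }
  ' "$pairs_file"
}
```

The printed IDs would still need to be reshaped into PLINK's FID/IID layout before being passed to `--remove`.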

After converting the filtered VCF to PLINK binary (bed) format, I'm now running ADMIXTURE with cross-validation to select the best K, like so:

admixture --cv myfilteredoutput.bed 2 > log2.out
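The PLINK recoding step is not shown in the post. A typical PLINK 1.9 conversion from the filtered VCF to `.bed/.bim/.fam` might look like the sketch below. The `run` wrapper and its `DRY_RUN` switch are a hypothetical convenience (not a PLINK or ADMIXTURE feature) that prints the command instead of executing it, and `vcf_to_bed` is a hypothetical helper name. Depending on the contig names in the VCF, PLINK may need extra options that are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Convert <prefix>.vcf.gz into <prefix>.bed/.bim/.fam with PLINK 1.9.
vcf_to_bed() {
  local prefix=$1
  run plink --vcf "${prefix}.vcf.gz" --make-bed --out "$prefix"
}
```

For example, `DRY_RUN=1 vcf_to_bed myfilteredoutput` only prints the PLINK command, while `vcf_to_bed myfilteredoutput` writes the files that ADMIXTURE then reads as `myfilteredoutput.bed`.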

I'm running ADMIXTURE this way for every K from 1 to 20, but it takes an immense amount of time (almost a day or more for each K).
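The per-K workflow described above (one cross-validation run per K, then picking the K with the lowest CV error) might be scripted roughly as follows. This is a sketch: `cv_all_k` and `cv_summary` are hypothetical helper names, the `log${k}.out` naming just follows the `log2.out` example, and the parser assumes ADMIXTURE's stdout contains a line of the form `CV error (K=2): 0.54321`. The loop is sequential, exactly as described, so it does nothing to reduce the runtime.

```shell
#!/usr/bin/env bash
# Run ADMIXTURE with --cv for K = kmin..kmax (defaults 1..20), one log per K.
# Requires admixture on PATH; nothing runs until the function is called.
cv_all_k() {
  local bed=$1 kmin=${2:-1} kmax=${3:-20} k
  for k in $(seq "$kmin" "$kmax"); do
    admixture --cv "$bed" "$k" > "log${k}.out"
  done
}

# Print "K cv_error" pairs from the given logs, lowest CV error first.
cv_summary() {
  grep -h 'CV error' "$@" \
    | sed -E 's/^CV error \(K=([0-9]+)\): *([0-9.eE+-]+).*/\1 \2/' \
    | LC_ALL=C sort -k2,2g
}
```

After `cv_all_k myfilteredoutput.bed 1 20`, the command `cv_summary log*.out | head -n 1` would print the K with the lowest reported CV error.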

What can I do to speed up the process?

Thanks

Tags: arabidopsis, admixture, population, 1001genomes, structure