Speed up ADMIXTURE cross validation
2.1 years ago

Hi everyone,

I'm running ADMIXTURE to uncover population structure in Arabidopsis thaliana accessions from the 1001 Genomes Project. A quick summary of what I have done so far: first, I removed nearly identical accessions by calculating pairwise genome-wide identity-by-state differences with PLINK; when a pair differed by fewer than 0.01 changes per polymorphic site, I randomly removed one member of the pair. Then I kept only biallelic SNPs with a genotype call rate >95%, which resulted in a genotype matrix of ~4 million SNPs, like so:

bcftools view -i 'F_MISSING<0.05' -m2 -M2 -v snps myvcf.vcf.gz -Oz -o myfilteredoutput.vcf.gz

After recoding it into BED format with PLINK, I'm now running ADMIXTURE with cross-validation to select the best K, like so:

admixture --cv myfilteredoutput.bed 2 > log2.out

I'm doing this for every K from 1 to 20, but it takes an immense amount of time (about a day or more for each single K).
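For reference, here is a minimal sketch of the recoding step and the per-K loop described above. It assumes PLINK 1.9 and the file names used in this post. `DRY_RUN`, `NAME`, `plink_recode` and `run_cv` are hypothetical names introduced only for illustration. With `DRY_RUN=1` (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: recode the filtered VCF to PLINK BED format, then run
# ADMIXTURE cross-validation for K = 1..20, writing one log file per K.
# Assumptions: PLINK 1.9 and ADMIXTURE are on PATH for real runs;
# DRY_RUN is a hypothetical switch (1 = print commands, 0 = execute them).
set -eu

NAME="myfilteredoutput"        # prefix shared by the .vcf.gz and .bed files
DRY_RUN="${DRY_RUN:-1}"

plink_recode() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'plink --vcf %s.vcf.gz --make-bed --out %s\n' "$NAME" "$NAME"
    else
        plink --vcf "${NAME}.vcf.gz" --make-bed --out "$NAME"
    fi
}

run_cv() {
    cv_k="$1"                  # number of ancestral populations to fit
    if [ "$DRY_RUN" = "1" ]; then
        printf 'admixture --cv %s.bed %s > log%s.out\n' "$NAME" "$cv_k" "$cv_k"
    else
        admixture --cv "${NAME}.bed" "$cv_k" > "log${cv_k}.out"
    fi
}

plink_recode

k=1
while [ "$k" -le 20 ]; do
    run_cv "$k"
    k=$((k + 1))
done
```

Running it with `DRY_RUN=0` executes the commands for real; the runs are sequential, one K at a time, which matches what I am doing now.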

What can I do to speed up the process?

Thanks

arabidopsis. admixture population 1001genomes structure • 315 views