23 months ago by
If you want ancestry estimates for your sample, probably the easiest way is to use ADMIXTURE, a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm (as described on the website: https://www.genetics.ucla.edu/software/admixture/).
I recommend following the manual to run the program. Before you do, run QC on your individuals and SNPs and remove related individuals from your dataset. PLINK can handle both steps; use its PI_HAT value to identify related pairs.
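As a rough sketch of the relatedness filter, the snippet below reads the pairwise table that PLINK writes with `--genome` (a whitespace-delimited file with FID1, IID1, FID2, IID2 and PI_HAT columns, among others) and picks one individual to drop from each pair above a cutoff. The function name `related_to_remove`, the 0.2 cutoff, and the sample rows are all assumptions made for illustration; choose a threshold that suits your study.

```python
def related_to_remove(genome_lines, pi_hat_max=0.2):
    """Greedily pick (FID, IID) pairs to drop so that no retained pair
    of individuals has PI_HAT above pi_hat_max (an assumed cutoff)."""
    lines = iter(genome_lines)
    header = next(lines).split()
    col = {name: i for i, name in enumerate(header)}
    removed = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[col["PI_HAT"]]) <= pi_hat_max:
            continue  # pair is not considered related
        first = (fields[col["FID1"]], fields[col["IID1"]])
        second = (fields[col["FID2"]], fields[col["IID2"]])
        # Drop the second member unless the pair is already broken up.
        if first not in removed and second not in removed:
            removed.add(second)
    return removed


# Made-up rows in the .genome column layout, for illustration only.
sample_genome = """\
FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
F1 A F2 B UN NA 0.0 1.0 0.0 0.50 -1 0.85 1.0 NA
F1 A F3 C UN NA 0.9 0.1 0.0 0.05 -1 0.72 0.6 2.1
F2 B F3 C UN NA 0.0 1.0 0.0 0.50 -1 0.85 1.0 NA
""".splitlines()

to_drop = related_to_remove(sample_genome)
for fid, iid in sorted(to_drop):
    print(fid, iid)  # one "FID IID" per line, the format PLINK's --remove expects
```

The printed list can be saved to a text file and passed to PLINK's `--remove`. A greedy pass like this is not guaranteed to remove the smallest possible number of individuals.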
Before running ADMIXTURE you will also have to prune your SNPs, i.e. thin out markers that are in strong LD with each other. Check the --indep-pairwise option in PLINK. PLINK can also read VCF files, as long as the sites are bi-allelic.
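To make the order of the steps concrete, here is a small sketch that only builds and prints the command lines, without running anything. The dataset prefix `mydata`, the window/step/r² values (50, 10, 0.1), and K=3 are placeholders assumed for illustration, not recommendations for your data. The helper name `pruning_and_admixture_commands` is also hypothetical.

```python
import shlex


def pruning_and_admixture_commands(prefix, k, window=50, step=10, r2=0.1):
    """Build (but do not run) the LD-pruning and ADMIXTURE command lines.

    prefix is the name of an existing PLINK binary fileset
    (prefix.bed / prefix.bim / prefix.fam).
    """
    pruned = f"{prefix}_pruned"
    return [
        # 1. Find SNPs in approximate linkage equilibrium;
        #    PLINK writes prefix.prune.in and prefix.prune.out.
        ["plink", "--bfile", prefix,
         "--indep-pairwise", str(window), str(step), str(r2),
         "--out", prefix],
        # 2. Keep only the retained SNPs in a new binary fileset.
        ["plink", "--bfile", prefix,
         "--extract", f"{prefix}.prune.in",
         "--make-bed", "--out", pruned],
        # 3. Run ADMIXTURE with K ancestral populations and cross-validation.
        ["admixture", "--cv", f"{pruned}.bed", str(k)],
    ]


commands = pruning_and_admixture_commands("mydata", k=3)
for cmd in commands:
    print(shlex.join(cmd))
```

If you start from a VCF, you would first convert it with something like `plink --vcf mydata.vcf --make-bed --out mydata`. Running ADMIXTURE over a range of K values and comparing the cross-validation errors is the usual way to pick K.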
Depending on what type of analysis you want to conduct, you may want to explore more sophisticated methods. For instance, check Chromopainter, which uses haplotype information and not just allele frequencies in order to estimate "more accurate" ancestry proportions: http://www.paintmychromosomes.com/ However, you will need to phase your data first, and it can be a bit more complicated to run.