Programming Challenge: Quickest Way To Determine The "Superpopulation" From A VCF?
8.6 years ago

Given an exome or targeted human VCF of one or more samples, I need a program to determine the "superpopulation" of each sample, as listed here:

http://www.1000genomes.org/category/frequently-asked-questions/population

ASN EUR AFR AMR SAN

The program should return a single three-letter code for each sample.

Submissions will be judged on speed using 10 randomly selected subsets of 1KG samples - you cannot count on any "crucial" regions being covered.

Each "miss" will incur a penalty that is effectively 50% of the best time in the next-best tier (a single missed call adds half of the entire time it took to call all 10 correctly).

vcf • 2.1k views

So what am I allowed, if I cannot count on any specific region being there? How targeted could it be? Clearly some target regions will be uninformative...


Sometimes we receive targeted resequencing samples that cover, for example, just a bunch of cardiac genes. I would still like to make a guess as to the superpopulation.

8.6 years ago

Actually, my previous comment is an attempt at an answer, so here it is (will try and delete the comment):

Well, in general I would expect it not always to be possible. I would take the 1000 Genomes SNP calls in your gene(s), do a PCA (colouring each sample by population), and see whether the superpopulations are evident in the PCA. If they are, it's very cheap to project your sample onto the same components and see where it lies relative to the 1000 Genomes populations. That's what I'd do, but I'm not an expert on that type of thing!
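For what it's worth, here is a rough sketch of that PCA idea in Python with NumPy. It is only an illustration under several assumptions that are not part of the challenge: genotypes have already been extracted from the VCFs into matrices of alternate-allele counts (0/1/2) over the same sites, there are no missing calls, and a sample is assigned to the superpopulation whose centroid is nearest in PC space. The function names (`fit_pca`, `project`, `classify`) and the toy matrix are made up for the example; it does not parse VCF and has not been tuned for speed.

```python
import numpy as np

def fit_pca(ref_genotypes, n_components=2):
    """Centre the reference matrix (samples x sites) and return (mean, axes)."""
    mean = ref_genotypes.mean(axis=0)
    # Rows of vt are the principal axes of the centred reference data.
    _, _, vt = np.linalg.svd(ref_genotypes - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(genotypes, mean, axes):
    """Project genotype rows onto the fitted principal axes."""
    return (np.atleast_2d(genotypes) - mean) @ axes.T

def classify(queries, ref_genotypes, ref_labels, n_components=2):
    """Assign each query sample the label of the nearest reference centroid."""
    ref_labels = np.asarray(ref_labels)
    mean, axes = fit_pca(ref_genotypes, n_components)
    ref_pcs = project(ref_genotypes, mean, axes)
    labels = sorted(set(ref_labels.tolist()))
    centroids = np.array([ref_pcs[ref_labels == lab].mean(axis=0) for lab in labels])
    query_pcs = project(queries, mean, axes)
    # Euclidean distance from every query to every centroid in PC space.
    dists = np.linalg.norm(query_pcs[:, None, :] - centroids[None, :, :], axis=2)
    return [labels[i] for i in dists.argmin(axis=1)]

if __name__ == "__main__":
    # Toy, hand-made genotypes (NOT real 1000 Genomes data): 6 samples x 6 sites.
    ref = np.array([
        [0, 0, 0, 2, 2, 2],
        [0, 1, 0, 2, 2, 2],
        [0, 0, 0, 2, 1, 2],
        [2, 2, 2, 0, 0, 0],
        [2, 1, 2, 0, 0, 0],
        [2, 2, 2, 0, 1, 0],
    ], dtype=float)
    ref_labels = ["AFR", "AFR", "AFR", "EUR", "EUR", "EUR"]
    queries = np.array([[0, 0, 1, 2, 2, 2], [2, 2, 2, 0, 0, 1]], dtype=float)
    print(classify(queries, ref, ref_labels))
```

With a real panel you would restrict the reference matrix to the sites actually covered in the query VCF before fitting, and check on held-out reference samples whether the populations separate at all in those regions.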


Automate and implement


You're quite right, you asked for a program, not a description of how to do it. I don't have time to do this now though, so I'll bow out of the rest of this discussion.
