3.7 years ago
Nyksubuz
I have a VCF file and a few questions to answer about it. Can someone help me work through them?
The questions are as follows:
- How many variant records does the file contain?
- How many genotype calls are there per variant record?
- Are the genotype calls phased or unphased?
- Write code or pseudo code (in any language of your choosing) to calculate allele frequencies for each variant in the file
- Design a relational database schema to store the following information:
  - variant ID
  - chromosomal location of the variant
  - the alleles and their corresponding frequencies
- Write code or pseudo code to populate your database schema from the VCF file
- How might you store the genotypes such that they could be retrieved quickly, for a project that has produced genotypes for ~1200 individuals across ~80 million sites?
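
To make the questions concrete, here is a minimal Python sketch of how the counting, phasing, allele-frequency, and database parts might be approached. It is not a definitive solution. Everything named in it is an assumption made for illustration: the two-record `VCF_TEXT` stands in for the real file, the helper functions (`parse_vcf`, `allele_frequencies`, `phasing`, `build_db`) and the `variant`/`allele` table layout are hypothetical, and it assumes an uncompressed, tab-delimited VCF in which `GT` is the first FORMAT key. Frequencies are computed from the called alleles only, so missing calls (`.`) are excluded from the denominator.

```python
import sqlite3
from collections import Counter

# Hypothetical two-record VCF used only for illustration (not the real file).
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "1\t200\trs2\tC\tT,G\t.\tPASS\t.\tGT\t0|1\t2|2\t.|.\n"
)

def parse_vcf(lines):
    """Yield (chrom, pos, variant_id, alleles, genotypes) per data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        f = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt = f[0], int(f[1]), f[2], f[3], f[4]
        alleles = [ref] + ([] if alt == "." else alt.split(","))
        # Assumption: GT is the first FORMAT key whenever genotypes are present.
        has_gt = len(f) > 9 and f[8].split(":")[0] == "GT"
        gts = [s.split(":")[0] for s in f[9:]] if has_gt else []
        yield chrom, pos, vid, alleles, gts

def allele_frequencies(alleles, gts):
    """Map each allele to its frequency among called (non-missing) alleles."""
    counts = Counter()
    for gt in gts:
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":
                counts[int(idx)] += 1
    total = sum(counts.values())
    return {a: (counts[i] / total if total else 0.0)
            for i, a in enumerate(alleles)}

def phasing(gts):
    """Return 'phased', 'unphased', 'mixed', or 'unknown' for one record."""
    seps = {"|" if "|" in gt else "/" for gt in gts if "|" in gt or "/" in gt}
    if not seps:
        return "unknown"  # e.g. only haploid calls, which carry no separator
    if len(seps) == 2:
        return "mixed"
    return "phased" if "|" in seps else "unphased"

def build_db(records):
    """Create an in-memory SQLite database and load variants and alleles."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE variant (
            variant_pk INTEGER PRIMARY KEY,
            variant_id TEXT,
            chrom      TEXT    NOT NULL,
            pos        INTEGER NOT NULL
        );
        CREATE TABLE allele (
            variant_pk   INTEGER NOT NULL REFERENCES variant(variant_pk),
            allele_index INTEGER NOT NULL,  -- 0 is the reference allele
            sequence     TEXT    NOT NULL,
            frequency    REAL    NOT NULL,
            PRIMARY KEY (variant_pk, allele_index)
        );
    """)
    for chrom, pos, vid, alleles, gts in records:
        cur = con.execute(
            "INSERT INTO variant (variant_id, chrom, pos) VALUES (?, ?, ?)",
            (vid, chrom, pos),
        )
        freqs = allele_frequencies(alleles, gts)
        con.executemany(
            "INSERT INTO allele VALUES (?, ?, ?, ?)",
            [(cur.lastrowid, i, a, freqs[a]) for i, a in enumerate(alleles)],
        )
    con.commit()
    return con

records = list(parse_vcf(VCF_TEXT.splitlines()))
print("variant records:", len(records))
for chrom, pos, vid, alleles, gts in records:
    print(vid, "calls:", len(gts), phasing(gts), allele_frequencies(alleles, gts))

con = build_db(records)
print(con.execute("SELECT COUNT(*) FROM allele").fetchone()[0], "allele rows")
```

For a real file, the same functions could be fed from `open(path)` instead of `VCF_TEXT.splitlines()`. Splitting alleles into their own table keeps multi-allelic sites in one row per allele rather than a comma-separated string. For the last question, note that ~1200 individuals across ~80 million sites is roughly 96 billion genotype calls, so one relational row per call is probably not practical; a compressed, position-indexed or binary layout is likely a better starting point, though that choice is outside what this sketch covers.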