I am currently a Master's student in Bioinformatics and I want to create a tool that I can show to future employers and add to my portfolio. I work in a lab that makes heavy use of the 1000 Genomes Project VCF files to investigate human evolutionary history and migration patterns.
For my project, I had the idea of creating a haplotype generator for the Phase 3 VCF files. For example, the following genotypes (one column per individual, one row per variant site):

Ind1  Ind2  Ind3
0/1   0/0   1/1
0/0   0/1   1/1
1/0   1/0   0/0

would produce these haplotypes and counts:

001 - 2
100 - 1
010 - 1
110 - 2
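To make the counting rule in this example concrete, here is a minimal Python sketch. It assumes the allele order within each genotype reflects phase (the Phase 3 VCFs are phased and write genotypes with `|`; the sketch accepts `/` as well so it matches the example as typed). The function name `count_haplotypes` is hypothetical, not part of any existing tool.

```python
from collections import Counter


def count_haplotypes(genotype_rows):
    """Count haplotypes from rows of diploid genotype strings.

    Each row is one variant site and each entry is one individual's
    genotype, e.g. "0|1" or "0/1". Assumption: the first allele of every
    genotype belongs to the same chromosome copy (i.e. data are phased).
    """
    n_individuals = len(genotype_rows[0])
    counts = Counter()
    for ind in range(n_individuals):
        # Split this individual's genotype at every site into two alleles.
        alleles = [row[ind].replace("/", "|").split("|") for row in genotype_rows]
        for copy in (0, 1):
            # Read one chromosome copy down the sites to build a haplotype.
            counts["".join(site[copy] for site in alleles)] += 1
    return counts


# The three-individual, three-site example from the post.
rows = [
    ["0/1", "0/0", "1/1"],
    ["0/0", "0/1", "1/1"],
    ["1/0", "1/0", "0/0"],
]
for haplotype, n in count_haplotypes(rows).items():
    print(haplotype, "-", n)
```

A real implementation would read genotypes from the VCF with a parser rather than from hard-coded lists; the lists here only illustrate the logic.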
I would want the user of this tool to be able to enter the haplotype length they want (the number of SNPs per haplotype) and to apply filters, such as an allele frequency filter, so that haplotypes can be built from only rare SNPs or only common SNPs.
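One possible way to combine the two user options is sketched below. It is only an illustration under several assumptions: sites are biallelic, allele frequency is computed from the genotypes supplied rather than read from the VCF, and windows are consecutive and non-overlapping. The names `alt_allele_frequency` and `select_windows` and their parameters are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Fraction of allele copies at one site that are the ALT allele ("1")."""
    alleles = [a for gt in genotypes for a in gt.replace("/", "|").split("|")]
    return alleles.count("1") / len(alleles)


def select_windows(genotype_rows, length, min_af=0.0, max_af=1.0):
    """Filter sites by ALT frequency, then group them into windows.

    Sites whose ALT frequency lies in [min_af, max_af] are kept, and the
    kept sites are split into consecutive, non-overlapping windows of
    `length` sites each. A trailing partial window is dropped.
    """
    kept = [
        row for row in genotype_rows
        if min_af <= alt_allele_frequency(row) <= max_af
    ]
    return [kept[i:i + length] for i in range(0, len(kept) - length + 1, length)]


# Four sites across three individuals; only the last two sites have an
# ALT frequency of 0.4 or lower.
sites = [
    ["0/1", "0/0", "1/1"],
    ["0/0", "0/1", "1/1"],
    ["1/0", "1/0", "0/0"],
    ["0/0", "0/0", "0/1"],
]
rare_windows = select_windows(sites, length=2, max_af=0.4)
print(len(rare_windows), "window(s) built from sites with ALT frequency <= 0.4")
```

Each window returned here could then be passed to a haplotype-counting step. Whether windows should overlap, or be defined by genomic distance instead of SNP count, is a design choice the tool would need to settle.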
The output of this tool could list the most common or the rarest haplotypes for each population, along with a breakdown of how prevalent each haplotype is across all populations. For example:
10001010110111101010101   LWK - 49%   GBR - 25%   TSI - 23%   ...
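A breakdown like the one above could be computed roughly as follows. This sketch assumes the percentages mean "share of all copies of this haplotype that were observed in each population" (the other reading, frequency within each population, would need a different denominator). The function name `population_breakdown` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def population_breakdown(haplotype_copies):
    """Return {haplotype: {population: percent of that haplotype's copies}}.

    `haplotype_copies` is an iterable of (haplotype, population_code)
    pairs, one pair per chromosome copy.
    """
    per_haplotype = defaultdict(Counter)
    for haplotype, population in haplotype_copies:
        per_haplotype[haplotype][population] += 1
    return {
        haplotype: {
            pop: 100.0 * n / sum(pops.values()) for pop, n in pops.items()
        }
        for haplotype, pops in per_haplotype.items()
    }


# Toy input: made-up chromosome copies labelled with population codes.
copies = [
    ("001", "LWK"), ("001", "LWK"), ("001", "GBR"), ("001", "TSI"),
    ("110", "GBR"),
]
breakdown = population_breakdown(copies)
for haplotype, shares in breakdown.items():
    # Print populations from largest to smallest share.
    ranked = sorted(shares.items(), key=lambda item: -item[1])
    print(haplotype, "  ".join(f"{pop} - {pct:.0f}%" for pop, pct in ranked))
```

In practice the population code for each sample would come from the sample panel file distributed with the 1000 Genomes data, which maps sample IDs to populations.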
Does anyone know whether a tool like this already exists? I have looked around and cannot find anything that does this. Also, do you think this project would be useful? If not, is there another project related to the 1000 Genomes data that you think would be more useful to take on? I am looking for a small, manageable project that could be useful to someone and help me find employment once I graduate. Thank you!