Hey all. To outline my goal: I'm currently working on my honours thesis, and my project involves interpreting the different haplotypes of two genes and seeing whether they have an evolutionary link. I am using 1000 Genomes Project phase 3 data, which is already phased, so it seems like it should be straightforward to load into a program and run some haplotype analysis. The closest tool I could find for this was Haploview, but a slew of problems make it a no-go for me. I've basically decided to write a tool in R that takes phased .vcf files, runs the statistics, and outputs a file with easy-to-read stats and graphics. Before I start working on the tool, I wanted to get some input from anyone working in this area on what they think would be useful for me to implement in this package. For now, the things that are important to my project are as follows:
able to take multiple .vcf files as input
capable of batch processing, for use on a computing cluster
ability to tag specific SNPs, or just work on all SNPs above a MAF cutoff
calculation of maybe FST, Hardy-Weinberg equilibrium, linkage disequilibrium, etc.
maybe output some nice looking visualizations of the data
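To make the MAF and linkage disequilibrium items a bit more concrete, here is a rough sketch of the kind of calculation I have in mind. It is in Python purely for illustration (the actual tool would be in R), the function names `read_haplotypes`, `maf`, and `ld` are hypothetical, and the tiny VCF fragment is made up rather than taken from 1000 Genomes. It assumes biallelic, fully phased sites with no missing genotypes.

```python
from io import StringIO

# Tiny made-up phased VCF fragment (hypothetical data, NOT real 1000 Genomes records).
VCF_TEXT = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4
1\t100\trsA\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0\t0|1
1\t200\trsB\tC\tT\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0\t0|0
"""


def read_haplotypes(handle):
    """Return {variant_id: [0/1 allele per haplotype]}.

    Assumes biallelic sites, phased genotypes ("0|1"), and no missing calls.
    """
    haps = {}
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        alleles = []
        for sample in fields[9:]:
            gt = sample.split(":")[0]      # GT is the first FORMAT field
            left, right = gt.split("|")    # "|" marks a phased genotype
            alleles.extend([int(left), int(right)])
        haps[fields[2]] = alleles
    return haps


def maf(alleles):
    """Minor allele frequency across all haplotypes at one site."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def ld(a, b):
    """Return (D, r^2) between two sites from phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    # Frequency of haplotypes carrying the ALT allele at both sites.
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


haps = read_haplotypes(StringIO(VCF_TEXT))
# Keep only sites at or above a MAF cutoff (0.05 is an arbitrary example value).
kept = {vid: h for vid, h in haps.items() if maf(h) >= 0.05}
d, r2 = ld(kept["rsA"], kept["rsB"])
print(f"D = {d:.4f}, r^2 = {r2:.4f}")
```

Because the data are already phased, the two-site haplotype frequencies can be counted directly, with no need to estimate them from genotypes. Real files would also need handling for multiallelic sites, missing calls, and per-population sample lists, which this sketch ignores.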
This could be a very bare-bones program that I just use for my own purposes, but my thesis advisor has alluded to the option of me making a more robust package and using that as my project altogether. If there's enough interest in something like this, or you folks have any input for me, I may go that route. Any input is much appreciated.