Aim: Download public data for a genomic region, then calculate the haplotype frequencies in that region, both overall and for each ethnic population.
I want to download the region around BRCA1 from the 1000 Genomes data.
tabix -fh http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:41,196,312-41,277,340 >BRCA1_1000g_20101123.vcf
Now I have my BRCA1 genotype data, and I want to check the allele frequencies as a QC measure.
vcftools --vcf BRCA1_1000g_20101123.vcf --freq --out BRCA1Copy_1000g_20101123.vcf.freq
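For the per-population part, this is a rough sketch of what I have in mind, not something I have validated. It assumes a local copy of a sample-to-population panel file (called samples.panel here, a hypothetical name) with the sample ID in column 1 and the population code in column 2. The helper names samples_for_pop and freq_for_pop are my own.

```shell
# Print the sample IDs that belong to one population.
# Assumption: whitespace-separated panel file, sample ID in column 1,
# population code in column 2.
samples_for_pop() {
  awk -v pop="$2" '$2 == pop { print $1 }' "$1"
}

# Run vcftools --freq on one population only.
# Arguments: VCF file, panel file, population code.
# Writes <pop>.samples.txt and <pop>.frq in the current directory.
freq_for_pop() {
  samples_for_pop "$2" "$3" > "$3.samples.txt"
  vcftools --vcf "$1" --keep "$3.samples.txt" --freq --out "$3"
}
```

It could then be called as freq_for_pop BRCA1_1000g_20101123.vcf samples.panel CEU, once per population code of interest.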
Next, I want to find the common and "all" haplotype blocks, and the frequency of haplotypes in this region.
What filters should be applied on allele frequencies? Any help is very much appreciated.
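To make the question concrete, here is a minimal sketch of how I imagine counting haplotypes directly from the VCF. It assumes every genotype is phased (written like 0|1) and that GT is the first FORMAT field. It treats the whole region as a single haplotype rather than finding blocks, and the default cutoff of 0.05 is only a placeholder, not a recommended filter. The function name haplotype_counts is my own.

```shell
# Count distinct phased haplotypes across all sites that pass a MAF cutoff.
# Arguments: VCF file, minimum minor allele frequency (placeholder default 0.05).
# Output: haplotype string, count, frequency (tab-separated, most common first).
haplotype_counts() {
  awk -F'\t' -v min_maf="${2:-0.05}" '
    /^#/ { next }                       # skip header lines
    NF < 10 { next }                    # no sample columns
    {
      alt = 0; ok = 1
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")               # GT assumed to be the first field
        if (split(f[1], a, "|") != 2 || a[1] == "." || a[2] == ".") { ok = 0; break }
        g1[i] = a[1]; g2[i] = a[2]
        # any non-reference allele is counted as "alt"
        alt += (a[1] != "0") + (a[2] != "0")
      }
      if (!ok) next                     # skip sites with unphased or missing calls
      af = alt / (2 * (NF - 9))
      maf = (af < 0.5) ? af : 1 - af
      if (maf < min_maf) next           # frequency filter
      for (i = 10; i <= NF; i++) { h1[i] = h1[i] g1[i]; h2[i] = h2[i] g2[i] }
      ncol = NF
    }
    END {
      for (i = 10; i <= ncol; i++) { count[h1[i]]++; count[h2[i]]++ }
      for (h in count)
        printf "%s\t%d\t%.4f\n", h, count[h], count[h] / (2 * (ncol - 9))
    }
  ' "$1" | sort -k2,2nr
}
```

Running haplotype_counts BRCA1_1000g_20101123.vcf 0.05 would print one line per distinct haplotype. With many sites in the region, almost every chromosome may end up with a unique haplotype, which is part of why I am asking about sensible frequency filters and block definitions.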