I'm trying to hunt for "rare mutations" in exome data that could be causal for a rare disease.
I just found two nonsynonymous mutations only 3 bp apart, which alter two consecutive amino acids.
UCSC actually classifies both as "common SNPs" (UCSC defines a "rare mutation" as one with MAF <= 1%, whereas I loosened the cutoff to 5% in my analysis). What I'm wondering is: are these two mutations on one haplotype? I.e., are they always transmitted together, or more or less independently? If they are always transmitted together, the MAF of the two-mutation combination is simply ~5%, which rules them out as candidates, because 5% does NOT sound rare enough for a rare disease. If they are transmitted independently, the frequency of the combination could be as low as 5% * 5% = 0.25%, which would make them perfect candidates for a rare disease.
There are a few different things to keep in mind. From the talks I have heard and the papers I have read on MAFs and the likelihood of a variant being causal, strict filtering on the magic 1% cutoff might not be a good idea. If the variant is in dbSNP129 or earlier then yes, it is probably fairly common, but estimates from 1000 Genomes may be inflated. You may also be dealing with a population different enough that the MAF estimate isn't worth as much.
If your variants are 3 bp apart, it should be easy enough to see whether they are compound heterozygotes by looking at the raw data. If they appear on the same reads, they represent a haplotype (or a sequencing error). Otherwise they may be compound hets. Verifying is usually as simple as Sanger sequencing the relevant exon in your sample's parents. If the variants are compound hets, each parent should be heterozygous for one of them, but not both.
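As a rough illustration of the "same reads" check, here is a minimal Python sketch. It assumes you have already extracted the reads overlapping the region as (start, sequence) pairs with ungapped alignments; the function name, the coordinates and the toy reads are all made up for the example, and real data would need indels and clipping handled by a proper BAM parser.

```python
from collections import Counter

def classify_reads(reads, pos1, pos2, alt1, alt2):
    """Count reads covering both sites by which alternate alleles they carry.

    reads: iterable of (start, sequence) tuples, where start is the 0-based
    reference coordinate of the first base. Assumes ungapped alignments
    (no indels or soft clipping), which is a simplification.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        # Skip reads that do not cover both positions.
        if not (start <= pos1 < end and start <= pos2 < end):
            continue
        has1 = seq[pos1 - start] == alt1
        has2 = seq[pos2 - start] == alt2
        if has1 and has2:
            counts["both"] += 1
        elif has1:
            counts["only_first"] += 1
        elif has2:
            counts["only_second"] += 1
        else:
            counts["neither"] += 1
    return counts

# Toy data (hypothetical): reference is ACGTACGTAC starting at position 100,
# with variants T>A at position 103 and G>C at position 106.
toy_reads = [
    (100, "ACGAACCTAC"),  # carries both alternate alleles
    (102, "GAACCTAC"),    # carries both alternate alleles
    (100, "ACGTACGTAC"),  # reference at both sites
    (105, "CGTAC"),       # does not cover position 103, so it is skipped
]
counts = classify_reads(toy_reads, pos1=103, pos2=106, alt1="A", alt2="C")
print(dict(counts))
```

Reads falling mostly into "both" and "neither" suggest the two variants sit on the same haplotype; reads falling mostly into "only_first" and "only_second" suggest they are on opposite chromosomes, i.e. compound hets.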
Look a little more into the source of the SNPs in the database and how many samples they were sequenced in. Check EVS (the Exome Variant Server) and its massive exome sequencing projects to see whether the variants were seen there as well.
Have you done a haplotype analysis on them? I.e., do you know the frequency of each mutation, and whether the two always occur together or what percentage of chromosomes carries one mutation but not the other? Otherwise I'm a little confused by your question.
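To make that concrete, here is a small Python sketch of the usual pairwise linkage disequilibrium numbers (D and r^2) computed from phased haplotype counts. The function name and the counts are invented for illustration; they just mirror the two scenarios in the question (each variant at 5%, either always together or fully independent).

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD from phased haplotype counts.

    A/a and B/b are the alternate/reference alleles at the two sites;
    n_AB is the number of chromosomes carrying both alternate alleles.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the double-mutant haplotype
    p_A = (n_AB + n_Ab) / total    # allele frequency at the first site
    p_B = (n_AB + n_aB) / total    # allele frequency at the second site
    D = p_AB - p_A * p_B           # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    return {"p_A": p_A, "p_B": p_B, "p_AB": p_AB, "D": D, "r2": r2}

# Hypothetical counts: the variants always travel together (10,000 chromosomes).
linked = ld_stats(n_AB=500, n_Ab=0, n_aB=0, n_ab=9500)

# Hypothetical counts: the variants are independent, so the double-mutant
# haplotype frequency is 5% * 5% = 0.25%.
independent = ld_stats(n_AB=25, n_Ab=475, n_aB=475, n_ab=9025)

print(linked["p_AB"], linked["r2"])
print(independent["p_AB"], independent["r2"])
```

An r^2 near 1 means the combination is about as common as either variant alone (~5% here); an r^2 near 0 means the combined haplotype frequency is close to the product of the two allele frequencies.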