I have created BAM files for 10 samples. My goal is to identify SNPs that are shared among all samples (or by a subset of samples), as well as those that are sample-specific. Ultimately, I'd like to identify the genotype (haplotype) of each sample and reconstruct the relationships among them.
The main issue I am having is that coverage is highly uneven among samples. For example, a region X may be covered in only 4 samples, while a region Z may be covered in only 3.
I am looking to extract all regions that are covered (at 1X or more, or between 2X and 50X) in all samples (or in a subset of samples), and then use these regions to call SNPs and extract genotypes.
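To make the idea concrete, here is a rough sketch of the filtering step I have in mind. It assumes the per-base depths have already been tabulated with something like `samtools depth -a s1.bam s2.bam ...`, which I understand prints tab-separated lines of chromosome, 1-based position, and one depth column per BAM. The function name `shared_regions` and its parameters are made up for illustration, and the toy input below is invented, not real data.

```python
def shared_regions(lines, min_depth=2, max_depth=50, min_samples=None):
    """Yield BED-style (chrom, start0, end) intervals in which at least
    `min_samples` samples have depth within [min_depth, max_depth].

    `lines` are tab-separated: chrom, 1-based pos, one depth per sample.
    min_samples=None means "all samples"; max_depth=None means no upper cap.
    """
    upper = float("inf") if max_depth is None else max_depth
    chrom = start = end = None  # currently open interval (1-based, inclusive)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        c, pos = fields[0], int(fields[1])
        depths = [int(d) for d in fields[2:]]
        needed = len(depths) if min_samples is None else min_samples
        n_ok = sum(min_depth <= d <= upper for d in depths)
        if n_ok >= needed:
            if chrom == c and pos == end + 1:
                end = pos  # extend the open interval by one base
            else:
                if chrom is not None:
                    yield (chrom, start - 1, end)  # close the previous interval
                chrom, start, end = c, pos, pos
        elif chrom is not None:
            yield (chrom, start - 1, end)
            chrom = start = end = None
    if chrom is not None:
        yield (chrom, start - 1, end)


# Invented toy input: three samples, six positions.
demo = [
    "chr1\t1\t3\t5\t2",
    "chr1\t2\t4\t6\t10",
    "chr1\t3\t0\t7\t8",
    "chr1\t4\t2\t2\t2",
    "chr1\t5\t60\t3\t3",
    "chr2\t1\t5\t5\t5",
]

for region in shared_regions(demo, min_samples=2):
    print(*region, sep="\t")
```

With `min_samples` left at its default, only positions passing in every sample are kept; setting it to a smaller number gives the "subset of samples" case. The intervals are 0-based, half-open, so they could be written to a BED file and used to restrict SNP calling, though I have not checked whether this is the recommended way to do it.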
Any help on this matter will be greatly appreciated. Thank you very much!