I am looking at a region that has a copy number increase in my sequenced samples compared to the reference. Based on read depth and other analyses, I believe there are about 14 copies of this region. I have a VCF file for this region, but if a SNP were present in only 1 of the 14 copies it would have been filtered out. Also, individuals can only be called as homozygous or heterozygous at a given position. But what if they have 3 or 4 different bases at a position because of the multiple copies? Is there a way for me to pull from the BAMs and get the frequency of all SNPs in the region for my samples? I am basically trying to find out whether there is a possible dominant negative mutation in 1 of the 14 copies of the gene in the region. Any advice on how I might do so would be much appreciated.
You could repeat SNV calling in this region with more sensitive parameters (samtools mpileup, or freebayes --pooled-discrete --ploidy 14) and filter the results with some further scripting if necessary. Depending on the size of the region of interest, you may want to use a variant annotation tool like VEP to help you isolate the deleterious SNVs.
Alternatively, if the CNV is small and you're only examining one sample, you could just open the BAM in IGV, change the viewer settings to show SNVs with a minimum frequency of 3%, and scan visually.
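If you would rather tabulate per-position base frequencies than scan by eye, the "further scripting" could look something like the sketch below. It is only an illustration, not a validated caller: it assumes the text output of samtools mpileup run with a reference (-f), so that matches appear as . and , in the read-bases column, and it ignores base and mapping qualities entirely. The function names count_bases and allele_frequencies are made up for this example, and the 3% default simply mirrors the IGV threshold above. For scale, a variant present in 1 of 14 copies would be expected at roughly 1/14, or about 7%, of reads.

```python
import re
from collections import Counter


def count_bases(ref_base, read_bases):
    """Count A/C/G/T observations in one mpileup read-bases string.

    Hypothetical helper; assumes mpileup was run with -f so that
    '.' and ',' mean "matches the reference base".
    """
    counts = Counter()
    ref_base = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read start marker: the next character is a mapping quality.
            i += 2
        elif ch in "+-":
            # Indel: +<len><seq> or -<len><seq>; skip the whole thing.
            m = re.match(r"\d+", read_bases[i + 1:])
            if m is None:
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
        elif ch in ".,":
            counts[ref_base] += 1
            i += 1
        elif ch.upper() in "ACGT":
            counts[ch.upper()] += 1
            i += 1
        else:
            # '$' (read end), '*' (deleted base), '<' / '>' (ref skip), N.
            i += 1
    return counts


def allele_frequencies(pileup_lines, min_freq=0.03):
    """Yield (chrom, pos, ref, depth, freqs) for positions where at least
    one non-reference base reaches min_freq of the counted bases."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, ref, _depth, bases = fields[:5]
        counts = count_bases(ref, bases)
        total = sum(counts.values())
        if total == 0:
            continue
        freqs = {base: n / total for base, n in counts.items()}
        has_alt = any(
            base != ref.upper() and freq >= min_freq
            for base, freq in freqs.items()
        )
        if has_alt:
            yield chrom, int(pos), ref.upper(), total, freqs
```

One way to use it would be to pipe the output of samtools mpileup -f ref.fa -r <region> sample.bam into a small script that passes sys.stdin to allele_frequencies and prints each tuple. Positions with three or four bases above the threshold show up naturally, since every observed base gets its own frequency. Because qualities are ignored here, sequencing errors can inflate low-frequency bases, so any hits are worth confirming in IGV or with a proper caller.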