I have about 50 samples with a disease phenotype, all carrying the same mutation (a known disease-causing mutation). I want to check whether it is a founder mutation. I have performed short-read sequencing of a 2 Mb region flanking the mutation (upstream and downstream). How do I proceed from here? I have the VCFs, and going through the SNPs I can see a SNP pattern surrounding the mutated allele that is common to all the samples. Some samples have family members available; the rest are unrelated.
Thank you GokalpC for having a look at my query. I do have the BAM files. I am a complete novice at haplotyping, so I have three queries:
1) After performing read-backed phasing with WhatsHap, would I still need to phase the samples further using the other tools you mentioned?
2) Once I have the phased haplotypes, can I check the linkage with the disease variant and the length of the haplotype just by looking at them? Is that what you mean, or do I need another tool?
3) Most of my affected samples originate from a particular region. I am trying to understand whether the mutation is a founder mutation. Will the methods you outlined be enough to show that? If not, how should I go about it?
Each phasing algorithm has its weaknesses, especially when working with short reads. WhatsHap is generally good at generating haplotypes spanning kilobases if enough variants are present, but regions with few variants will leave gaps between haplotype blocks. Phasing by transmission can at least establish which allele comes from which parent, and it extends the phasing of variants over longer distances. Finally, IBD analysis can provide phasing based on co-segregation patterns, using hidden Markov models, to make the final verdict.
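To make the phase-by-transmission idea concrete, here is a minimal sketch for a single biallelic site in a trio. The function name `phase_by_transmission` and the genotype encoding (0 = reference, 1 = alternate) are assumptions for illustration only; real pedigree-aware phasers work across many sites at once and model genotyping error, which this does not.

```python
from typing import Optional, Tuple

Genotype = Tuple[int, int]  # unphased allele pair; 0 = ref, 1 = alt (assumed encoding)


def phase_by_transmission(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) for the child at one site.

    Returns None when the site is uninformative (e.g. all three are
    heterozygous) or when the genotypes are Mendelian-inconsistent.
    """
    a, b = child
    candidates = set()
    # Try both assignments of the child's two alleles to the two parents.
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            candidates.add((maternal, paternal))
    # Exactly one consistent assignment means the site is phased.
    return candidates.pop() if len(candidates) == 1 else None


if __name__ == "__main__":
    # Mother is hom-ref, so the child's alt allele must be paternal.
    print(phase_by_transmission((0, 1), (0, 0), (0, 1)))
    # Everyone heterozygous: transmission alone cannot resolve the phase.
    print(phase_by_transmission((0, 1), (0, 1), (0, 1)))
```

Sites where all three members are heterozygous stay unresolved here, which is one reason read-based and transmission-based phasing complement each other.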
Once good haplotypes have been generated, it should be fairly easy to follow the segregation of long haplotypes, which may suggest that an ancestral allele is present for your disease of interest. If the disease is recessive, it may be even easier to follow using only IBD and runs of homozygosity (ROH) patterns.
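As a rough illustration of checking the length of a shared haplotype, the sketch below takes already-phased, mutation-carrying haplotypes and walks outward from the mutation until the carriers stop agreeing. The function name `shared_haplotype_span`, the toy positions and the 0/1 allele encoding are all hypothetical. It demands strict identity at every site, so a single genotyping error would cut the span short; treat it as a way to see the idea, not as a replacement for proper IBD tools.

```python
from typing import Sequence, Tuple


def shared_haplotype_span(
    positions: Sequence[int],
    haplotypes: Sequence[Sequence[int]],
    mutation_index: int,
) -> Tuple[int, int]:
    """Return (start_bp, end_bp) of the region around the mutation in which
    every supplied haplotype carries the same allele at every site."""

    def all_same(i: int) -> bool:
        return len({hap[i] for hap in haplotypes}) == 1

    # Extend to the left while all haplotypes agree.
    left = mutation_index
    while left > 0 and all_same(left - 1):
        left -= 1
    # Extend to the right while all haplotypes agree.
    right = mutation_index
    while right < len(positions) - 1 and all_same(right + 1):
        right += 1
    return positions[left], positions[right]


if __name__ == "__main__":
    # Toy data: six variant sites, mutation at index 2 (position 300).
    positions = [100, 200, 300, 400, 500, 600]
    carriers = [
        [0, 1, 1, 1, 0, 1],
        [1, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 1],
    ]
    print(shared_haplotype_span(positions, carriers, mutation_index=2))
```

Running the same function on subsets of carriers gives different span lengths, which is the pattern expected when recombination has shortened the ancestral haplotype differently in different lineages.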
My suggestions may look a bit complicated at first, but it gets easier if you try these steps one by one and find the solution that works best for you. For example, for any recessive trait in my cohort I would strongly recommend checking runs of homozygosity first.
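For the runs-of-homozygosity check, a bare-bones sketch could look like the following. It finds the contiguous stretch of homozygous calls that contains the mutation in one sample. The function name `homozygous_run` and the toy genotypes are assumptions; dedicated tools such as `bcftools roh` or PLINK's `--homozyg` handle allele frequencies, missing data and genotyping errors, none of which is attempted here.

```python
from typing import Optional, Sequence, Tuple


def homozygous_run(
    positions: Sequence[int],
    genotypes: Sequence[Tuple[int, int]],
    target_index: int,
) -> Optional[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_sites) for the run of homozygous calls
    containing target_index, or None if the target site is heterozygous."""

    def is_hom(i: int) -> bool:
        a, b = genotypes[i]
        return a == b

    if not is_hom(target_index):
        return None
    # Grow the run in both directions until a heterozygous call is hit.
    left = target_index
    while left > 0 and is_hom(left - 1):
        left -= 1
    right = target_index
    while right < len(genotypes) - 1 and is_hom(right + 1):
        right += 1
    return positions[left], positions[right], right - left + 1


if __name__ == "__main__":
    # Toy sample: mutation at index 2, flanked by one het call on each side.
    positions = [100, 200, 300, 400, 500]
    genotypes = [(0, 1), (1, 1), (1, 1), (0, 0), (0, 1)]
    print(homozygous_run(positions, genotypes, target_index=2))
```

Comparing the run boundaries across affected homozygotes gives a first impression of how much of the region is shared before moving on to formal IBD analysis.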
Thank you for patiently answering my query. One of the mutations (the one I suspect to be a founder mutation) is recessive. I have visually analysed the sequences of multiple mutation-positive samples in IGV, both upstream and downstream of the mutation, and can see conserved SNPs that appear to form a haplotype. Haplotypes of varying lengths are also seen (at least 2 to 4). These patterns are absent in the control samples. Do I still need to confirm this with IBD and runs of homozygosity, or is the SNP data enough?
For some samples trios are available, and in most of them each parent is either homozygous or heterozygous for the mutation. In the heterozygous samples I can make out the same pattern visually, based on segregation. I am not sure if this makes sense.
Thanks again.
Thanks and regards