I am trying to work out how to extract intervals that are shared across samples in WGS data, using an arbitrary percentage threshold.
According to the bedtools documentation, overlapping intervals can be extracted with bedtools intersect: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html This is very useful and works well for me when I have only a few samples.
However, I am analyzing several hundred samples, and with that many no overlapping interval is detected at all. This is understandable: if, say, 99 samples have a T/A variant at Chr1 position 1 but 1 sample does not, there is no interval shared by every sample. To get around this, I would like to extract variants that are shared by at least 99% of the samples, then 95%, 90%, or even fewer, lowering the threshold until I find overlapping intervals.
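To make the goal concrete, here is a rough Python sketch of the behaviour I am after. It assumes each sample's variants are already loaded as (chrom, start, end) tuples in BED-style half-open coordinates, and it only looks at positions, not alleles. The names merge_sample and shared_intervals are my own and are not part of bedtools or GATK; the approach is a simple sweep over interval start and end points, which may not scale well to whole genomes.

```python
from collections import defaultdict


def merge_sample(intervals):
    """Merge overlapping intervals within one sample so that the
    sample is counted at most once at any position."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def shared_intervals(samples, min_fraction):
    """Return regions covered by at least min_fraction of the samples.

    samples: list of per-sample interval lists (hypothetical input layout).
    min_fraction: e.g. 0.99 for "present in at least 99% of samples".
    """
    # Small tolerance so that float rounding does not drop exact matches.
    needed = min_fraction * len(samples) - 1e-9

    # Each interval contributes +1 at its start and -1 at its end.
    events = defaultdict(list)
    for intervals in samples:
        for chrom, start, end in merge_sample(intervals):
            events[chrom].append((start, 1))
            events[chrom].append((end, -1))

    result = []
    for chrom in sorted(events):
        depth = 0    # number of samples covering the current segment
        prev = None  # left edge of the current segment
        for pos, delta in sorted(events[chrom]):
            if prev is not None and pos > prev and depth >= needed:
                if result and result[-1][0] == chrom and result[-1][2] == prev:
                    # Extend the previous region if it is directly adjacent.
                    result[-1] = (chrom, result[-1][1], pos)
                else:
                    result.append((chrom, prev, pos))
            depth += delta
            prev = pos
    return result


if __name__ == "__main__":
    # Toy data: four samples, one of which has nothing in common.
    samples = [
        [("chr1", 0, 10)],
        [("chr1", 5, 15)],
        [("chr1", 8, 20)],
        [("chr1", 100, 110)],
    ]
    # Lower the threshold step by step until something is shared.
    for fraction in (1.0, 0.75, 0.5):
        print(fraction, shared_intervals(samples, fraction))
```

With the toy data, requiring all four samples gives nothing, while relaxing the threshold to 75% or 50% starts to return shared regions, which is the behaviour I would like to get from real data.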
Does anyone know how to do this, or could you point me to some helpful websites? Or could GATK SelectVariants do this?