I am analysing Illumina whole-genome resequencing data from two clonally propagated plants, aiming to find variants that are unique to either of the two. These would be expected to be somatic mutations and, because the plants are propagated by cuttings, I cannot predict the frequency of such variants within a sample. I want to detect any strongly supported unique variants, and I envisage the following approach:
1. Map reads to the reference genome
2. Consider regions with the expected read depth in both samples
3. Detect all variants in those regions that occur in at least 2 reads
4. Find variants that are unique to either sample
5. Validate variants through visual inspection of read mappings at the corresponding positions
I can perform all steps except step 3. This is because variant callers generally call genotypes assuming diploidy and therefore do not report variants whose allele frequencies do not fit the expected genotypes (0/0, 0/1, 1/1). Based on the visual inspection in step 5, I find that most (if not all) of the detected "unique" variants result from differences in variant allele frequency between the two samples rather than actual presence/absence.
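To make the presence/absence criterion concrete, here is a minimal Python sketch of the comparison I have in mind. It assumes that per-variant supporting-read counts and per-site depths have already been extracted for each sample. The function name `unique_to_first`, the dictionary layout, and the thresholds `min_support` and `min_depth_other` are all hypothetical and chosen only for illustration.

```python
def unique_to_first(counts_a, counts_b, depth_b, min_support=2, min_depth_other=10):
    """Return variants seen in sample A but absent from sample B.

    counts_a, counts_b: dict mapping (chrom, pos, ref, alt) -> supporting reads
    depth_b:            dict mapping (chrom, pos) -> read depth in sample B
    All thresholds are illustrative assumptions, not recommended values.
    """
    unique = []
    for variant, n_a in counts_a.items():
        site = variant[:2]  # (chrom, pos)
        if n_a < min_support:
            continue  # too weakly supported in A
        if counts_b.get(variant, 0) > 0:
            continue  # present in B, even at low frequency
        if depth_b.get(site, 0) < min_depth_other:
            continue  # B is too shallow here to claim absence
        unique.append(variant)
    return sorted(unique)
```

Calling it a second time with the two samples swapped would give the variants unique to the other plant. Requiring zero supporting reads in the second sample is deliberately strict, so sequencing errors there would hide some real differences.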
Is there a way to call every variant that is supported by at least x mapped reads, for example by parsing samtools mpileup output or by using a specific variant caller?
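To illustrate the mpileup option, below is a rough Python sketch of the kind of parsing I mean. It assumes the standard six-column text pileup (chromosome, position, reference base, depth, read bases, base qualities) produced when a reference FASTA is supplied, so that matches appear as `.` or `,`. The function names `count_alt_alleles` and `variants_with_support` and the `min_reads` parameter are made up for this example, and no base-quality or strand filtering is applied.

```python
import re
from collections import Counter

def count_alt_alleles(bases):
    """Count non-reference observations in the read-bases column of one pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read-start marker plus its mapping-quality character
        elif c in "+-":
            # Indel: sign, length digits, then that many bases
            digits = re.match(r"\d+", bases[i + 1:]).group()
            start = i + 1 + len(digits)
            end = start + int(digits)
            counts[c + bases[start:end].upper()] += 1
            i = end
        elif c in "ACGTacgt":
            counts[c.upper()] += 1  # mismatch on either strand
            i += 1
        else:
            i += 1  # '.', ',', '$', '*', '<', '>', 'N' are not counted
    return counts

def variants_with_support(line, min_reads=2):
    """Return (chrom, pos, ref, alt, n_reads) for alleles seen in >= min_reads reads."""
    chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    counts = count_alt_alleles(bases)
    return [(chrom, int(pos), ref.upper(), alt, n)
            for alt, n in sorted(counts.items()) if n >= min_reads]
```

Each pileup line would be passed to `variants_with_support`, with `min_reads` playing the role of x above. Indels are reported with a leading `+` or `-` at the position preceding the event, as in the pileup itself.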
Any advice or suggestion would be most welcome.