I have analyzed some specific structural variants in a set of samples, each originating from a different patient. Now I would like to correlate the number of those variants detected in each patient with some clinical information about them. The problem is that coverage differs widely between the BAM files of different patients. As a result, I normally detect fewer variants in the patients with lower coverage. Therefore, I would like to correct the number of variants detected in each patient for that patient's coverage.
Does anybody know which approach I could use for this?
Do you already use some kind of percentage threshold? That is, do you only accept a variant if it occurs in a given percentage (say 10% or 99%) of the mapped reads?
No, we use an absolute number of supporting reads to accept a variant. For example, if a variant has more than 3 supporting reads, we accept it.
Then using a percentage instead could be a solution. Maybe others will post a different approach.
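To make the difference between the two rules concrete, here is a minimal sketch in Python. It is not anyone's actual pipeline: the function names, the candidate tuples, and the 10% cutoff are all assumptions made up for illustration, and `local_depth` is assumed to be the number of mapped reads covering the variant site.

```python
from collections import Counter


def passes_absolute(supporting_reads, min_reads=3):
    """Rule described in the thread: accept if MORE than min_reads reads support the variant."""
    return supporting_reads > min_reads


def passes_fraction(supporting_reads, local_depth, min_fraction=0.10):
    """Hypothetical alternative: accept if the supporting reads make up at
    least min_fraction of the reads covering the site."""
    if local_depth <= 0:
        return False  # no coverage, nothing to evaluate
    return supporting_reads / local_depth >= min_fraction


# Made-up candidates: (patient, supporting_reads, local_depth)
candidates = [
    ("patientA", 4, 20),   # high-coverage sample
    ("patientA", 5, 200),  # high-coverage sample, low supporting fraction
    ("patientB", 2, 8),    # low-coverage sample
]

absolute_counts = Counter()
fraction_counts = Counter()
for patient, supporting, depth in candidates:
    if passes_absolute(supporting):
        absolute_counts[patient] += 1
    if passes_fraction(supporting, depth):
        fraction_counts[patient] += 1

print("absolute rule:", dict(absolute_counts))
print("fraction rule:", dict(fraction_counts))
```

With the absolute rule, the low-coverage patient loses a candidate simply because few reads are available at that site; with the fraction rule, the same candidate is kept, while a candidate supported by only a small share of reads in the high-coverage sample is dropped. A fraction computed from very few reads is noisy, so in practice it may make sense to combine it with a small minimum read count.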