I am using Pindel on human whole-exome sequencing BAM files. I have been looking only at short insertions (_SI) and deletions (_D), and have found that when I run Pindel on a single sample, the number of variants called in that exome is much lower than when I run Pindel on multiple exomes together.
For example, I ran Pindel on one sample alone and found 415 combined deletions and short insertions. I also ran Pindel on ~12 samples together and found 756 combined deletions and short insertions in that same sample (I counted only calls in that specific sample). The rest of my samples show numbers consistent with this example.
I have not yet had a chance to compare the calls themselves, but I plan to. Still, this large difference concerns me, and my questions are:
- Is this typical? Has anyone else seen this behavior when running Pindel on a single sample versus multiple samples?
- Is it safe to assume that more samples = more statistical power = more accurate calls?
- Is there something obvious I am missing?
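
For the call-level comparison mentioned above, here is a rough sketch of one way it could be done. It assumes both Pindel runs have already been converted to VCF (for example with pindel2vcf) and that the sample name appears as a column in the multi-sample VCF header. The function names `indel_keys` and `compare_calls` are hypothetical helpers, not part of Pindel.

```python
def indel_keys(vcf_lines, sample=None):
    """Collect (chrom, pos, ref, alt) keys from an iterable of VCF lines.

    If `sample` is given (assumed to be a column name in the #CHROM header),
    keep only records where that sample's GT has a non-reference allele.
    Multi-allelic ALT fields are kept as a single string, not split.
    """
    keys = set()
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if sample is not None:
                sample_col = fields.index(sample)  # ValueError if absent
            continue
        if sample_col is not None:
            fmt = fields[8].split(":")
            values = fields[sample_col].split(":")
            gt = values[fmt.index("GT")] if "GT" in fmt else "."
            alleles = gt.replace("|", "/").split("/")
            # Drop the record if the sample is hom-ref or uncalled here.
            if not any(a not in ("0", ".") for a in alleles):
                continue
        keys.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return keys


def compare_calls(single, multi):
    """Split two key sets into shared and run-specific calls."""
    return {
        "shared": single & multi,
        "single_only": single - multi,
        "multi_only": multi - single,
    }
```

Usage would be along the lines of `compare_calls(indel_keys(open(single_vcf)), indel_keys(open(multi_vcf), sample="S1"))`, where the file paths and the sample name `S1` are placeholders. Note that this matches calls exactly on position and alleles, so the same indel reported at a shifted position in a repeat would count as two different calls.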