Question: Strand imbalance in reads supporting a long insertion detected by Pindel
I'm testing Pindel's detection of long insertions using a modified bacterial genome sequence as the reference and real Illumina paired-end reads from the same bacterium. To test this capability, I deleted 3000 bases from one of the contigs in the reference genome. Because the reads now contain sequence that is missing from the modified reference, the deleted region should appear to Pindel as an insertion.
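For anyone who wants to reproduce this kind of test, here is a minimal sketch of the reference edit described above. It is not necessarily the script used here: the function names, the 0-based start coordinate, and the 60-column line width are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {contig_name: sequence}, keeping contig order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def delete_from_contig(records, contig, start, length=3000):
    """Return a copy of records with `length` bases removed from `contig`.

    `start` is 0-based (an assumption of this sketch). The input is not modified.
    """
    seq = records[contig]
    if start < 0 or length < 0 or start + length > len(seq):
        raise ValueError("deletion span falls outside the contig")
    edited = dict(records)
    edited[contig] = seq[:start] + seq[start + length:]
    return edited


def to_fasta(records, width=60):
    """Format {contig_name: sequence} as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"
```

Reads that span the deleted region should then carry sequence absent from the edited reference, which is what makes the region look like an insertion to the caller.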

The current version of Pindel (master branch, downloaded from GitHub on August 26, 2014) finds the long insertion at nearly the correct position, but the number of + strand reads is much larger than the number of - strand reads (449 to 4). That doesn't seem right.

In contrast, a simulated inversion has a nearly even split: 967 + strand reads to 978 - strand reads.
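To put numbers on how lopsided the two cases are, a rough sketch like the following compares each + / - count against an even split. It assumes that reads would be equally likely to come from either strand if there were no bias, which is a simplification; the function names are made up for this example.

```python
from math import comb


def plus_fraction(plus, minus):
    """Fraction of the reads that are on the + strand."""
    return plus / (plus + minus)


def strand_balance_pvalue(plus, minus):
    """Two-sided exact binomial p-value against a 50/50 strand split.

    Sums the tail at or below the smaller count and doubles it,
    which is valid because the null distribution is symmetric.
    """
    n = plus + minus
    k = min(plus, minus)
    tail = sum(comb(n, i) for i in range(k + 1))
    # Integer arithmetic until the final division avoids float overflow.
    return min(1.0, 2 * tail / 2 ** n)


for label, plus, minus in [("long insertion", 449, 4), ("inversion", 967, 978)]:
    print(label, plus_fraction(plus, minus), strand_balance_pvalue(plus, minus))
```

Under that assumption the inversion counts are consistent with an even split, while the insertion counts are not, so the imbalance is unlikely to be sampling noise.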


Does anyone have any idea what's going on?

