Disparity in + and - strand read counts for long insertion identification

9.6 years ago

Robert Bruccoleri • 0

I'm testing Pindel's detection of long insertions, using a modified bacterial genome sequence as the reference and real Illumina paired-end reads from the same bacterium. To exercise the long insertion capability, I deleted 3000 bases from one of the contigs in the genomic reference. Because the reads now contain sequence that is missing from the modified reference, that sequence should appear to Pindel as an insertion.
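For anyone who wants to reproduce the setup, here is a minimal sketch of the reference modification. The function name `delete_region`, the toy contig, and the coordinates are all hypothetical; the actual contig and deletion position are not given here.

```python
def delete_region(sequence: str, start: int, length: int = 3000) -> str:
    """Return `sequence` with `length` bases removed, starting at 0-based `start`."""
    if start < 0 or length < 0 or start + length > len(sequence):
        raise ValueError("deletion falls outside the contig")
    return sequence[:start] + sequence[start + length:]


# Toy stand-in for a real contig (hypothetical data): 10,000 bases.
contig = "ACGT" * 2500

# Remove 3000 bases at an arbitrary, made-up position.
modified = delete_region(contig, start=4000, length=3000)

print(len(contig), len(modified))
```

Reads that span the removed region then carry sequence absent from the modified reference, which is what makes the deletion look like an insertion to the caller.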

The current version of Pindel (downloaded from the master branch on GitHub on August 26, 2014) finds the long insertion at nearly the correct position, but the number of + strand reads is much larger than the number of - strand reads (449 to 4). That doesn't seem right.

In contrast, a simulated inversion has a + to - ratio of 967 to 978.
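One way to put a number on how lopsided these counts are is an exact two-sided binomial test against a 50/50 strand split. This is only a sketch under that assumption; it is not something Pindel reports, and the helper name `strand_balance_p` is made up for illustration.

```python
from math import comb


def strand_balance_p(plus: int, minus: int) -> float:
    """Exact two-sided binomial p-value for an even +/- strand split."""
    n = plus + minus
    k = min(plus, minus)
    # Probability of a count at least as extreme as k in the lower tail,
    # computed with exact integers before the final division.
    lower_tail = sum(comb(n, i) for i in range(k + 1))
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * lower_tail / 2 ** n)


# Counts taken from the two Pindel results described above.
p_insertion = strand_balance_p(449, 4)
p_inversion = strand_balance_p(967, 978)

print(f"long insertion: p = {p_insertion:.3g}")
print(f"inversion:      p = {p_inversion:.3g}")
```

Under this test the inversion counts are consistent with an even split, while the long insertion counts are not.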

Anyone have any idea what's going on?

Pindel • 1.8k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Robert Bruccoleri • 0
