Question

Accounting for off-target reads

7.3 years ago

L. A. Liggett ▴ 120

I am using a probe-based sequencing method similar to Illumina's TruSeq or molecular inversion probe protocols, in which I amplify and sequence only specific 100 bp regions of the human genome. If everything worked ideally, I would expect sequencing reads to fall only within my probed regions. However, I get a substantial number of reads outside the target area that align robustly to other specific regions of the genome, and I don't understand why this would be the case.

I see some 200 extra variants that fall outside my targeted regions, and these are covered by thousands of reads. Is there something I should do differently bioinformatically, other than just eliminating these reads (I am using bwa mem for alignment and freebayes for variant calling), or is there a biological reason why many amplicons would align elsewhere in the human genome?

alignment sequencing • 2.2k views

updated 7.3 years ago by harold.smith.tarheel ★ 4.9k • written 7.3 years ago by L. A. Liggett ▴ 120

score 1 · Accepted Answer · 2017-01-09

There are a number of potential explanations:

1) the probes are correct but PCR conditions are not sufficiently stringent, so mispriming produces undesired amplicons.

2) the probes are not correct and produce undesired amplicons.

3) the probes are correct and produce desired amplicons, but your reference/annotation for alignment is incorrect/doesn't match your expectation.

4) the undesired sequences are present in your sample (e.g., insertions, rearrangements).

What fraction of reads are off-target? And are any of your expected targets absent? Those answers might help to discriminate some of the possibilities.
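Both questions can be answered with a quick tally once you have the alignment coordinates. Below is a minimal Python sketch, not a definitive implementation. It assumes the read positions have already been extracted from the BAM file as (chrom, start, end) tuples and that the targets use 0-based, half-open coordinates as in a BED file. The function names and the probe names in the example are hypothetical.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def summarize_targets(targets, reads):
    """Tally on- and off-target reads.

    targets: list of (chrom, start, end, name), 0-based half-open (assumed).
    reads:   iterable of (chrom, start, end) for aligned reads (assumed).
    Returns (off_target_fraction, per_target_counts, absent_targets).
    """
    # Group targets by chromosome so each read is compared only to nearby probes.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in targets:
        by_chrom[chrom].append((start, end, name))

    counts = {name: 0 for _, _, _, name in targets}
    total = 0
    off_target = 0
    for chrom, start, end in reads:
        total += 1
        hit = False
        for t_start, t_end, name in by_chrom.get(chrom, []):
            if overlaps(start, end, t_start, t_end):
                counts[name] += 1
                hit = True
        if not hit:
            off_target += 1

    fraction = off_target / total if total else 0.0
    absent = [name for name, n in counts.items() if n == 0]
    return fraction, counts, absent


# Toy example with made-up coordinates.
targets = [("chr1", 1000, 1100, "probe_A"), ("chr2", 5000, 5100, "probe_B")]
reads = [
    ("chr1", 1010, 1110),  # overlaps probe_A
    ("chr1", 990, 1005),   # overlaps probe_A
    ("chr3", 200, 300),    # off-target
    ("chr1", 1100, 1200),  # starts where probe_A ends, so off-target
]
fraction, counts, absent = summarize_targets(targets, reads)
print(f"off-target fraction: {fraction:.2f}")
print("reads per target:", counts)
print("targets with no reads:", absent)
```

A small off-target fraction with all targets covered would be consistent with mispriming (1), whereas missing targets alongside well-covered off-target regions would point more toward a probe design or reference mismatch (2 or 3).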