Question

force reads to map to a particular genome region

0

Entering edit mode

9.6 years ago

Jorge Amigo 14k

We are currently extracting human dna regions in the lab from multiple samples, and we would like to sequence them all with NGS. since the dna extraction is a very direct process we know that the resulting reads should only map to particular genome regions that we know in advance, so we need to know which method would be the best one to achieve this forced mapping, as we would like to handle all the results as we normally do (visualizing the reads through IGV, annotating variants through ANNOVAR,...).

After some internal discussion we've come along with 3 possibilities:

Create a custom reference sequence and map all reads to it (we would afterwards have to translate the custom positions to genomic ones in order to get annotations for instance, editing bam and variant files; something that we'd prefer to avoid)
Use our usual hg19 reference but make the aligner to consider our regions of interest only (either using a masked hg19 reference or maybe making the aligner to accept a bed file and to focus on it)
Align the reads as we normally would, then extract the reads from our regions of interest only, and then edit the mapping quality 0 values manually (which value could we use? would we have to assign a random value like 10, or would it be possible to get the original mapping quality before the read was found somewhere else in the genome, hence lowering the mapping quality to 0?)

So we would like to know if anyone has already come across this situation and which approach did they take, or if any of the 3 possibilities we've ended up with sounds any better than the others. thanks in advance.

alignment • 4.0k views

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.6 years ago by Jorge Amigo 14k

Ram · Accepted Answer · 2014-10-09

4

Entering edit mode

9.6 years ago

Pierre Lindenbaum 161k

I would keep hg19 and mask the regions of non-interest with bedtools maskfasta

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.6 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

yes, it makes sense. we were just unsure how the mapper would deal with a masked reference, but I guess we'll just have to try it out. thanks.

ADD REPLY • link 9.6 years ago by Jorge Amigo 14k

1

Entering edit mode

Most aligners that you're likely to use won't have an issue with this.

ADD REPLY • link 9.6 years ago by Devon Ryan 104k

Ram · Accepted Answer · 2014-10-09

2

Entering edit mode

9.6 years ago

Devon Ryan 104k

The easiest method is #2, where you would just hard mask everything except for your regions of interest (and maybe some bases on each side). You could use bedtools maskfasta to do this, though you'll need to use bedtools complement on your BED file first.

How well you can recalculate MAPQ scores for version 3 will be aligner-dependent. You can do this with bowtie2, but not most others.

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 9.6 years ago by Devon Ryan 104k

0

Entering edit mode

first ! :-)

ADD REPLY • link 9.6 years ago by Pierre Lindenbaum 161k

1

Entering edit mode

Only because I typed more! :P

ADD REPLY • link 9.6 years ago by Devon Ryan 104k

1

Entering edit mode

I'll have to accept them both ;-)

ADD REPLY • link 9.6 years ago by Jorge Amigo 14k