Question: force reads to map to a particular genome region
0
gravatar for Jorge Amigo
3.7 years ago by
Jorge Amigo10k
Santiago de Compostela, Spain
Jorge Amigo10k wrote:

we are currently extracting human dna regions in the lab from multiple samples, and we would like to sequence them all with NGS. since the dna extraction is a very direct process we know that the resulting reads should only map to particular genome regions that we know in advance, so we need to know which method would be the best one to achieve this forced mapping, as we would like to handle all the results as we normally do (visualizing the reads through IGV, annotating variants through ANNOVAR,...).

after some internal discussion we've come along with 3 possibilities:

  1. create a custom reference sequence and map all reads to it (we would afterwards have to translate the custom positions to genomic ones in order to get annotations for instance, editing bam and variant files; something that we'd prefer to avoid)
  2. use our usual hg19 reference but make the aligner to consider our regions of interest only (either using a masked hg19 reference or maybe making the aligner to accept a bed file and to focus on it)
  3. align the reads as we normally would, then extract the reads from our regions of interest only, and then edit the mapping quality 0 values manually (which value could we use? would we have to assign a random value like 10, or would it be possible to get the original mapping quality before the read was found somewhere else in the genome, hence lowering the mapping quality to 0?)

so we would like to know if anyone has already come across this situation and which aprox did they take, or if any of the 3 possibilities we've ended up with sounds any better than the others. thanks in advance.

alignment • 1.6k views
ADD COMMENTlink modified 3.7 years ago by Devon Ryan80k • written 3.7 years ago by Jorge Amigo10k
3
gravatar for Pierre Lindenbaum
3.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum108k wrote:

I would keep hg19 and mask the regions of non-interest with bedtools maskfasta ( http://bedtools.readthedocs.org/en/latest/content/tools/maskfasta.html )

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Pierre Lindenbaum108k

yes, it makes sense. we were just unsure how the mapper would deal with a masked reference, but I guess we'll just have to try it out. thanks.

ADD REPLYlink written 3.7 years ago by Jorge Amigo10k
1

Most aligners that you're likely to use won't have an issue with this.

ADD REPLYlink written 3.7 years ago by Devon Ryan80k
2
gravatar for Devon Ryan
3.7 years ago by
Devon Ryan80k
Freiburg, Germany
Devon Ryan80k wrote:

The easiest method is #2, where you would just hard mask everything except for your regions of interest (and maybe some bases on each side). You could use bedtools maskfasta to do this, though you'll need to use bedtools complement on your BED file first.

How well you can recalculate MAPQ scores for version 3 will be aligner-dependent. You can do this with bowtie2, but not most others.

ADD COMMENTlink written 3.7 years ago by Devon Ryan80k

first ! :-)

ADD REPLYlink written 3.7 years ago by Pierre Lindenbaum108k
1

Only because I typed more! :P

ADD REPLYlink written 3.7 years ago by Devon Ryan80k
1

I'll have to accept them both ;-)

ADD REPLYlink written 3.7 years ago by Jorge Amigo10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1997 users visited in the last hour