Question: Mapping To Genome With Ambiguous Reference Characters (R,Y,K,M,S,W Etc.)
Asked 9.2 years ago by Rm (Danville, PA):

I am mapping Illumina reads with bowtie/bwa to a reference genome that contains ambiguous reference characters (N, -, R, Y, K, M, S, W, etc.).

For example, at a location where the reference has an ambiguous character such as R, I want a read with either A or G to count as a perfect match.

In the case below, bowtie is not able to map the read to the reference.

Reference: ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
Read:      ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
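
To make the desired behaviour concrete, here is a minimal Python sketch of an IUPAC-aware comparison for the reference and read above. It is only an illustration of the matching rule being asked for, not how any aligner implements it; the function name `iupac_mismatches` is hypothetical, and it assumes an ungapped, equal-length comparison.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_mismatches(reference, read):
    """Return 0-based positions where the read base is not allowed
    by the (possibly ambiguous) reference character.

    Hypothetical helper: ungapped comparison of equal-length strings.
    Characters outside the table (e.g. '-') never match.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return [
        i
        for i, (ref_char, base) in enumerate(zip(reference.upper(), read.upper()))
        if base not in IUPAC.get(ref_char, "")
    ]


reference = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"

# Positions that differ under plain character-by-character comparison.
literal = [i for i, (r, b) in enumerate(zip(reference, read)) if r != b]

print("literal mismatches:", literal)
print("IUPAC-aware mismatches:", iupac_mismatches(reference, read))
```

Under this rule the read differs from the reference only at the M and K positions, and each read base there is one of the bases the code allows, so the IUPAC-aware comparison reports no mismatches.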

Can you suggest options to set in bowtie/bwa, or another alignment tool, with which I can achieve this?

I learnt that "Alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie."

reference mapping bowtie bwa • 6.4k views
written 9.2 years ago by Rm

Strictly speaking, bowtie, as well as bwa, takes an ambiguous base as a random A/C/G/T. They regard a match to an ambiguous base as a mismatch after mapping.
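
As a rough illustration of the comment above (not bowtie's or bwa's actual code), the sketch below replaces every ambiguous reference character with a randomly chosen A/C/G/T. The function name `randomize_ambiguous` and the `seed` parameter are hypothetical, and doing the substitution in one pass over the whole sequence is an assumption made for the example.

```python
import random


def randomize_ambiguous(reference, seed=0):
    """Replace every non-ACGT character with a random base.

    Hypothetical helper for illustration only; a fixed seed makes
    the result reproducible.
    """
    rng = random.Random(seed)
    return "".join(
        base if base in "ACGT" else rng.choice("ACGT")
        for base in reference.upper()
    )


reference = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTK"
concrete = randomize_ambiguous(reference)

# Every ambiguous position now holds one arbitrary concrete base,
# so a read carrying a different (but IUPAC-compatible) base there
# may still be scored as a mismatch.
print(reference)
print(concrete)
```

Because the substituted base is arbitrary, it can even fall outside the set the ambiguity code allowed (for example, M becoming G), which is why this approach cannot give the "either base is a perfect match" behaviour asked for in the question.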

written 9.2 years ago by lh3
Answered 9.2 years ago by lh3 (United States):

GSNAP, Mosaik and novoalign.

written 9.2 years ago by lh3

If your intention is to reduce the reference bias, I am sure novoalign and gsnap implement that correctly. As mosaik is not published (novoalign is not published but I have discussed this with its developer), I do not know if it does that correctly. Note that claiming a feature does not necessarily mean the feature is implemented correctly.

written 9.2 years ago by lh3

Thanks @lh3, I accept your answer, but I am trying SOAPaligner/soap2.

written 9.2 years ago by Rm

I do not know if soap2 accepts ambiguous bases. In general, soap2 is great, but it does not natively support SAM output, which might cause problems for downstream analyses. If you have to use one, I would recommend novoalign.

written 9.2 years ago by lh3

By design, it is hard for a BWT-based aligner to handle ambiguous bases in the expected way.

written 9.2 years ago by lh3

I am not happy with SOAPaligner... currently trying GSNAP...

written 9.2 years ago by Rm

@lh3: finally, after looking at all three you suggested, I am going ahead with Mosaik. Thanks!

written 9.2 years ago by Rm
Answered 9.2 years ago by Andreas (Singapore):

RazerS accepts ambiguity characters as well: http://www.seqan.de/projects/razers.html

Andreas

Edit: This does not seem to be fully correct. See the comments: only N is supported.

modified 9.1 years ago • written 9.2 years ago by Andreas

Thanks @Andreas. BTW, do you know which options to use in RazerS to make use of ambiguous bases in the reference sequence?

written 9.2 years ago by Rm

In RazerS, I did not see any parameter that supports ambiguous bases except "-mN", which only allows "N" to match any base (A, T, G, C).

written 9.2 years ago by Rm

I enquired with the RazerS authors; they say it does not support ambiguous bases except "N".

written 9.2 years ago by Rm

Ok. Thanks for your investigation and clarification. Edited my answer.

written 9.1 years ago by Andreas