Question: Mapping To Genome With Ambiguous Reference Characters (R,Y,K,M,S,W Etc.)
10.1 years ago by
Rm8.0k
Danville, PA
Rm8.0k wrote:

I am mapping Illumina reads with bowtie/bwa to a reference genome that contains ambiguous reference characters (N, -, R, Y, K, M, S, W, etc.).

For example, at a position where the reference carries an ambiguity code such as R, I want a read with either A or G at that position to be treated as a perfect match.

In the case below, bowtie is not able to match the read to the reference:

Reference: ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
Read:      ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
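
To make the desired behaviour concrete, here is a minimal Python sketch of an IUPAC-aware comparison. It is not an option of bowtie or bwa, and the names `IUPAC`, `iupac_match` and `count_mismatches` are hypothetical helpers introduced only for illustration. It assumes the read is already placed against the reference without gaps.

```python
# Standard IUPAC nucleotide codes mapped to the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_match(ref_base, read_base):
    """Return True if the read base is one of the bases the reference code allows."""
    # Unknown reference symbols (for example "-") allow nothing, so they never match.
    return read_base.upper() in IUPAC.get(ref_base.upper(), "")

def count_mismatches(reference, read):
    """Count positions where the read base is not allowed by the reference code."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return sum(not iupac_match(r, q) for r, q in zip(reference, read))

reference = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"

print(count_mismatches(reference, read))  # prints 0: every read base is allowed
```

Under this definition the read above is a perfect match, which is the result I am looking for from an aligner.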

Can you suggest options to set in bowtie/bwa, or another alignment tool, that would let me achieve this?

I learnt that "Alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie."

reference mapping bowtie bwa • 7.0k views
ADD COMMENTlink written 10.1 years ago by Rm8.0k

Strictly speaking, bowtie and bwa both treat an ambiguous reference base as a random A/C/G/T. After mapping, they count a read base aligned to an ambiguous reference base as a mismatch.
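
A rough sketch of the scoring side of this, in Python. It only models the "ambiguous reference base counts as a mismatch" rule for an ungapped placement; it does not reproduce the random base substitution or any actual bowtie/bwa code, and `strict_mismatches` is a hypothetical name used for illustration.

```python
def strict_mismatches(reference, read):
    """Count mismatches when only A/C/G/T in the reference is allowed to match."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    # A position is a mismatch if the reference base is ambiguous
    # or if it simply differs from the read base.
    return sum(r not in "ACGT" or r != q for r, q in zip(reference, read))

reference = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"

print(strict_mismatches(reference, read))  # prints 4: the two M and two K positions
```

With the sequences from the question, the four ambiguous positions are each charged as a mismatch even though the read bases are compatible with the codes.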

ADD REPLYlink written 10.1 years ago by lh332k
10.1 years ago by
lh332k
United States
lh332k wrote:

GSNAP, Mosaik and novoalign.

ADD COMMENTlink written 10.1 years ago by lh332k
1

If your intention is to reduce reference bias, I am sure novoalign and gsnap implement that correctly. Mosaik is not published, so I do not know whether it does so correctly (novoalign is not published either, but I have discussed this with its developer). Note that claiming a feature does not necessarily mean the feature is implemented correctly.

ADD REPLYlink written 10.1 years ago by lh332k

Thanks @lh3, I accept your answer, but I am currently trying SOAPaligner/soap2.

ADD REPLYlink written 10.1 years ago by Rm8.0k

I do not know if soap2 accepts ambiguous bases. In general, soap2 is great, but it does not natively support SAM output, which might cause problems for downstream analyses. If you have to use one, I would recommend novoalign.

ADD REPLYlink written 10.1 years ago by lh332k

By design, it is hard for a BWT based aligner to work with ambiguous bases in the expected way.

ADD REPLYlink written 10.1 years ago by lh332k

I am not happy with SOAPaligner; I am currently trying GSNAP.

ADD REPLYlink written 10.1 years ago by Rm8.0k

@lh3: after looking at all three tools you suggested, I am finally going ahead with Mosaik. Thanks!

ADD REPLYlink written 10.1 years ago by Rm8.0k
10.1 years ago by
Andreas2.5k
Singapore
Andreas2.5k wrote:

RazerS accepts ambiguity characters as well: http://www.seqan.de/projects/razers.html

Andreas

Edit: Does not seem to be fully correct. See comments. Only N is supported.

ADD COMMENTlink modified 10.0 years ago • written 10.1 years ago by Andreas2.5k

Thanks @Andreas. By the way, do you know which options to use in RazerS to make use of ambiguous bases in the reference sequence?

ADD REPLYlink written 10.1 years ago by Rm8.0k

In RazerS, I did not see any parameter to support ambiguous bases except "-mN" which only allows "N" to match to any base (ATGC).

ADD REPLYlink written 10.1 years ago by Rm8.0k

I enquired with the RazerS authors; they say it does not support ambiguous bases other than "N".

ADD REPLYlink written 10.1 years ago by Rm8.0k

Ok. Thanks for your investigation and clarification. Edited my answer.

ADD REPLYlink written 10.0 years ago by Andreas2.5k