Question: How To Perform Basic Multiple Sequence Alignments In R?
3
3.3 years ago by
Tal Galili120
Israel
Tal Galili120 wrote:

The task I'm trying to achieve is to align several sequences.

I don't have a reference pattern to match against. All I know is that the "true" pattern has length 30 and that the sequences I have had missing values introduced at random positions.

Here is an example of such sequences, where the left column shows the real location of the missing values and the right column shows the sequence that we are actually able to observe.

My goal is to reconstruct the left column using only the sequences in the right column (based on the fact that many of the letters in each position are the same).

                     Real_sequence           The_sequence_we_see
1   CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
2   CGCAATACTAGC-AGGTGACTTCC-CT-CG   CGCAATACTAGCAGGTGACTTCCCTCG
3   CGCAATGATCAC--GGTGGCTCCCGGTGCG  CGCAATGATCACGGTGGCTCCCGGTGCG
4   CGCAATACTAACCA-CTAACT--CGCTGCG   CGCAATACTAACCACTAACTCGCTGCG
5   CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
6   CGCTATACTAACAA-GTG-CTTAGGC-CTG   CGCTATACTAACAAGTGCTTAGGCCTG
7   CCCA-C-CTAA-ACGGTGACTTACGCTCCG   CCCACCTAAACGGTGACTTACGCTCCG
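
The remark that "many of the letters in each position are the same" can be illustrated with a per-column majority vote. This is only a minimal sketch in base R: `column_consensus` is a hypothetical helper, not a function from any package, and it assumes the gaps are already in place (the left column), which is exactly what is unknown for the observed sequences.

```r
# Aligned sequences copied from the Real_sequence column above.
aligned <- c(
  "CGCAATACTAAC-AGCTGACTTACGCACCG",
  "CGCAATACTAGC-AGGTGACTTCC-CT-CG",
  "CGCAATGATCAC--GGTGGCTCCCGGTGCG",
  "CGCAATACTAACCA-CTAACT--CGCTGCG",
  "CGCACGGGTAAGAACGTGA-TTACGCTCAG",
  "CGCTATACTAACAA-GTG-CTTAGGC-CTG",
  "CCCA-C-CTAA-ACGGTGACTTACGCTCCG"
)

# Hypothetical helper: most frequent letter per column, ignoring gaps.
# Ties go to the alphabetically first letter (the order used by table()).
column_consensus <- function(aligned, gap = "-") {
  stopifnot(length(unique(nchar(aligned))) == 1)
  # One row per sequence, one column per alignment position.
  chars <- do.call(rbind, strsplit(aligned, "", fixed = TRUE))
  votes <- apply(chars, 2, function(column) {
    counts <- table(column[column != gap])
    if (length(counts) == 0) gap else names(counts)[which.max(counts)]
  })
  paste(votes, collapse = "")
}

consensus <- column_consensus(aligned)
print(consensus)
```

Once an alignment of the observed sequences is available, the same vote could be applied to it to estimate the underlying pattern.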

Here is example code to reproduce the above example:

ATCG <- c("A", "T", "C", "G")
set.seed(40)
original.seq <- sample(ATCG, 30, replace = TRUE)
# 200 identical copies of the original sequence, one per row
seqS <- matrix(original.seq, nrow = 200, ncol = 30, byrow = TRUE)

# Overwrite between 1 and number.of.changes random positions of x
# with letters drawn from letters.to.change.with
change.letters <- function(x, number.of.changes = 15, letters.to.change.with = ATCG)
{
    number.of.changes <- sample(seq_len(number.of.changes), 1)
    new.letters <- sample(letters.to.change.with, number.of.changes, replace = TRUE)
    where.to.change.the.letters <- sample(seq_along(x), number.of.changes, replace = FALSE)
    x[where.to.change.the.letters] <- new.letters
    return(x)
}
change.letters(original.seq)
# Mark between 1 and 3 random positions as missing ("-")
insert.missing.values <- function(x) change.letters(x, 3, "-")
insert.missing.values(original.seq)

seqS2 <- t(apply(seqS, 1, change.letters))            # add substitutions
seqS3 <- t(apply(seqS2, 1, insert.missing.values))    # add missing values

seqS4 <- apply(seqS3, 1, function(x) {paste(x, collapse = "")})
library(stringr)
# library(help=stringr)
# str_replace() would drop only the first "-"; str_replace_all() drops them all
all.seqS <- str_replace_all(seqS4, "-", "")

# how do we align this?
data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)

I understand that if all I had was a string and a pattern I would be able to use

library(Biostrings)
pairwiseAlignment(...)

But in the case I present we are dealing with many sequences to align to one another (instead of aligning them to one pattern).
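
To make the pairwise case concrete, here is a minimal base R sketch of global alignment by dynamic programming (the Needleman-Wunsch scheme). It is not the Biostrings implementation: `needleman_wunsch` is a hypothetical helper, and the scores (match = 1, mismatch = -1, gap = -2) are assumed purely for illustration.

```r
# Hypothetical helper: global alignment of two strings by dynamic programming.
# The default scores are assumptions chosen for illustration only.
needleman_wunsch <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(x)
  m <- length(y)

  # score[i + 1, j + 1] is the best score for aligning x[1..i] with y[1..j].
  score <- matrix(0, n + 1, m + 1)
  score[, 1] <- gap * (0:n)
  score[1, ] <- gap * (0:m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      diagonal <- score[i, j] + if (x[i] == y[j]) match else mismatch
      score[i + 1, j + 1] <- max(diagonal,
                                 score[i, j + 1] + gap,
                                 score[i + 1, j] + gap)
    }
  }

  # Trace back from the bottom-right corner to recover one optimal alignment.
  top <- character(0)
  bottom <- character(0)
  i <- n
  j <- m
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 &&
        score[i + 1, j + 1] ==
          score[i, j] + (if (x[i] == y[j]) match else mismatch)) {
      top <- c(x[i], top)
      bottom <- c(y[j], bottom)
      i <- i - 1
      j <- j - 1
    } else if (i > 0 && score[i + 1, j + 1] == score[i, j + 1] + gap) {
      top <- c(x[i], top)
      bottom <- c("-", bottom)
      i <- i - 1
    } else {
      top <- c("-", top)
      bottom <- c(y[j], bottom)
      j <- j - 1
    }
  }
  list(score = score[n + 1, m + 1],
       a = paste(top, collapse = ""),
       b = paste(bottom, collapse = ""))
}

# Align two of the observed (gap-free) sequences from the table above.
result <- needleman_wunsch("CGCAATACTAACAGCTGACTTACGCACCG",
                           "CGCAATACTAGCAGGTGACTTCCCTCG")
cat(result$a, result$b, sep = "\n")
```

This aligns exactly two sequences. Running it on every pair gives a set of pairwise alignments, which is not the same thing as one multiple alignment in which all sequences share a single set of columns.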

Is there a known method for doing this in R?

Thanks,

Tal

ADD COMMENTlink modified 3.1 years ago by Gregr20 • written 3.3 years ago by Tal Galili120
3

I would not attempt to do this in R unless there is already a library for doing so. There are plenty of good stand-alone multiple sequence alignment programs out there - I would just use one of them.

ADD REPLYlink written 3.3 years ago by Lars Juhl Jensen8.5k

Thank you Lars - Since my problem is more of a subset - I might choose to implement it. I appreciate your advice.

ADD REPLYlink written 3.3 years ago by Tal Galili120

Tal, I have to say I like your variable names. They make your code very easy to read.

ADD REPLYlink written 3.3 years ago by Jeremy Leipzig12k

Thank you Jeremy - it's very kind of you to say :)

ADD REPLYlink written 3.3 years ago by Tal Galili120
2
3.3 years ago by
Bergen
Michael Dondrup27k wrote:

As far as I know, there is no R package that does this directly from R. In particular, the Biostrings package does not contain methods for multiple sequence alignment, only for pairwise alignments.

You can read multiple alignment files in different formats with the read.alignment function from the seqinr package.

ADD COMMENTlink written 3.3 years ago by Michael Dondrup27k
2
3.2 years ago by
Gregr20
Gregr20 wrote:

Try the Bio3d R package and install the MUSCLE program separately:

http://mccammon.ucsd.edu/~bgrant/bio3d/html/seqaln.html
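
For illustration, here is a rough sketch of handing the sequences to a separately installed MUSCLE from R. It calls the program directly with base R rather than through Bio3d. Assumptions: `write_fasta` is a hypothetical helper, the sequence names are made up, and a MUSCLE 3.x binary called `muscle` is on the PATH (MUSCLE 5 uses `-align` and `-output` instead of `-in` and `-out`).

```r
# Hypothetical helper: write a named character vector to a FASTA file.
write_fasta <- function(sequences, path) {
  stopifnot(!is.null(names(sequences)))
  # Interleave ">name" header lines with the sequence lines.
  records <- as.vector(rbind(paste0(">", names(sequences)), sequences))
  writeLines(records, path)
  invisible(path)
}

# Three of the observed sequences from the question; the names are made up.
observed <- c(seq1 = "CGCAATACTAACAGCTGACTTACGCACCG",
              seq2 = "CGCAATACTAGCAGGTGACTTCCCTCG",
              seq3 = "CGCAATGATCACGGTGGCTCCCGGTGCG")

input_file <- tempfile(fileext = ".fa")
output_file <- tempfile(fileext = ".fa")
write_fasta(observed, input_file)

# Assumption: MUSCLE 3.x is installed and reachable as "muscle".
# The alignment step is skipped when no such program is found.
if (nzchar(Sys.which("muscle"))) {
  status <- system2("muscle", c("-in", shQuote(input_file),
                                "-out", shQuote(output_file)))
  if (status == 0 && file.exists(output_file)) {
    aligned_fasta <- readLines(output_file)
    print(aligned_fasta)
  }
}
```

The aligned FASTA output can then be read back into R, for example with the seqinr function mentioned in another answer.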

ADD COMMENTlink written 3.2 years ago by Gregr20
1
3.3 years ago by
Bilouweb960
Saclay, France
Bilouweb960 wrote:

I don't know if there is such a function in R, but multiple alignment is a computationally hard task.

That's why all the algorithms I know use a heuristic to find a near-optimal multiple alignment of sequences.
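
A back-of-the-envelope calculation shows why exact methods do not scale. Extending the pairwise dynamic-programming table to k sequences gives one dimension per sequence, so for sequences of length at most `len` the table has up to (len + 1)^k cells. `dp_cells` below is a hypothetical helper for this arithmetic, not part of any package.

```r
# Upper bound on the number of cells in a full dynamic-programming table
# for k sequences, each of length at most len.
dp_cells <- function(len, k) (len + 1)^k

dp_cells(30, 2)    # pairwise case: 961 cells
dp_cells(30, 7)    # the seven example sequences: 27512614111 cells
dp_cells(30, 200)  # the 200 simulated sequences: on the order of 1e298 cells
```

Even seven sequences already need tens of billions of cells, which is why practical tools align progressively or iteratively instead of filling the whole table.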

ADD COMMENTlink written 3.3 years ago by Bilouweb960
1
3.3 years ago by
Thaman2.9k
Finland
Thaman2.9k wrote:

It seems like you are using the Biostrings package and the PairwiseAlignment function, which produces a set of objects. Maybe it's better to go through the R package Biostrings->MatchAlign again if you want to compute the alignment without other applications.

ADD COMMENTlink written 3.3 years ago by Thaman2.9k

Thanks Thaman, I imagine I'll use that function (actually pairwiseAlignment).

ADD REPLYlink written 3.3 years ago by Tal Galili120

You are aware that pairwise alignments of many sequences and multiple sequence alignments are two different things?

ADD REPLYlink written 3.3 years ago by Michael Dondrup27k

Hi Michael. Yes, I am aware of it. It obviously won't work out of the box, but with some tweaking I might get it to solve my particular problem (which is a bit simpler than a complete combination search for full sequence alignment). BTW, I also found the following answer to be useful: http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434

ADD REPLYlink written 3.3 years ago by Tal Galili120