Question: finding each pattern in a set of sequences
0
gravatar for HK
5 days ago by
HK20
HK20 wrote:

Hey All,

I am trying to find a pattern in a set of sequences. What i am trying to do is that first take the 1st sequence of the pattern and match it with all subjects one by one and give an output (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) and so on i.e. in an iterative way. The input (pattern and subjects) are DNAstringSet instance (Biostrings).

I have used the function

mat=nucleotideSubstitutionMatrix(match=2,mismatch = -3,baseOnly = TRUE)
localAlign=pairwiseAlignment(pattern,subject,type="local",
                             substitutionMatrix=mat,
                             gapOpening=-5, gapExtension=-2)

But this way it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4

Example:

input:pattern

A DNAStringSet instance of length 734
      width seq                                                                         names               
  [1]  1000 GGTAAGAGTTTCTTAACAGATCTCAACATTTGCTATATAC...AGATTATTTGTCCTTTGAGATAAAATTACCAC P1 
  [2]  1000 TGTAAGTAATACTTAATGGTAATTTTTGTTTTCTCTTTCA...AGAAGCAAGGAGACCCGTTAGAGGAAGCATCC P2
  [3]  1000 GGTGAGTGTATGATTGATAACTAATCTCTTAGATTAACCA...CATGATATGAAATGGTTCCTAAAGATCCAGAC P3
  [4]  1000 GGTGAGCAAAATCAAGCAATGCATTGTTTGTTTTGGAGGG...CTATTTATGTACTACCTTTTTTTTTTAGAAAA P4

input: subject

A DNAStringSet instance of length 1000
       width seq                                                                        names               
   [1]  1000 GTAGGTACCTGGGAATTCACAAATTAAGACTTTTGAATA...TTCTTATTCAACCGTAGTAACATTAGATGAATA S1
   [2]  1000 GTGAGCGCTGCTGCCCAAGCCGCCTGGCTATGCTCGATT...AGATGGCCTTTTCTCTCAGCCCACTGTGACCTA S2
   [3]  1000 GTAAGTACAGGCTGAAAGTTACATGCTCTCCAAGGGTGA...ACATAGTAATGAATAGACTTTCAGACACAGCAT S3
   [4]  1000 GTAAGTTGCTTGTTTCTTAAATGTTAGGATCTATTACTT...AACAATATAGGTAAGTCTAGCCCTCAAGGCGCT S4
ADD COMMENTlink modified 5 days ago • written 5 days ago by HK20

I don't know if it's possible without a loop.

From the documentation : https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/pairwiseAlignment

subject : a character vector of length 1, an XString, or an XStringSet object of length 1.

Yours is 1000 :

A DNAStringSet instance of length 1000

ADD REPLYlink written 5 days ago by Bastien Hervé1.0k

@Bastien Herve Thanks for the reply, i have already used this function but as mentioned it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4. whereas i need something different i.e (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) ans so on.

So i need help with the iteration (and i think this can only be done with the loop).

ADD REPLYlink written 5 days ago by HK20

I think you can do the inverse process by looping over your subject :

localAlign=pairwiseAlignment(pattern,one_subject,type="local", substitutionMatrix=mat, gapOpening=-5,

You will get at first step : p1 vs s1, p2 vs s1...

You will get at second step : p1 vs s2, p2 vs s2...

For each step save your localAlign

At the end process what you saved

It's hard to reproduce without data so i didn't try it. At least it's working in my head :)

ADD REPLYlink written 5 days ago by Bastien Hervé1.0k

Well i knw i have to use the loop on the localAlignment function, but as i an newbie i R so have some problems in using loop or any alterantive way. The logic in mind is using nested loops

for(i in 1:length(pattern){
   for(j in 1:length(subjects){
        loaclAlignmentfunction }} # i.e first it take the p1 and come in the second loop and iterate in the subject loop till end, then goes to outer loop and picks p2 and comes in the seconf loop and iterates till the last p4 and so on.
ADD REPLYlink modified 5 days ago • written 5 days ago by HK20
localAlign_all <- c()
for (i in 1:length(subjects)) { 
    localAlign=pairwiseAlignment(pattern,subjects[i],type="local", substitutionMatrix=mat, gapOpening=-5,
    localAlign_all[i] <- localAlign
}

According to the description, you can use multiple patterns but only one subject

ADD REPLYlink written 5 days ago by Bastien Hervé1.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1023 users visited in the last hour