Question: finding each pattern in a set of sequences
0
gravatar for HK
19 months ago by
HK40
HK40 wrote:

Hey All,

I am trying to find a pattern in a set of sequences. What i am trying to do is that first take the 1st sequence of the pattern and match it with all subjects one by one and give an output (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) and so on i.e. in an iterative way. The input (pattern and subjects) are DNAstringSet instance (Biostrings).

I have used the function

mat=nucleotideSubstitutionMatrix(match=2,mismatch = -3,baseOnly = TRUE)
localAlign=pairwiseAlignment(pattern,subject,type="local",
                             substitutionMatrix=mat,
                             gapOpening=-5, gapExtension=-2)

But this way it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4

Example:

input:pattern

A DNAStringSet instance of length 734
      width seq                                                                         names               
  [1]  1000 GGTAAGAGTTTCTTAACAGATCTCAACATTTGCTATATAC...AGATTATTTGTCCTTTGAGATAAAATTACCAC P1 
  [2]  1000 TGTAAGTAATACTTAATGGTAATTTTTGTTTTCTCTTTCA...AGAAGCAAGGAGACCCGTTAGAGGAAGCATCC P2
  [3]  1000 GGTGAGTGTATGATTGATAACTAATCTCTTAGATTAACCA...CATGATATGAAATGGTTCCTAAAGATCCAGAC P3
  [4]  1000 GGTGAGCAAAATCAAGCAATGCATTGTTTGTTTTGGAGGG...CTATTTATGTACTACCTTTTTTTTTTAGAAAA P4

input: subject

A DNAStringSet instance of length 1000
       width seq                                                                        names               
   [1]  1000 GTAGGTACCTGGGAATTCACAAATTAAGACTTTTGAATA...TTCTTATTCAACCGTAGTAACATTAGATGAATA S1
   [2]  1000 GTGAGCGCTGCTGCCCAAGCCGCCTGGCTATGCTCGATT...AGATGGCCTTTTCTCTCAGCCCACTGTGACCTA S2
   [3]  1000 GTAAGTACAGGCTGAAAGTTACATGCTCTCCAAGGGTGA...ACATAGTAATGAATAGACTTTCAGACACAGCAT S3
   [4]  1000 GTAAGTTGCTTGTTTCTTAAATGTTAGGATCTATTACTT...AACAATATAGGTAAGTCTAGCCCTCAAGGCGCT S4
programming subject pattern loop R • 467 views
ADD COMMENTlink modified 19 months ago • written 19 months ago by HK40

I don't know if it's possible without a loop.

From the documentation : https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/pairwiseAlignment

subject : a character vector of length 1, an XString, or an XStringSet object of length 1.

Yours is 1000 :

A DNAStringSet instance of length 1000

ADD REPLYlink written 19 months ago by Bastien Hervé4.5k

@Bastien Herve Thanks for the reply, i have already used this function but as mentioned it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4. whereas i need something different i.e (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) ans so on.

So i need help with the iteration (and i think this can only be done with the loop).

ADD REPLYlink written 19 months ago by HK40

I think you can do the inverse process by looping over your subject :

localAlign=pairwiseAlignment(pattern,one_subject,type="local", substitutionMatrix=mat, gapOpening=-5,

You will get at first step : p1 vs s1, p2 vs s1...

You will get at second step : p1 vs s2, p2 vs s2...

For each step save your localAlign

At the end process what you saved

It's hard to reproduce without data so i didn't try it. At least it's working in my head :)

ADD REPLYlink written 19 months ago by Bastien Hervé4.5k

Well i knw i have to use the loop on the localAlignment function, but as i an newbie i R so have some problems in using loop or any alterantive way. The logic in mind is using nested loops

for(i in 1:length(pattern){
   for(j in 1:length(subjects){
        loaclAlignmentfunction }} # i.e first it take the p1 and come in the second loop and iterate in the subject loop till end, then goes to outer loop and picks p2 and comes in the seconf loop and iterates till the last p4 and so on.
ADD REPLYlink modified 19 months ago • written 19 months ago by HK40
localAlign_all <- c()
for (i in 1:length(subjects)) { 
    localAlign=pairwiseAlignment(pattern,subjects[i],type="local", substitutionMatrix=mat, gapOpening=-5,
    localAlign_all[i] <- localAlign
}

According to the description, you can use multiple patterns but only one subject

ADD REPLYlink written 19 months ago by Bastien Hervé4.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1187 users visited in the last hour