Question: finding each pattern in a set of sequences
0
13 months ago by
HK30
HK30 wrote:

Hey All,

I am trying to find a pattern in a set of sequences. What i am trying to do is that first take the 1st sequence of the pattern and match it with all subjects one by one and give an output (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) and so on i.e. in an iterative way. The input (pattern and subjects) are DNAstringSet instance (Biostrings).

I have used the function

``````mat=nucleotideSubstitutionMatrix(match=2,mismatch = -3,baseOnly = TRUE)
localAlign=pairwiseAlignment(pattern,subject,type="local",
substitutionMatrix=mat,
gapOpening=-5, gapExtension=-2)
``````

But this way it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4

Example:

input:pattern

``````A DNAStringSet instance of length 734
width seq                                                                         names
[1]  1000 GGTAAGAGTTTCTTAACAGATCTCAACATTTGCTATATAC...AGATTATTTGTCCTTTGAGATAAAATTACCAC P1
[2]  1000 TGTAAGTAATACTTAATGGTAATTTTTGTTTTCTCTTTCA...AGAAGCAAGGAGACCCGTTAGAGGAAGCATCC P2
[3]  1000 GGTGAGTGTATGATTGATAACTAATCTCTTAGATTAACCA...CATGATATGAAATGGTTCCTAAAGATCCAGAC P3
[4]  1000 GGTGAGCAAAATCAAGCAATGCATTGTTTGTTTTGGAGGG...CTATTTATGTACTACCTTTTTTTTTTAGAAAA P4
``````

input: subject

``````A DNAStringSet instance of length 1000
width seq                                                                        names
[1]  1000 GTAGGTACCTGGGAATTCACAAATTAAGACTTTTGAATA...TTCTTATTCAACCGTAGTAACATTAGATGAATA S1
[2]  1000 GTGAGCGCTGCTGCCCAAGCCGCCTGGCTATGCTCGATT...AGATGGCCTTTTCTCTCAGCCCACTGTGACCTA S2
[3]  1000 GTAAGTACAGGCTGAAAGTTACATGCTCTCCAAGGGTGA...ACATAGTAATGAATAGACTTTCAGACACAGCAT S3
[4]  1000 GTAAGTTGCTTGTTTCTTAAATGTTAGGATCTATTACTT...AACAATATAGGTAAGTCTAGCCCTCAAGGCGCT S4
``````
programming subject pattern loop R • 349 views
modified 13 months ago • written 13 months ago by HK30

I don't know if it's possible without a loop.

From the documentation : https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/pairwiseAlignment

subject : a character vector of length 1, an XString, or an XStringSet object of length 1.

Yours is 1000 :

A DNAStringSet instance of length 1000

@Bastien Herve Thanks for the reply, i have already used this function but as mentioned it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4. whereas i need something different i.e (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) ans so on.

So i need help with the iteration (and i think this can only be done with the loop).

I think you can do the inverse process by looping over your subject :

``````localAlign=pairwiseAlignment(pattern,one_subject,type="local", substitutionMatrix=mat, gapOpening=-5,
``````

You will get at first step : p1 vs s1, p2 vs s1...

You will get at second step : p1 vs s2, p2 vs s2...

For each step save your localAlign

At the end process what you saved

It's hard to reproduce without data so i didn't try it. At least it's working in my head :)

Well i knw i have to use the loop on the localAlignment function, but as i an newbie i R so have some problems in using loop or any alterantive way. The logic in mind is using nested loops

``````for(i in 1:length(pattern){
for(j in 1:length(subjects){
loaclAlignmentfunction }} # i.e first it take the p1 and come in the second loop and iterate in the subject loop till end, then goes to outer loop and picks p2 and comes in the seconf loop and iterates till the last p4 and so on.
``````
``````localAlign_all <- c()
for (i in 1:length(subjects)) {
localAlign=pairwiseAlignment(pattern,subjects[i],type="local", substitutionMatrix=mat, gapOpening=-5,
localAlign_all[i] <- localAlign
}
``````

According to the description, you can use multiple patterns but only one subject