Question: finding each pattern in a set of sequences

HK wrote:

Hey All,

I am trying to find a pattern in a set of sequences. I want to take the 1st sequence of the pattern set and match it against every subject one by one, producing an output for each pair (p1 vs s1, p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match it against all subjects (p2 vs s1, p2 vs s2, p2 vs s3, p2 vs s4), and so on, i.e. iteratively. The inputs (pattern and subject) are DNAStringSet instances (Biostrings).

I have used the following code:

```
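library(Biostrings)
# Scoring scheme: +2 for a match, -3 for a mismatch, A/C/G/T only
mat <- nucleotideSubstitutionMatrix(match = 2, mismatch = -3, baseOnly = TRUE)
# Local alignment of pattern against subject
localAlign <- pairwiseAlignment(pattern, subject, type = "local",
                                substitutionMatrix = mat,
                                gapOpening = -5, gapExtension = -2)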
```

But this way it actually matches p1 vs s1, p2 vs s2, p3 vs s3 and p4 vs s4.

**Example:**

**input:pattern**

```
A DNAStringSet instance of length 734
width seq names
[1] 1000 GGTAAGAGTTTCTTAACAGATCTCAACATTTGCTATATAC...AGATTATTTGTCCTTTGAGATAAAATTACCAC P1
[2] 1000 TGTAAGTAATACTTAATGGTAATTTTTGTTTTCTCTTTCA...AGAAGCAAGGAGACCCGTTAGAGGAAGCATCC P2
[3] 1000 GGTGAGTGTATGATTGATAACTAATCTCTTAGATTAACCA...CATGATATGAAATGGTTCCTAAAGATCCAGAC P3
[4] 1000 GGTGAGCAAAATCAAGCAATGCATTGTTTGTTTTGGAGGG...CTATTTATGTACTACCTTTTTTTTTTAGAAAA P4
```

**input: subject**

```
A DNAStringSet instance of length 1000
width seq names
[1] 1000 GTAGGTACCTGGGAATTCACAAATTAAGACTTTTGAATA...TTCTTATTCAACCGTAGTAACATTAGATGAATA S1
[2] 1000 GTGAGCGCTGCTGCCCAAGCCGCCTGGCTATGCTCGATT...AGATGGCCTTTTCTCTCAGCCCACTGTGACCTA S2
[3] 1000 GTAAGTACAGGCTGAAAGTTACATGCTCTCCAAGGGTGA...ACATAGTAATGAATAGACTTTCAGACACAGCAT S3
[4] 1000 GTAAGTTGCTTGTTTCTTAAATGTTAGGATCTATTACTT...AACAATATAGGTAAGTCTAGCCCTCAAGGCGCT S4
```

I don't know if it's possible without a loop.

From the documentation : https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/pairwiseAlignment

Yours is 1000 :

@Bastien Herve Thanks for the reply. I have already used this function, but as mentioned it matches p1 vs s1, p2 vs s2, p3 vs s3 and p4 vs s4, whereas I need something different: p1 against all subjects (p1 vs s1, p1 vs s2, p1 vs s3, p1 vs s4), then the 2nd pattern against all subjects (p2 vs s1, p2 vs s2, p2 vs s3, p2 vs s4), and so on.

So I need help with the iteration (and I think this can only be done with a loop).

I think you can do the inverse process by looping over your subjects:

At the first step you will get: p1 vs s1, p2 vs s1...

At the second step you will get: p1 vs s2, p2 vs s2...

At each step, save your localAlign.

At the end, process what you saved.

It's hard to reproduce without data, so I didn't try it. At least it's working in my head :)
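For what it's worth, the subject-by-subject idea above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the two toy sets are made-up stand-ins for the real data, the gap costs are written as positive magnitudes (5 and 2) where the question passes -5 and -2, and on recent Bioconductor releases the alignment functions may have to be loaded from the pwalign package rather than Biostrings.

```r
library(Biostrings)

# Hypothetical toy data standing in for the real pattern and subject sets
pattern <- DNAStringSet(c(P1 = "ACGTACGT", P2 = "TTGACCA"))
subject <- DNAStringSet(c(S1 = "GGACGTACGTCC", S2 = "TTGACCATT", S3 = "CCCCCCCC"))

# Same scoring scheme as in the question
mat <- nucleotideSubstitutionMatrix(match = 2, mismatch = -3, baseOnly = TRUE)

# One step per subject: all patterns are aligned against the single
# sequence subject[[j]], and the result of each step is kept in a list
alignments <- lapply(seq_along(subject), function(j) {
  pairwiseAlignment(pattern, subject[[j]], type = "local",
                    substitutionMatrix = mat,
                    gapOpening = 5, gapExtension = 2)  # assumed magnitudes
})
names(alignments) <- names(subject)

# Post-processing example: scores of every pattern against S1
score(alignments[["S1"]])
```

With this layout `alignments[["S2"]]` holds p1 vs s2, p2 vs s2 and so on, so the pairs come out grouped by subject rather than by pattern.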

Well, I know I have to put a loop around the local alignment call, but as I am a newbie in R I have some problems using loops or any alternative. The logic I have in mind is nested loops.
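A nested-loop version of that logic might look something like the sketch below. It is an illustration rather than a tested solution for the real data: the sequences are invented, the positive gap costs are an assumption, and `scoreOnly = TRUE` is used so that only a score per pair is kept (drop it if the full alignments are needed). On recent Bioconductor releases the functions may need to come from the pwalign package.

```r
library(Biostrings)

# Hypothetical toy data standing in for the real pattern and subject sets
pattern <- DNAStringSet(c(P1 = "ACGTACGT", P2 = "TTGACCA"))
subject <- DNAStringSet(c(S1 = "GGACGTACGTCC", S2 = "TTGACCATT"))

mat <- nucleotideSubstitutionMatrix(match = 2, mismatch = -3, baseOnly = TRUE)

# Result matrix: one row per pattern, one column per subject
scores <- matrix(NA_real_, nrow = length(pattern), ncol = length(subject),
                 dimnames = list(names(pattern), names(subject)))

# Outer loop over patterns, inner loop over subjects:
# p1 vs s1, p1 vs s2, ..., then p2 vs s1, p2 vs s2, ...
for (i in seq_along(pattern)) {
  for (j in seq_along(subject)) {
    scores[i, j] <- pairwiseAlignment(pattern[[i]], subject[[j]],
                                      type = "local",
                                      substitutionMatrix = mat,
                                      gapOpening = 5, gapExtension = 2,
                                      scoreOnly = TRUE)
  }
}

scores
```

For 734 patterns and 1000 subjects of width 1000 this means 734,000 separate alignments, so looping over subjects only and passing the whole pattern set in one call is likely to be the faster of the two layouts.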

According to the description, you can use multiple patterns but only one subject.
