4.7 years ago

Ashley

```
library("Biostrings")
```

I want to calculate the probability of each of the dinucleotides AA, TT, AT, and TA at each 2-base position (positions 1-2, 3-4, and so on) across my sequences.

My DNA sequence is as follows:

```
DNA.set
A DNAStringSet instance of length 5
    width seq                  names
[1]    20 TCCGTATTGGAAAGCTCGTC SEQ-1
[2]    20 TTAGACCACTCCGCATGTAG SEQ-2
[3]    20 CTGTGGTACGGCTCAAACGG SEQ-3
[4]    20 CTCCCGCCTATCTCCCTTCT SEQ-4
[5]    20 TCGCCTAGAAAAAGTTTCCT SEQ-5
```

I want to obtain the result as follows:

```
AA=0,0,0,0,1/5,2/5,0,1/5,0,0
TT=1/5,0,0,1/5,0,0,0,1/5,1/5,0
AT=0,0,0,0,0,0,0,1/5,0,0
TA=0,0,1/5,1/5,1/5,0,0,0,0,0
```

Any help would be greatly appreciated.

The dinucleotideFrequency function in Biostrings gives the counts of those 2-mers. You can then take the subset you want.

Thanks for your reply. But I want the frequency (probability) of the A/T dinucleotides at each position, not their total count per sequence, so dinucleotideFrequency may not be suitable for me.

Maybe consensusMatrix(dinucleotideFrequency(DNA.set))?

For our example,

Thanks for your kind help. But I think the result should have 10 columns (the sequence width divided by 2, i.e. 20/2 = 10), whereas it has 16. It also doesn't show which column represents AA, AT, TA, or TT. I am a newcomer to bioinformatics; could you help me figure it out? Thank you so much. Best wishes.
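On the 16 columns: there are 4 x 4 = 16 possible dinucleotides, and dinucleotideFrequency tallies each of them over the whole sequence rather than per position. A minimal base-R sketch that lists the 16 names in alphabetical order is below. That this is also the column order of the dinucleotideFrequency result is an assumption here; checking `colnames()` on your own result will confirm it.

```r
# The four DNA bases (assumes no ambiguity codes such as N).
bases <- c("A", "C", "G", "T")

# All 16 dinucleotides in alphabetical order: AA, AC, AG, AT, CA, ...
all_dimers <- as.vector(t(outer(bases, bases, paste0)))
print(all_dimers)

# Index of the four dinucleotides of interest within that ordering.
wanted <- c("AA", "AT", "TA", "TT")
wanted_idx <- match(wanted, all_dimers)
print(setNames(wanted_idx, wanted))
```

If the column names match, selecting `wanted` by name from the dinucleotideFrequency matrix is safer than relying on the numeric index.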

I think the number of columns is always 16. For another example,

(original link: https://www.dropbox.com/s/h8de0hcc8vc193t/data.jpg?dl=0 )
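For the per-position result asked about in the question, one possible approach is to cut every sequence into non-overlapping 2-base windows and take, for each window, the fraction of sequences that show a given dinucleotide. The sketch below uses base R on plain strings. The names `seqs`, `starts`, `dimers`, and `pos_freq` are made up for illustration, and it assumes all sequences have the same width. With Biostrings loaded, `as.character(DNA.set)` should give an equivalent character vector.

```r
# Sequences from the question as a named character vector.
seqs <- c(
  "SEQ-1" = "TCCGTATTGGAAAGCTCGTC",
  "SEQ-2" = "TTAGACCACTCCGCATGTAG",
  "SEQ-3" = "CTGTGGTACGGCTCAAACGG",
  "SEQ-4" = "CTCCCGCCTATCTCCCTTCT",
  "SEQ-5" = "TCGCCTAGAAAAAGTTTCCT"
)

# Start positions of the non-overlapping 2-base windows: 1, 3, ..., 19.
starts <- seq(1, nchar(seqs[[1]]) - 1, by = 2)

# Rows = sequences, columns = windows; each cell holds one dinucleotide.
dimers <- t(sapply(seqs, function(s) substring(s, starts, starts + 1)))

# For each target dinucleotide, the fraction of sequences showing it
# in each window (rows = targets, columns = windows).
targets <- c("AA", "TT", "AT", "TA")
pos_freq <- t(sapply(targets, function(d) colMeans(dimers == d)))
colnames(pos_freq) <- paste0(starts, "-", starts + 1)

print(pos_freq)
```

Each row of `pos_freq` has 10 entries, one per window, with values in fifths because there are five sequences. Changing `by = 2` to `by = 1` would give overlapping windows instead.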